20 October, 2010

How the Optimizer can use Constraint Definitions

A simple demonstration of how properly defined Referential Integrity constraints can be used by the Optimizer.....

If two tables (say SALES and SALES_LINES) have a Parent-Child relationship but the proper R.I. constraint is not defined, it is possible (either through faulty application code that uses two separate transactions for an INSERT {or a DELETE} against the two tables OR through erroneous adhoc updates to the data) to have "dangling" child records.
For example, an INSERT into SALES errors out (for lack of space in that tablespace) but the corresponding INSERT into SALES_LINES succeeds and commits, when executed as a separate transaction.
Or an adhoc "data-fix" operation deletes rows from the SALES table without having deleted them from the SALES_LINES table first.

Having UNIQUE Indexes on the two tables will not prevent the occurrence of "parent-less children" !

When querying the SALES_LINES table, it might be necessary to validate that the corresponding SALES row exists. This could be done as a join between the two tables, explicitly included in every query against the SALES_LINES table.

Assuming that the two tables have 100,000 rows (each sale having only 1 line), with proper Unique Indexes, a query for one PROD_ID of 10 different products may execute as :


SQL> -- Query for Total Sales for PROD_ID=5
SQL> select sum(l.quantity), sum(l.quantity*l.price)
2 from sales_lines l, sales s
3 where
4 -- join code is used to validate that the sale_id is legitimate
5 l.sale_id = s.sale_id
6 and l.prod_id=5
7 and l.sale_id between 25000 and 35000
8 /

SUM(L.QUANTITY) SUM(L.QUANTITY*L.PRICE)
--------------- -----------------------
2441786 48892118.9


Execution Plan
----------------------------------------------------------
Plan hash value: 146457679

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 139 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 39 | | |
| 2 | NESTED LOOPS | | 984 | 38376 | 139 (0)| 00:00:02 |
|* 3 | TABLE ACCESS BY INDEX ROWID| SALES_LINES | 984 | 33456 | 139 (0)| 00:00:02 |
|* 4 | INDEX RANGE SCAN | SALES_LINES_UK | 10000 | | 24 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SALES_UK | 1 | 5 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


3 - filter("L"."PROD_ID"=5)
4 - access("L"."SALE_ID">=25000 AND "L"."SALE_ID"<=35000)

5 - access("L"."SALE_ID"="S"."SALE_ID")
filter("S"."SALE_ID"<=35000 AND "S"."SALE_ID">=25000)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
281 consistent gets
0 physical reads
0 redo size
523 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

However, if I were to define the Constraints as :

SQL> REM REM ######################################
SQL> REM Now add the Constraint Definitions !
SQL> alter table SALES add constraint SALES_PK primary key (sale_id);

Table altered.

SQL> alter table SALES_LINES add constraint SALES_LINES_FK foreign key (sale_id) references SALES (sale_id);

Table altered.

SQL> REM REM ######################################

and I re-run the query :

SQL> select sum(l.quantity), sum(l.quantity*l.price)
2 from sales_lines l, sales s
3 where
4 -- join code is used to validate that the sale_id is legitimate
5 l.sale_id = s.sale_id
6 and l.prod_id=5
7 and l.sale_id between 25000 and 35000
8 /

SUM(L.QUANTITY) SUM(L.QUANTITY*L.PRICE)
--------------- -----------------------
2441786 48892118.9


Execution Plan
----------------------------------------------------------
Plan hash value: 2517766180

-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 139 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 34 | | |
|* 2 | TABLE ACCESS BY INDEX ROWID| SALES_LINES | 984 | 33456 | 139 (0)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | SALES_LINES_UK | 10000 | | 24 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("L"."PROD_ID"=5)
3 - access("L"."SALE_ID">=25000 AND "L"."SALE_ID"<=35000)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
139 consistent gets
0 physical reads
0 redo size
523 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Oracle can simply eliminate the lookup on the SALES table and do much fewer consistent gets.
The presence of the constraint ensures that every SALE_ID in the (child) SALES_LINES table *does* have corresponding SALE_ID in the (parent) SALES table. Since I am not fetching any columns from the SALES table, the join is now unnecessary and the optimizer (smartly) eliminates the join.

.

UPDATE : See the Optimizer team's blog post on Table (Join) Elimination.

.

.
.

1 comment:

SydOracle said...

Very useful for views that have the child linked to the parent, but where the query against the view only requires the child table to be accessed.