11 April, 2009

Incorrect Cardinality Estimate of 1 : Bug 5483301

Oracle Bug# 5483301 is about the optimizer computing an incorrect Cardinality of 1 for an equality predicate not present in a FREQUENCY Histogram.

This test is in 10.2.0.1
In the first half of this test, I create a FREQUENCY Histogram on a column with 99 distinct values ("low_card_column") between 1 and 99. For queries for the values 110, 120 and 2000, all three of which are *not* present in the Histogram, Oracle computes an estimated cardinality of 1.
Note : It wouldn't matter if the value I queried for were within the min and max values either -- as long as it does not exist in the FREQUENCY Histogram, Oracle estimates a cardinality of 1 only.

In the second half of the test, I do NOT create a Histogram on the colum. This time, for the predicate values 110, 120 and 2000, Oracle does come up with more reasonable cardinalities.

Thus, the bug being present in 10.2.0.1 to 10.2.0.3, means that if you create a FREQUENCY Histogram (either deliberately or inadvertently through Oracle's SIZE AUTO behaviour) and later query for a predicate not present in the Histogram (e.g. if the value was inserted after the Statistics were gathered), the cardinality estimate can be very wrong. This results in serious under-costing of queries for such values.
If you happen to have a very large number of rows for such a value, you are seriously out of luck.
You must either rebuild the FREQUENCY Histogram statistics OR drop the Histogram.



SQL>
SQL> REM Bug 5483301 : "QUERY WITH PREDICATE VALUE NON-EXISTENT IN FREQUENCY HISTOGRAM RUNS SLOW"
SQL>
SQL> REM The Bug causes Oracle to estimate a very poor Cardinality of 1 if a frequency histogram exists
SQL> REM on the column but the specific column value does not exist in the table
SQL>
SQL> drop table test_bug_5483301 purge;
Table dropped.

SQL> create table test_bug_5483301
2 (
3 low_card_column number not null,
4 text_column varchar2(20) not null,
5 number_column number not null,
6 date_column date not null
7 )
8 /
Table created.

SQL> alter table test_bug_5483301 nologging;
Table altered.

SQL> -- populate the table with 100 different low_card_column values
SQL> insert /*+ APPEND */ into test_bug_5483301
2 select trunc(dbms_random.value(1,100)),
3 dbms_random.string('X',10),
4 trunc(dbms_random.value(100,80000)),
5 sysdate-365+trunc(dbms_random.value(1,360))
6 from object_ids_table
7 /
2417712 rows created.

SQL> create index test_bug_5483301_lcc_ndx on test_bug_5483301(low_card_column) nologging;
Index created.

SQL> -- Create a FREQUENCY Histogram
SQL> exec dbms_stats.gather_table_stats(user,'TEST_BUG_5483301',estimate_percent=>100,method_opt=>'FOR COLUMNS LOW_CARD_COLUMN SIZE 250',cascade=>TRUE);
PL/SQL procedure successfully completed.

SQL>
SQL> REM Confirm the number and range of low_card_column values
SQL> select count(distinct(low_card_column)) from test_bug_5483301
2 /

COUNT(DISTINCT(LOW_CARD_COLUMN))
--------------------------------
99

SQL>
SQL> select min_low_card_column, max_low_card_column
2 from
3 (select min(low_card_column) min_low_card_column from test_bug_5483301),
4 (select max(low_card_column) max_low_card_column from test_bug_5483301)
5 /

MIN_LOW_CARD_COLUMN MAX_LOW_CARD_COLUMN
------------------- -------------------
1 99

SQL>
SQL> -- insert 8000 rows with low_card_column values that are not in the histogram
SQL> insert /*+ APPEND */ into test_bug_5483301
2 select 110,
3 'ABCDEFGH',
4 82000,
5 sysdate-2
6 from dba_objects
7 where rownum < 8001
8 /
8000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> REM ######################################################################################
SQL> REM Let's check the Card estimates -- when using Literals
SQL>
SQL> -- for a value in the known list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=10;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2790200580

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24671 | 626K| 1941 (4)| 00:00:24 |
|* 1 | TABLE ACCESS FULL| TEST_BUG_5483301 | 24671 | 626K| 1941 (4)| 00:00:24 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LOW_CARD_COLUMN"=10)

13 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=10;

COUNT(*)
----------
24671

SQL>
SQL> -- for another value in the possible list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=75;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2790200580

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24594 | 624K| 1941 (4)| 00:00:24 |
|* 1 | TABLE ACCESS FULL| TEST_BUG_5483301 | 24594 | 624K| 1941 (4)| 00:00:24 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LOW_CARD_COLUMN"=75)

13 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=75;

COUNT(*)
----------
24594

SQL>
SQL> -- for a value present in the table but not in the histogram
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=110;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1793180512

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST_BUG_5483301 | 1 | 26 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_BUG_5483301_LCC_NDX | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("LOW_CARD_COLUMN"=110)

14 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=110;

COUNT(*)
----------
8000

SQL>
SQL> -- for a value outside the known list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=120;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1793180512

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST_BUG_5483301 | 1 | 26 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_BUG_5483301_LCC_NDX | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("LOW_CARD_COLUMN"=120)

14 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=120;

COUNT(*)
----------
0

SQL>
SQL> -- for a value *far* outside the known list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=2000;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1793180512

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST_BUG_5483301 | 1 | 26 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_BUG_5483301_LCC_NDX | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("LOW_CARD_COLUMN"=2000)

14 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=2000;

COUNT(*)
----------
0

SQL>
SQL> REM REM REM REM REM REM ####################################################################
SQL> REM REM REM REM REM REM ####################################################################
SQL>
SQL> drop table test_bug_5483301 purge;

Table dropped.

SQL> create table test_bug_5483301
2 (
3 low_card_column number not null,
4 text_column varchar2(20) not null,
5 number_column number not null,
6 date_column date not null
7 )
8 /

Table created.

SQL>
SQL> alter table test_bug_5483301 nologging;

Table altered.

SQL>
SQL> -- populate the table with 100 different low_card_column values
SQL> insert /*+ APPEND */ into test_bug_5483301
2 select trunc(dbms_random.value(1,100)),
3 dbms_random.string('X',10),
4 dbms_random.value(100,80000),
5 sysdate-365+dbms_random.value(1,360)
6 from object_ids_table
7 /

2417712 rows created.

SQL>
SQL> create index test_bug_5483301_lcc_ndx on test_bug_5483301(low_card_column) nologging;

Index created.

SQL>
SQL> -- Create NO Histogram
SQL> exec dbms_stats.gather_table_stats(user,'TEST_BUG_5483301',estimate_percent=>100,method_opt=>'FOR COLUMNS LOW_CARD_COLUMN SIZE 1',cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL> REM Confirm the number and range of low_card_column values
SQL> select count(distinct(low_card_column)) from test_bug_5483301
2 /

COUNT(DISTINCT(LOW_CARD_COLUMN))
--------------------------------
99

SQL>
SQL> select min_low_card_column, max_low_card_column
2 from
3 (select min(low_card_column) min_low_card_column from test_bug_5483301),
4 (select max(low_card_column) max_low_card_column from test_bug_5483301)
5 /

MIN_LOW_CARD_COLUMN MAX_LOW_CARD_COLUMN
------------------- -------------------
1 99

SQL>
SQL>
SQL> -- insert 8000 rows with low_card_column values that are not in the histogram
SQL> insert /*+ APPEND */ into test_bug_5483301
2 select 110,
3 'ABCDEFGH',
4 82000,
5 sysdate-2
6 from dba_objects
7 where rownum < 8001
8 /

8000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL>
SQL>
SQL>
SQL> REM ######################################################################################
SQL> REM Let's check the Card estimates -- when using Literals
SQL>
SQL> -- for a value in the known list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=10;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2790200580

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24421 | 667K| 2950 (3)| 00:00:36 |
|* 1 | TABLE ACCESS FULL| TEST_BUG_5483301 | 24421 | 667K| 2950 (3)| 00:00:36 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LOW_CARD_COLUMN"=10)

13 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=10;

COUNT(*)
----------
24252

SQL>
SQL> -- for another value in the possible list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=75;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2790200580

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24421 | 667K| 2950 (3)| 00:00:36 |
|* 1 | TABLE ACCESS FULL| TEST_BUG_5483301 | 24421 | 667K| 2950 (3)| 00:00:36 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LOW_CARD_COLUMN"=75)

13 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=75;

COUNT(*)
----------
24472

SQL>
SQL> -- for a value present in the table but not in the histogram
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=110;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2790200580

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 21680 | 592K| 2950 (3)| 00:00:36 |
|* 1 | TABLE ACCESS FULL| TEST_BUG_5483301 | 21680 | 592K| 2950 (3)| 00:00:36 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LOW_CARD_COLUMN"=110)

13 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=110;

COUNT(*)
----------
8000

SQL>
SQL> -- for a value outside the known list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=120;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2790200580

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19188 | 524K| 2950 (3)| 00:00:36 |
|* 1 | TABLE ACCESS FULL| TEST_BUG_5483301 | 19188 | 524K| 2950 (3)| 00:00:36 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LOW_CARD_COLUMN"=120)

13 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=120;

COUNT(*)
----------
0

SQL>
SQL> -- for a value *far* outside the known list of values (by frequency histogram)
SQL> explain plan for select text_column, number_column from test_bug_5483301 where low_card_column=2000;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1793180512

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST_BUG_5483301 | 1 | 28 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_BUG_5483301_LCC_NDX | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("LOW_CARD_COLUMN"=2000)

14 rows selected.

SQL> select count(*) from test_bug_5483301 where low_card_column=2000;

COUNT(*)
----------
0

SQL>



More Notes : The Patch for 5483301 can introduce bug 6082745.
The 5483301 fix is to estimate a cardinality of 0.5 times the least popular value -- which can still be very wrong if you have a new value, not present in the Histogram, but with a very large number of rows !

No comments: