30 September, 2011

Another example of GATHER_TABLE_STATS and a Histogram

In response to a forum thread "Why not selecting index path .", I ran up this example :
SQL> select banner from v$version where rownum=1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

SQL>
SQL> drop table test_index_count5 purge;

Table dropped.

SQL> alter system flush shared_pool;

System altered.

SQL> alter system flush shared_pool;

System altered.

SQL> create table test_index_count5 (a number, b varchar2(15));

Table created.

SQL>
SQL> insert into test_index_count5
2 select 2, 'A String'
3 from dual
4 connect by level <= 1000000; 1000000 rows created. SQL>
SQL> insert into test_index_count5
2 select 3, 'Another String'
3 from dual
4 connect by level <= 2; 2 rows created. SQL>
SQL> create index ind_a_tic5 on TEST_INDEX_COUNT5(a);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,tabname=>'TEST_INDEX_COUNT5',estimate_percent=>100);

PL/SQL procedure successfully completed.

SQL>
SQL> set serverout off
SQL> select /*+ gather_plan_statistics */ * from test_index_count5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID 2qu5fmh95yc4c, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ * from test_index_count5 where a=3

Plan hash value: 1766596593

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.11 | 2393 |
|* 1 | TABLE ACCESS FULL| TEST_INDEX_COUNT5 | 1 | 500K| 2 |00:00:00.11 | 2393 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("A"=3)


18 rows selected.

SQL> set serverout on
SQL>
SQL> -- execute the same query in different forms a few times
SQL> select * from test_index_count5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> select * FROM test_index_count5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> -- execute dummy ddl to force a hard parse for subsequent queries
SQL> comment on table TEST_INDEX_COUNT5 is 'TEST_INDEX_COUNT5';

Comment created.

SQL> SELECT * from test_index_count5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> select * from TEST_INDEX_COUNT5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> select * from test_index_count5 WHERE a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL>
SQL> -- check col_usage
SQL> select * from sys.col_usage$
2 where obj# = (select object_id from user_objects where object_name = 'TEST_INDEX_COUNT5');

no rows selected

SQL> -- Remember ! Col_usage is updated only on flush database monitoring
SQL> -- normally this is done periodically by the background
SQL> -- but we'll forcefully invoke it
SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL> select * from test_index_count5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> select * from sys.col_usage$
2 where obj# = (select object_id from user_objects where object_name = 'TEST_INDEX_COUNT5');

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ----------
TIMESTAMP
---------
85617 1 1 0 0 0 0 0
30-SEP-11


SQL>
SQL> -- re gather_stats
SQL> exec dbms_stats.gather_table_stats(user,tabname=>'TEST_INDEX_COUNT5',estimate_percent=>100);

PL/SQL procedure successfully completed.

SQL>
SQL> -- now execute the query
SQL> set serveroutput off
SQL> select /*+ gather_plan_statistics */ * from test_index_count5 where a=3;

A B
---------- ---------------
3 Another String
3 Another String

SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID 2qu5fmh95yc4c, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ * from test_index_count5 where a=3

Plan hash value: 4117659660

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | 6 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST_INDEX_COUNT5 | 1 | 2 | 2 |00:00:00.01 | 6 |
|* 2 | INDEX RANGE SCAN | IND_A_TIC5 | 1 | 2 | 2 |00:00:00.01 | 4 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A"=3)


19 rows selected.

SQL> set serveroutput on
SQL>
SQL> select * from user_tab_histograms where table_name = 'TEST_INDEX_COUNT5' and column_name = 'A';

TABLE_NAME
------------------------------
COLUMN_NAME
--------------------------------------------------------------------------------------------------------------
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
ENDPOINT_ACTUAL_VALUE
--------------------------------------------------------------------------------------------------------------
TEST_INDEX_COUNT5
A
1000000 2


TEST_INDEX_COUNT5
A
1000002 3



SQL>


The test cases in that forum thread were differently using 100,000 rows and 1million rows. As Tom Kyte observed in his response when the same test case was posted to him (see the link in the forums thread) : In fact, you probably don't even need the method_opt, just the running of the queries should be enough (but might not be in this truly massively skewed example - we might not sample 2 at all - it might not get picked up) .

Whenever I ran the 1million row test case, Oracle didn't create the right histogram until I explicitly added ESTIMATE_PERCENT 100.

Remember that when you have very high data skew, sampling may not be able to identify the skew. The greater the skew, the larger should your sampling be to identify the skew.
.
.
.

1 comment:

Javin @ unix sort command said...
This comment has been removed by a blog administrator.