26 February, 2015

Parallel Execution -- 1 The PARALLEL Hint and AutoDoP

The behaviour of the PARALLEL Hint has changed subtly but significantly in 11.2.  From the documentation :
Beginning with Oracle Database 11g Release 2 (11.2.0.1), the PARALLEL and NO_PARALLEL hints are statement-level hints and supersede the earlier object-level hints .......... If you omitinteger, then the database computes the degree of parallelism.

Further down, the documentation states :
This hint overrides the value of the PARALLEL_DEGREE_POLICY initialization parameter. 
and 
PARALLEL: The statement always is run parallel, and the database computes the degree of parallelism, which can be 2 or greater.

It is important to note that the Statement Level PARALLEL Hint *overrides* the PARALLEL_DEGREE_POLICY.
So, even if PARALLEL_DEGREE_POLICY is set to MANUAL, implying that automatic degree of parallelism is disabled, the PARALLEL Hint, itself, allows Oracle to auto-compute a DoP.

What further complicates understanding of the behaviour is a reading of Oracle Support Note 1269321.1 that implies that if I/O calibration statistics are missing, Oracle does not use the automatic degree of parallelism feature.
So, one would assume that if I/O Calibration is not done, with the "automatic degree of parallelism" feature not being used, the PARALLEL Hint would not compute any Auto DoP !

Let's run a simple test case :

HEMANT>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

HEMANT>
HEMANT>show parameter parallel;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     FALSE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     40
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_servers_target              integer     16
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0
HEMANT>                     
HEMANT>select degree from user_tables where table_name = 'LARGE_TABLE';

DEGREE
----------------------------------------
         4

HEMANT>select /*+ PARALLEL */ count(*) from LARGE_TABLE;

  COUNT(*)
----------
   4802944

HEMANT>select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  47m7qyrj6uzqn, child number 0
-------------------------------------
select /*+ PARALLEL */ count(*) from LARGE_TABLE

Plan hash value: 2085386270

-----------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             |       | 10488 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE        |             |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |             |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000    |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |             |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |             |  4802K| 10488   (1)| 00:02:06 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| LARGE_TABLE |  4802K| 10488   (1)| 00:02:06 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing


27 rows selected.

HEMANT>select px_servers_executions from v$sqlstats where sql_id='47m7qyrj6uzqn';

PX_SERVERS_EXECUTIONS
---------------------
                    2

HEMANT>
HEMANT>select * from dba_rsrc_io_calibrate;

no rows selected

HEMANT>

Apparently, I/O Calibration statistics are not present ("are missing"). And yet, Oracle chose to use 2 PX Servers (not 4, not 1) for the query.  Isn't this confusing ?  Is AutoDoP used or is it not used ?

Let's make a change somewhere (else ?)

HEMANT>show parameter cpu_count

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     1

HEMANT>

Let's try on a server with more CPUs.  I reconfigure the same VM to run with 4 "CPUs" (cores) and restart the VM and database instance.

[oracle@localhost ~]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.2.0 Production on Thu Feb 26 23:18:47 2015

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SYS>show parameter cpu

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     4
parallel_threads_per_cpu             integer     2
resource_manager_cpu_allocation      integer     4
SYS>show parameter parallel

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     FALSE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     135
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_servers_target              integer     64
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0
SYS>

We can now see that CPU_COUNT, PARALLEL_MAX_SERVERS and PARALLEL_SERVERS_TARGET have all gone up.

SYS>connect hemant/hemant
Connected.
HEMANT>set serveroutput off
HEMANT>select degree from user_tables where table_name = 'LARGE_TABLE';

DEGREE
----------------------------------------
         4

HEMANT>select /*+ PARALLEL */ count(*) from LARGE_TABLE; 

  COUNT(*)
----------
   4802944

HEMANT>select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  47m7qyrj6uzqn, child number 0
-------------------------------------
select /*+ PARALLEL */ count(*) from LARGE_TABLE

Plan hash value: 2085386270

-----------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             |       |  2622 (100)|          |        |      |      |
|   1 |  SORT AGGREGATE        |             |     1 |            |          |        |      |      |
|   2 |   PX COORDINATOR       |             |       |            |          |        |      |      |
|   3 |    PX SEND QC (RANDOM) | :TQ10000    |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |             |     1 |            |          |  Q1,00 | PCWP |      |
|   5 |      PX BLOCK ITERATOR |             |  4802K|  2622   (1)| 00:00:32 |  Q1,00 | PCWC |      |
|*  6 |       TABLE ACCESS FULL| LARGE_TABLE |  4802K|  2622   (1)| 00:00:32 |  Q1,00 | PCWP |      |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing


27 rows selected.

HEMANT>
HEMANT>select px_servers_executions from v$sqlstats where sql_id='47m7qyrj6uzqn';

PX_SERVERS_EXECUTIONS
---------------------
                    8

HEMANT>

Aaha !  The same SQL statement, the same SQL_ID, the same Execution Plan (Plan Hash Value) and Oracle  chose to use 8  (not 1, not 4, not 2) PX servers for the query !

So, there IS some way that AutoDoP is being used even when I/O calibration statistics are missing. Is this AutoDoP simply a function CPU_COUNT x PARALLEL_THREADS_PER_CPU ?

UPDATE :  Herlindo QV also confirmed to me on Twitter : "it seems that way and in RAC the formula looks like (CPU_COUNT x PARALLEL_THREADS_PER_CPU) x RAC nodes"  and, later, when I ask him to check with PARALLEL_FORCE_LOCAL=TRUE : "right, restricts parallel to just one instance. The tricky part is how to choose the best DOP for each unique scenario"
.
.
.

No comments: