10 September, 2010

Deadlocks

UPDATE : See Deadlock on INSERT in the second post.

A deadlock occurs when two processes contend for the same resources (row locks) when one process holds a lock on one set of resources (call it set "A") and waits for a lock on another set of resources (call it set "B") while *at the same time* the other process holds a lock on set "B" and is waiting to lock set "A".
The set of resources could each be a single or or multiple rows but Oracle may detect the deadlock when attempting to lock a row in a set of rows.
When it detects the deadlock, it sends an error to be returned to the SQL and this causes a statement level rollback and not a transaction rollback. Also, the session isn't killed.
(This was contrary to my earlier understanding that the session, on the error, is killed !).

Here's a demo :
This is session_1 :

23:40:36 SQL> select 'My Session : ' || s.sid, s.serial#, p.pid, p.spid
23:40:36 2 from v$session s, v$process p
23:40:36 3 where s.paddr=p.addr
23:40:36 4 and s.sid = (select distinct sid from v$mystat);

'MYSESSION:'||S.SID SERIAL# PID SPID
----------------------------------------------------- ---------- ---------- ------------
My Session : 143 58 15 4170

23:40:36 SQL>
23:40:36 SQL> pause Press ENTER to proceed .....
Press ENTER to proceed .....

23:40:39 SQL>
23:40:39 SQL> select count(*) from table_1;

COUNT(*)
----------
4

23:40:39 SQL> select col_1, col_2 from table_1 where col_1='Z';

C COL_2
- ----------
Z 26

23:40:39 SQL>
23:40:39 SQL> -- my first set of row/rows that are locked
23:40:39 SQL> update table_1 set col_1 = 'A' where col_1='Z';

1 row updated.

23:40:39 SQL> pause Press ENTER to proceed .....
Press ENTER to proceed .....

23:40:44 SQL>
23:40:44 SQL> -- subsequent operations
23:40:44 SQL> select count(*) from my_obj;

COUNT(*)
----------
50694

23:40:44 SQL>
23:40:44 SQL> select count(*) from table_1;

COUNT(*)
----------
4

23:40:44 SQL> select col_1, col_2 from table_1 where col_1='X';

C COL_2
- ----------
X 24

23:40:44 SQL>
23:40:44 SQL> -- my first set of row/rows that are locked
23:40:44 SQL> update table_1 set col_1 = 'A' where col_1='X';
update table_1 set col_1 = 'A' where col_1='X'
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource


23:40:50 SQL>
23:40:50 SQL>
23:41:13 SQL> commit;

Commit complete.


And this is session_2 :

23:40:40 SQL>
23:40:40 SQL> select 'My Session : ' || s.sid, s.serial#, p.pid, p.spid
23:40:40 2 from v$session s, v$process p
23:40:40 3 where s.paddr=p.addr
23:40:40 4 and s.sid = (select distinct sid from v$mystat);

'MYSESSION:'||S.SID SERIAL# PID SPID
----------------------------------------------------- ---------- ---------- ------------
My Session : 145 58 20 4172

23:40:40 SQL>
23:40:40 SQL> pause Press ENTER to proceed .....
Press ENTER to proceed .....

23:40:42 SQL>
23:40:42 SQL> select count(*) from table_1;

COUNT(*)
----------
4

23:40:42 SQL> select col_1, col_2 from table_1 where col_1='X';

C COL_2
- ----------
X 24

23:40:42 SQL>
23:40:42 SQL> -- my first set of row/rows that are locked
23:40:42 SQL> update table_1 set col_1 = 'B' where col_1='X';

1 row updated.

23:40:42 SQL> pause Press ENTER to proceed .....
Press ENTER to proceed .....

23:40:47 SQL>
23:40:47 SQL> -- subsequent operations
23:40:47 SQL> select count(*) from
23:40:47 2 ( select /*+ NO_MERGE */ * from my_obj_4 union all select * from my_obj_2 );

COUNT(*)
----------
101391

23:40:47 SQL>
23:40:47 SQL> select count(*) from table_1;

COUNT(*)
----------
4

23:40:47 SQL> select col_1, col_2 from table_1 where col_1='X';

no rows selected

23:40:47 SQL>
23:40:47 SQL> -- my second set of row/rows that are locked
23:40:47 SQL> update table_1 set col_1 = 'B' where col_1='Z';

0 rows updated.

23:41:27 SQL>
23:41:27 SQL> commit;

Commit complete.



The data at the beginning was :

C COL_2
- ----------
A 1
B 2
X 24
Z 26


while that at the end was :

C COL_2
- ----------
A 1
B 2
B 24
A 26


Thus the first UPDATE by each session did go through ! (session_2's second UPDATE found 0 rows because session_1 had modified col_1='Z' to col_1='A').

NOTE : The queries on "my_obj", "my_obj_4", "my_obj_2" are to simulate other SQL operations done by each session betwen the time of locking one set of resources (a row in table_1) and requesting another set of resources (another row in table_1). These "operations" may be other queries run by the sessions OR they may be other DML run by the sessions OR they may just be the sessions waiting for the user or application server to issue the next command (in which case, the "performance" of the user or application before it issues the next command may need to be reviewed).
Thus, a deadlock may occur because of performance issues in operations not even related to the row locks. Under normal circumstances, it may so happen, that the two sessions don't attempt these resources at nearly the same time *OR* even that the "other operations" are fast enough such that the two sessions don't end up contending for the resources -- e.g. if session_1's operations on both sets of resources (rows in table_1) complete even before session_2 has taken a lock on it's first set of resources.


The deadlock trace for the first session (SID 143) shows how the two sessions (SID 143 and 145, PIDs 15 and 20 respectively) are contending on the same object (OBJN=57698 being "TABLE_1")) but each holding one row and waiting for another row (both being in the same block 865780 as this is a small table). {That the rows are in the same block is *not* the cause of the deadlock as Oracle does row-level locking using ITL entries.


Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00130000-000001b8 15 143 X 20 145 X
TX-00110006-0000013c 20 145 X 15 143 X
session 143: DID 0001-000F-00000029 session 145: DID 0001-0014-00000019
session 145: DID 0001-0014-00000019 session 143: DID 0001-000F-00000029
Rows waited on:
Session 145: obj - rowid = 0000E162 - AAAOFmAAEAADTX0AAD
(dictionary objn - 57698, file - 4, block - 865780, slot - 3)
Session 143: obj - rowid = 0000E162 - AAAOFmAAEAADTX0AAC
(dictionary objn - 57698, file - 4, block - 865780, slot - 2)
Information on the OTHER waiting sessions:
Session 145:
pid=20 serial=58 audsid=584830 user: 64/HEMANT
O/S info: user: ora10204, term: pts/2, ospid: 4167, machine: linux64
program: sqlplus@linux64 (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update table_1 set col_1 = 'B' where col_1='Z'
End of information on OTHER waiting sessions.

.
.
.

No comments: