18 October, 2007

Flush Buffer_Cache -- when Tracing doesn't show anything happening

I believe that the "ALTER SYSTEM FLUSH BUFFER_CACHE" command is very sparingly used. After all, who wants to flush out active (or "hot") blocks and have to incur physical reads all over again ? I guess the expectation is that this command is used in Benchmarking only.
However, recently, I was able to use this command to diagnose a seemingly hanging query. V$SESSION_WAIT wasn't reporting any new wait (it was reporting only the the previous wait - 'SQL*Net more data from client' - incurred in the session before the query began executing), a 10046 level 8 trace showed nothing after the Parse# command. That left us puzzled. Why wasn't the 10046 trace showing any activity ?
I then started querying V$SESSSTAT and found that "consistent gets" was slowly increasing. An explain plan showed Nested Loop joins. The two tables and indexes involved were very small.
Aah ! "very small tables", I thought. Most likely they are in the Buffer Cache.
I issued an ALTER SYSTEM FLUSH BUFFER_CACHE and all of sudden we saw a few dozen 'db file sequential read' entries in the 10046 trace file. After that short spurt, the trace file went to "sleep". V$SESSION_WAIT now showed that the last wait was 'db file sequential read'.

One case where the FLUSH BUFFER_CACHE was the right thing to do. No one grudged having to reload the SGA --- after we had identified the bad SQL and spent a few hours on it, we tuned it down so that a 2 hour batch job completed in 3 minutes.
We finally "tuned" the INSERT..AS SELECT.. by using "_complex_view_merging=FALSE", which we then changed to "NO_MERGE" Hints in the individual subselects inside the query.

What had held us up for a short while were :
a) 'SQL*Net more data from client' in V$SESSION_WAIT with a continously increasing SECONDS_IN_WAIT seeemed to indicate that there was a network problem -- this was a wrong interpretation as the STATE was actually "WAITED SHORT TIME" so the continously increasing SECONDS_IN_WAIT was only a "counter" since the last wait had really occurred [for another discussion on how we might misread V$SESSION_WAIT, see Tanel Poder's blog entry]
b) That 10046 trace file wasn't indicating any activity -- there were no entries after the PARSE call on that particular SQL.

13 October, 2007

Inserts waiting on Locks ? Inserts holding Locks ?

When could we have a situation where a user says that his transaction is "slow" and it is merely inserting rows into a table -- an insert which used to complete in less than a minute but now has been "hanging" for a long time ? (He runs an "INSERT INTO TABLE A SELECT * FROM ....") You'd think that if the INSERT uses the SELECT query to build the data it is most likely the SELECT that is slow. You think that you need to tune the SELECT. (and worse where the SELECT is based on a distributed query -- you try to tune joins across databases).

But stop and think. If this INSERT was always much faster ("till yesterday !" he says), and is "hanging" now, you should look at where the user's session is Waiting.
Here's a case study (which I developed and tested on my Home PC) where an INSERT might wait ("forever") because it is actually attempting to insert a Duplicate Value (when there is a Unique Index) and is waiting on a "lock". Inserts should not need to wait on Locks. Moreover, if the duplicate value exists in the table, the INSERT should immediately fail with an ORA-0001 error. Yet we have a case where the INSERT waits on a lock !

On the flip side, an Insert does not need to "hold a lock", you expect because it is not updating data that someone else might attempt to update, it is after all "creating new data that no one else knows yet and, therefore, no one else needs to reference yet !".
In the case study, you see how Session 43 seems to hold a Lock and Session 45 does actually wait on a Lock. I've also suggested how you can extend the case study by creating a session where the second INSERT is based on a SELECT and your (or your DBA) would attempt to tune the SELECT if he doesn't look at session waits.