18 October, 2007

Flush Buffer_Cache -- when Tracing doesn't show anything happening

I believe that the "ALTER SYSTEM FLUSH BUFFER_CACHE" command is very sparingly used. After all, who wants to flush out active (or "hot") blocks and have to incur physical reads all over again ? I guess the expectation is that this command is used in Benchmarking only.
However, recently, I was able to use this command to diagnose a seemingly hanging query. V$SESSION_WAIT wasn't reporting any new wait (it was reporting only the the previous wait - 'SQL*Net more data from client' - incurred in the session before the query began executing), a 10046 level 8 trace showed nothing after the Parse# command. That left us puzzled. Why wasn't the 10046 trace showing any activity ?
I then started querying V$SESSSTAT and found that "consistent gets" was slowly increasing. An explain plan showed Nested Loop joins. The two tables and indexes involved were very small.
Aah ! "very small tables", I thought. Most likely they are in the Buffer Cache.
I issued an ALTER SYSTEM FLUSH BUFFER_CACHE and all of sudden we saw a few dozen 'db file sequential read' entries in the 10046 trace file. After that short spurt, the trace file went to "sleep". V$SESSION_WAIT now showed that the last wait was 'db file sequential read'.

One case where the FLUSH BUFFER_CACHE was the right thing to do. No one grudged having to reload the SGA --- after we had identified the bad SQL and spent a few hours on it, we tuned it down so that a 2 hour batch job completed in 3 minutes.
We finally "tuned" the INSERT..AS SELECT.. by using "_complex_view_merging=FALSE", which we then changed to "NO_MERGE" Hints in the individual subselects inside the query.

What had held us up for a short while were :
a) 'SQL*Net more data from client' in V$SESSION_WAIT with a continously increasing SECONDS_IN_WAIT seeemed to indicate that there was a network problem -- this was a wrong interpretation as the STATE was actually "WAITED SHORT TIME" so the continously increasing SECONDS_IN_WAIT was only a "counter" since the last wait had really occurred [for another discussion on how we might misread V$SESSION_WAIT, see Tanel Poder's blog entry]
b) That 10046 trace file wasn't indicating any activity -- there were no entries after the PARSE call on that particular SQL.

No comments: