SQL performance degrade due to IO issue not captured by ASH

Oracle 12.2.0.1 RAC

Question:

A specific SQL performance degraded significantly due to increased I/O latency.
This SQL contains WINDOW SORT row set in execution plan, which means it would be impacted by slow I/O.
Nevertheless, why all samples in v$active_session_history is on CPU?

select SQL_PLAN_LINE_ID, SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, event, count(*)
from ash
where sql_id = 'gtjtx119v1ft0'
and SQL_EXEC_ID = 17488974
group by SQL_PLAN_LINE_ID, SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, event
order by count(*);
                                
SQL_PLAN_LINE_ID SQL_PLAN_OPERATION     SQL_PLAN_OPTIONS       EVENT                     COUNT(*)
---------------- ---------------------- ---------------------- ----------------------- ----------
              16 WINDOW                 SORT PUSHED RANK       direct path read temp            1
               3 HASH JOIN              RIGHT OUTER                                             1
              17 HASH JOIN                                                                      9
              16 WINDOW                 SORT PUSHED RANK                                      109  <<< Almost ALL on CPU

We can create a large table and check out whether the wait event ‘direct path write temp’ and ‘direct path read temp’ would be captured.

— setup

create table caoff.test_temp(id number);

declare
v_iter number;
v_id number;
begin
  for v_iter in 1..1000000 LOOP
    for v_id in 1..1000 LOOP
     insert into caoff.test_temp values(v_id);
    end LOOP;
    commit;
  END LOOP;
end;
/

This is the 10046 trace:

select id, row_number() over(partition by id order by id )
from caoff.test_temp

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     5340     12.65      12.77        459      88655          8       80086
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     5343     12.66      12.78        459      88656          8       80086

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
     80086      80086      80086  WINDOW SORT (cr=88655 pr=459 pw=83064 time=12708719 us starts=1 cost=222989 size=543218182 card=41786014)
  58438000   58438000   58438000   TABLE ACCESS FULL TEST_TEMP (cr=88655 pr=0 pw=0 time=4518679 us starts=1 cost=24224 size=543218182 card=41786014)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message to client                    5341        0.00          0.00
  PGA memory operation                         7053        0.00          0.04
  acknowledge over PGA limit                      1        0.00          0.00
  direct path write temp                          1        0.00          0.00  <<< Did wait on TEMP IO
  direct path read temp                           9        0.00          0.00  <<< Did wait on TEMP IO
  SQL*Net message from client                  5341        6.90          7.35
  SQL*Net break/reset to client                   1        0.00          0.00

The direct path I/O wait events were NOT captured by ASH.

SQL> select event, count(*)
  2  from v$active_session_history
  3  where sql_id = 'cnu2r73sc7yyj'
  4  group by event
  5  order by count(*);

EVENT                                                              COUNT(*)
---------------------------------------------------------------- ----------
                                                                         49  <<< All event on CPU

Question:

Why all samples in v$active_session_history is on CPU?

Answer:

The SQL was impacted by slow I/O. But since the wait event only lasts for less than 10ms most of the time, it would be hard for ASH to capture this event.

Furthermore, in this case, ASH information tend be misleading because it omits the critical I/O issue and people may believe that the SQL was solely conducting logical read to burn CPU.

Another evidence that this issue was not because of high concurrency is that we did NOT see wait event ‘latch: cache buffer chains‘, which is often accompanied by high concurrency issue.

Question: Why is Elapsed time > CPU time + IO wait?

Answer:

DB CPU is Reported as Larger than DB TIME in AWR Report and Trace (Doc ID 1911984.1)
Unaccounted Time in 10046 File on AIX SMT4 Platform when Comparing Elapsed and CPU Time (Doc ID 1963791.1)

This is a known behavior with SMT-4 POWER processors. When SMT4 is enabled, you are taking advantage of SMT4 for better parallel throughput, but the behavior will be shown.

Ignore the difference between “elapsed time” & “cpu time + wait time”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this:
search previous next tag category expand menu location phone mail time cart zoom edit close