m6米乐安卓版下载-米乐app官网下载
暂无图片
6

性能调优:sql执行计划改变因为height balanced histogram 的popular value -m6米乐安卓版下载

原创 weizhao.zhang (anbob) 2023-06-20
892

最近有个客户咨询,他们一个系统有个sql在凌晨1点左右执行计划突然变差了,数据库为oracle 11.2.0.4 rac, 从awr看数据库该时段实例级几乎空闲,上线很久的业务,问题时间点无人为操作,sql特征是查询一个分区表,2个列上查询条件,并不包含分区键列, 其中有一个列使用了绑定变量,执行计划有原来使用绑定变量列的索引改为全表分区扫描,直到白天10点以后人为收集了统计信息恢复正常。 简单记录如下

awr 问题时间点负载


snap idsnap timesessionscursors/sessioninstances
begin snap:7073406-may-23 01:00:091105.62
end snap:7073506-may-23 01:30:121135.82
elapsed:
30.06 (mins)


db time:
14.19 (mins)

sql执行计划历史

从sqlhc中可以获取以下信息

historical sql statistics – delta (dba_hist_sqlstat)

performance metrics of execution plans of 4ruhsafdgfj0x.
this section includes data captured by awr. if this is a stand-by read-only database then the awr information below is from the primary database.

 

#snap
id
snaphotinst
id
plan hvvers
cnt
execsfetchloadsinvalparse
calls
buffer
gets
disk
reads
direct
writes
rows
proc
elapsed
time
(secs)
cpu
time
(secs)
io
time
(secs)
conc
time
(secs)
appl
time
(secs)
clus
time
(secs)
plsql
time
(secs)
java
time
(secs)
optimizer
mode
costopt env hvparsing
schema
name
moduleactionprofile
42704462023-04-30/01:00:302142956168414300446610650410330.1312.55420.1200.0000.0008.1510.0000.000all_rows138524642354766usrjdbc thin client
43704472023-04-30/01:30:032142956168414134140041346510232557430414807.68067.347482.3460.0000.000283.3100.0000.000all_rows138524642354766usrjdbc thin client
44704942023-05-01/01:00:122142956168414300446651922970312.5781.41010.1900.0000.0001.1210.0000.000all_rows138524642354766usrjdbc thin client
45704952023-05-01/01:30:19214295616841413414004134652526840650414135.00920.47595.3210.0000.00022.2110.0000.000all_rows138524642354766usrjdbc thin client
46705422023-05-02/01:00:401142956168414300447333861520335.5165.87125.8370.0000.0005.4430.0000.000all_rows138524642354766usrjdbc thin client
47705432023-05-02/01:30:441142956168413693700036946446575238410370876.644156.018658.9110.0000.000117.9550.0000.000all_rows138524642354766usrjdbc thin client
48705902023-05-03/01:00:042142956168411000190420729003.1970.2462.9040.0000.0000.0680.0000.000all_rows138524642354766usrjdbc thin client
49705912023-05-03/01:30:172142956168414164170041650040513169800417828.91863.491553.7420.0000.000234.3270.0000.000all_rows138524642354766usrjdbc thin client
50706382023-05-04/01:00:281142956168414300445561455480327.5185.16619.1820.0000.0004.4710.0000.000all_rows135832642354766usrjdbc thin client
51706392023-05-04/01:30:391142956168414134140041346895752937730414787.993149.052526.2410.0000.000167.8680.0000.000all_rows135832642354766usrjdbc thin client
52706862023-05-05/01:00:05214295616841430044193941871034.7131.2303.2960.0000.0000.2490.0000.000all_rows135832642354766usrjdbc thin client
53706872023-05-05/01:30:122142956168414134140041347366216584190414900.71965.645732.5010.0000.000125.7090.0000.000all_rows135832642354766usrjdbc thin client
54707342023-05-06/01:00:09114295616841100011089090009.6983.6490.0000.0000.0007.5810.0000.000all_rows135832642354766usrjdbc thin client
55707352023-05-06/01:30:12114295616841560055025204137106129.67415.249104.6110.0000.00015.5270.0000.000all_rows135832642354766usrjdbc thin client
56707352023-05-06/01:30:122163467799312100250115702053940011669.041110.8731417.6070.0000.000172.8500.0000.000all_rows740643642354766usrjdbc thin client
57707362023-05-06/02:00:192163467799314400488147023512366041804.871187.6161368.0720.0000.000303.6510.0000.000all_rows740643642354766usrjdbc thin client
58707372023-05-06/02:30:3221634677993155005135105935394886051812.430291.5561141.1910.0000.000464.7780.0000.000all_rows740643642354766usrjdbc thin client
59707382023-05-06/03:00:082163467799312200254969862193826021776.219121.2691503.1330.0000.000187.3830.0000.000all_rows740643642354766usrjdbc thin client

note:

从以上可以确认是在2023-05-06/01:30前从实例2开始执行的sql执行计划发生改变,逻辑读和响应时间翻了近10倍。平时sql的执行次数并不多,so,你懂得

查看两个执行计划

inst: 2   child: 0    plan hash value: 1429561684
                      --------------------------------------------------------------------------------------------------------------------------
                      | id  | operation                           | name              | e-rows |e-bytes| cost (%cpu)| e-time   | pstart| pstop |
                      --------------------------------------------------------------------------------------------------------------------------
                      |   0 | select statement                    |                   |        |       |  1058 (100)|          |       |       |
                      |   1 |  sort aggregate                     |                   |      1 |    27 |            |          |       |       |
                      |*  2 |   table access by global index rowid| tab_qwertyxxxxxx  |    112 |  3024 |  1058   (0)| 00:00:13 | rowid | rowid |
                      |*  3 |    index range scan                 | inx_extxxx3       |   3634 |       |    19   (0)| 00:00:01 |       |       |
                      --------------------------------------------------------------------------------------------------------------------------
                      query block name / object alias (identified by operation id):
                      -------------------------------------------------------------
                         1 - sel$f5bb74e1
                         2 - sel$f5bb74e1 / o@sel$2
                         3 - sel$f5bb74e1 / o@sel$2
                      outline data
                      -------------
                        /* 
                            begin_outline_data
                            ignore_optim_embedded_hints
                            optimizer_features_enable('11.2.0.4')
                            db_version('11.2.0.4')
                            opt_param('_b_tree_bitmap_plans' 'false')
                            opt_param('_bloom_filter_enabled' 'false')
                            opt_param('_optimizer_extended_cursor_sharing' 'none')
                            opt_param('_optimizer_extended_cursor_sharing_rel' 'none')
                            opt_param('_optimizer_adaptive_cursor_sharing' 'false')
                            opt_param('_optimizer_use_feedback' 'false')
                            all_rows
                            outline_leaf(@"sel$f5bb74e1")
                            merge(@"sel$2")
                            outline(@"sel$1")
                            outline(@"sel$2")
                            index_rs_asc(@"sel$f5bb74e1" "o"@"sel$2" ("tab_qwertyxxxxxx"."extend3"))
                            end_outline_data
                        */
                      peeked binds (identified by position):
                      --------------------------------------
                         1 - (varchar2(30), csid=852): '221125'
                    predicate information (identified by operation id):
                      ---------------------------------------------------
                         2 - filter(("o"."queryxxxx">='2023/03/01 00:00:00' and "o"."queryxxxx"<='2023/03/31 00:00:00'))
                         3 - access("o"."exxxx3"=:1)
plan hash value: 1634677993
----------------------------------------------------------------------------------------------------------
| id  | operation            | name              | rows  | bytes | cost (%cpu)| time     | pstart| pstop |
----------------------------------------------------------------------------------------------------------
|   0 | select statement     |                   |       |       |   740k(100)|          |       |       |
|   1 |  sort aggregate      |                   |     1 |    27 |            |          |       |       |
|   2 |   partition range all|                   | 86226 |  2273k|   740k  (1)| 02:28:08 |     1 |     6 |
|   3 |    table access full | tab_qwertyxxxxxx  | 86226 |  2273k|   740k  (1)| 02:28:08 |     1 |     6 |
----------------------------------------------------------------------------------------------------------
query block name / object alias (identified by operation id):
-------------------------------------------------------------
   1 - sel$f5bb74e1
   3 - sel$f5bb74e1 / o@sel$2
outline data
-------------
  /* 
      begin_outline_data
      ignore_optim_embedded_hints
      optimizer_features_enable('11.2.0.4')
      db_version('11.2.0.4')
      opt_param('_b_tree_bitmap_plans' 'false')
      opt_param('_bloom_filter_enabled' 'false')
      opt_param('_optimizer_extended_cursor_sharing' 'none')
      opt_param('_optimizer_extended_cursor_sharing_rel' 'none')
      opt_param('_optimizer_adaptive_cursor_sharing' 'false')
      opt_param('_optimizer_use_feedback' 'false')
      all_rows
      outline_leaf(@"sel$f5bb74e1")
      merge(@"sel$2")
      outline(@"sel$1")
      outline(@"sel$2")
      full(@"sel$f5bb74e1" "o"@"sel$2")
      end_outline_data
  */
peeked binds (identified by position):
--------------------------------------
   1 - :1 (varchar2(30), csid=852): '001358'

note:
从上面我们可以看到确实有两个执行计划,同时当前是禁用了sql acs,但bind peek功能启用,记录下两个变量值及对应的两个执行计划。

通常首先需要排除问题时间是否表有ddl,或统计信息改变,因为无人为操作且从last_ddl_time可以排除,下面看统计信息历史。

统计信息历史版本

tables statistics versions

#table nameownerversion typesave timelast analyzednum rowssample
size
percblocksavg
row
len
1tab_qwertyxxxxxxanbobcurrent
06-may-2023 10:38:565887062858870628100.02734209319
2tab_qwertyxxxxxxanbobhistory06-may-23 10.38.57.053620 am 08:0003-may-2023 19:33:025875872458758724100.02725002319
3tab_qwertyxxxxxxanbobhistory03-may-23 07.33.02.744252 pm 08:0029-apr-2023 19:33:365858049758580497100.02716818319
4tab_qwertyxxxxxxanbobhistory29-apr-23 07.33.37.151765 pm 08:0026-apr-2023 19:37:515849319058493190100.02716818319
5tab_qwertyxxxxxxanbobhistory26-apr-23 07.37.51.237423 pm 08:0022-apr-2023 19:33:595831683158316831100.02708634319
6tab_qwertyxxxxxxanbobhistory22-apr-23 07.33.59.315324 pm 08:0019-apr-2023 19:35:505815214658152146100.02700450319
7tab_qwertyxxxxxxanbobhistory19-apr-23 07.35.50.262460 pm 08:0015-apr-2023 19:31:175796451457964514100.02692266319
8tab_qwertyxxxxxxanbobhistory15-apr-23 07.31.17.426153 pm 08:0012-apr-2023 19:34:275781691557816915100.02684082319
9tab_qwertyxxxxxxanbobhistory12-apr-23 07.34.27.186380 pm 08:0008-apr-2023 19:30:395760313757603137100.02675898319
10tab_qwertyxxxxxxanbobhistory08-apr-23 07.30.39.822771 pm 08:0005-apr-2023 19:30:155751438157514381100.02667714319
11tab_qwertyxxxxxxanbobhistory05-apr-23 07.30.15.853808 pm 08:0001-apr-2023 19:30:335691127056911270100.02643162319

 

indexes statistics versions: tab_qwertyuiopas (usr)

#index nameownerversion typesave timelast analyzednum
rows
sample
size
percdistinct
keys
blevelleaf
blocks
avg
leaf
blocks
per key
avg
data
blocks
per key
clustering
factor
67inx_extxxx3anbobcurrent
2023-05-06/10:39:23601722852681490.499332620982631731817197018
68inx_extxxx3anbobhistory06-may-23 10.39.23.293618 am 08:002023-05-03/19:33:30590060852629520.499232603022621736317224844
69inx_extxxx3anbobhistory03-may-23 07.33.30.522839 pm 08:002023-04-29/19:34:04554475712470940.499232421262441670316570273
70inx_extxxx3anbobhistory29-apr-23 07.34.04.044406 pm 08:002023-04-26/19:38:18551580972458040.499132383112401640316256115
71inx_extxxx3anbobhistory26-apr-23 07.38.18.026010 pm 08:002023-04-22/19:34:25579606122582930.499032524492541723217060135
72inx_extxxx3anbobhistory22-apr-23 07.34.25.739328 pm 08:002023-04-19/19:36:18558826802490330.498832407802431667816478718
73inx_extxxx3anbobhistory19-apr-23 07.36.18.682023 pm 08:002023-04-15/19:31:44599537212671750.498732620982651805417819949
74inx_extxxx3anbobhistory15-apr-23 07.31.44.415922 pm 08:002023-04-12/19:34:53557897792486190.498632423512451665816424862
75inx_extxxx3anbobhistory12-apr-23 07.34.53.972449 pm 08:002023-04-08/19:31:05540010972406480.498632380872411626116033736
76inx_extxxx3anbobhistory08-apr-23 07.31.05.861133 pm 08:002023-04-05/19:30:43568855182535020.498632434732461713616896324
77inx_extxxx3anbobhistory05-apr-23 07.30.43.739025 pm 08:002023-04-01/19:30:58563227262509940.498532436972471676816516641


 

note:

注意表和索引该时间段都没有自动收集。  因为上面有绑定变量窥探到的变量值,那我们把全表扫时的变量带进去,确认真的是收集统计信息修正了吗?

10053 trace

当我们带入001358变量值后,还可以复现全表扫描,那到这里可能有人想到了问题点。下面直接附上10053 trace file部分内容。

sql> connect / as sysdba
sql> oradebug setmypid
sql> oradebug unlimit
sql> oradebug event 10053 trace name context forever, level 1
sql> explain plan for  ...enter your query here...
sql> oradebug event 10053 trace name context off
sql> oradebug tracefile_name
-- trace file
*******************************************
peeked values of the binds in sql statement
*******************************************
final query after transformations:******* unparsed query is *******
select count(*) "total_count" from "usr"."tab_qwertyxxxxxx" "o" where "o"."queryxxx">='2023/03/01 00:00:00' 
and "o"."queryxxx"<='2023/03/31 00:00:00' and "o"."exxxx3"='001358' and '2023/03/31 00:00:00'>='2023/03/01 00:00:00'
kkoqbc: optimizing query block sel$f5bb74e1 (#0)
***************************************
base statistical information
***********************
table stats::
  table: tab_qwertyxxxxxxalias:  o  (using composite stats)
    #rows: 58992346  #blks:  2734209  avgrowlen:  319.00  chaincnt:  0.00
index stats::
 ...
  index: inx_extxxx3  col#: 40
    lvls: 3  #lb: 251775  #dk: 994  lb/k: 253.00  db/k: 17075.00  cluf: 16973517.00
  index: inx_queryxixx  col#: 18
    lvls: 3  #lb: 483492  #dk: 31264768  lb/k: 1.00  db/k: 1.00  cluf: 47778440.00
...
  index: pk_iqueryhistory  col#: 1
    lvls: 3  #lb: 407799  #dk: 56939871  lb/k: 1.00  db/k: 1.00  cluf: 51116553.00
access path analysis for tab_qwertyuiopas
***************************************
single table access path 
  single table cardinality estimation for tab_qwertyuiopas[o] 
  column (#40): 
    newdensity:0.000057, olddensity:0.002967 bktcnt:254, popbktcnt:240, popvalcnt:24, ndv:994
  column (#40): exxxx3(
    avglen: 7 ndv: 994 nulls: 585162 density: 0.000057
    histogram: htbal  #bkts: 254  uncompbkts: 254  endptvals: 38
  column (#18): 
    newdensity:0.000000, olddensity:0.000000 bktcnt:254, popbktcnt:0, popvalcnt:0, ndv:31264768
  column (#18): queryxxxx(
    avglen: 20 ndv: 31264768 nulls: 0 density: 0.000000
    histogram: htbal  #bkts: 254  uncompbkts: 254  endptvals: 255
  table: tab_qwertxxxxas  alias: o
    card: original: 58992346.000000  rounded: 79668  computed: 79667.56  non adjusted: 79667.56
  access path: tablescan
    cost:  743146.94  resp: 743146.94  degree: 0
      cost_io: 740517.00  cost_cpu: 77421299057
      resp_io: 740517.00  resp_cpu: 77421299057
  access path: index (alleqrange)
    index: inx_extxxx3
    resc_io: 748296.00  resc_cpu: 8323150688
    ix_sel: 0.043441  ix_sel_with_filters: 0.043441 
    cost: 748578.73  resp: 748578.73  degree: 1
  access path: index (rangescan)
    index: inx_queryxixx
    resc_io: 1626415.00  resc_cpu: 13774888680
    ix_sel: 0.033700  ix_sel_with_filters: 0.033700 
    cost: 1626882.92  resp: 1626882.92  degree: 1
******** cost index join ********
index join: joining index inx_extxxx3
index join: joining index inx_queryxixx
ix ha join
  outer table:  tab_qwertyuiopas  alias: o
    resc: 252234.49  card 58992346.00  bytes: 17  deg: 1  resp: 252234.49
  inner table:    alias: 
    resc: 483986.54  card: 58992346.00  bytes: 30  deg: 1  resp: 483986.54
    using dmeth: 2  #groups: 1
    cost per ptn: 198690.65  #ptns: 1
    hash_area: 256 (max=262144) buildfrag: 208836  probefrag: 302451  ppasses: 1
  hash join: resc: 934911.67  resp: 934911.67  [multimatchcost=0.00]
******** index join cost ********
cost: 934911.67  
******** end index join costing ********
  best:: accesspath: tablescan
         cost: 743146.94  degree: 1  resp: 743146.94  card: 79667.56  bytes: 0
***************************************

note:
注意表有近6000万记录,谓词条件列exxxx3有994 distinct值,可见重复很高,列上有等高直方图,从uncompbkts: 254 endptvals: 38 判断压缩后相差这么大判断,在254个桶中有些桶endpoint_value重复值较多,存在严重的倾斜情况。cbo计算全表扫的cost小于使用该索引,因为10053缺失这类信息,从dba_histograms可以查看列的柱状图分布。

柱状图分布height balanced histogram

查询视图dba_histograms可以得到列的柱状图,在数据库不同的版本中视图列数可能不同。主要关注这几列


note:
可见传入的变量值’001358‘刚好是柱状图的endpoint value, 但对应前面有10个桶,属于popular value, 高频值,这就导致cbo产生了比较高的cost值。

为什么执行计划变了?

前面我们有提到这个环境库启用了peek bind并禁用acs自适应游标共享,可能是因为某些原因因为该sql执行频率并不高,sql被age out出shared pool, 在sql再次执行重新reload回shared pool里需要peek变量值,然后根据变量第一次的值生成执行计划,后面再执行相同sql时,发现shared pool中存在相同文本直接使用第一次的执行计划,因为当前的acs已禁用,当然如果acs开着又可能会引入一些parse相关的问题如latch, version high.., 所以使用变量的sql第一次传入的值就很关键,有一定小运气在里面,如果第一次传入的是低频值,那就用索引。 这个现象在大师jonathan lewis书中记录柱状图章节同样有提到。

jonathan lewis ‘s  《cost-based oracle fundamentals 》

bind variable peeking
of course, things got messier when 9i introduced bind variable peeking. whenever a statement is
optimized, oracle will (in almost all cases) check the actual values of any bind variables and
optimize the statement for those specific values.

parsing and optimizing
when an sql statement is first executed, it has to be checked for syntax, interpreted, and optimized. thereafter
if the same piece of text is fired at the database again, it may be recognized as previously used, in which
case the existing execution plan may be tracked down and reused.
however, even when a statement is still in memory, some of the information about the execution plan
may become invalid, or may get pushed out of memory by the standard memory management lru routines.
when this happens, the statement will have to be reoptimized. (you can detect this from the loads and
invalidations columns in v$sql, summarized in the reloads and invalidations columns of
v$librarycache. reloads occur when information is lost from memory; invalidations occur when some of
the dependent information changes.)
the reuse (or sharing) of sql is generally a good thing—but if the execution plan generated on the first
use of a statement caters to an unlucky set of values, then every subsequent execution of that statement will
follow the same unlucky execution plan until you can force the statement out of memory—perhaps by the
extreme method of flushing the shared pool.
bind variable peeking has the unfortunate side effect of making it very easy for one user to introduce an
execution plan that is bad news for every other user for an arbitrary amount of time, until that execution plan
happens to get flushed from the shared pool for some reason.

我们看一下sql是否有age out出shared pool。

sql first load

performance metrics of child cursors of 4ruhsafdgfj0x while still in memory.

#inst
id
child
num
plan hvexecsfetchloadsinvalparse
calls
buffer
gets
optimizer
mode
costopt env hvparsing
schema
name
modulefirst loadlast loadlast active
2101429561684262621269138all_rows1058642354766usrjdbc thin client2023-05-06/10:38:422023-05-06/10:40:122023-05-06/10:40:12
1201429561684124124123124105242all_rows1058642354766usrjdbc thin client2023-05-06/01:02:242023-05-06/10:39:552023-05-06/10:40:11
plan hvavg
elapsed
time
(secs)
avg
cpu
time
(secs)
avg
io
time
(secs)
total
execs
min
cost
max
cost
first snapshotlast snapshot
114295616842.1290.30211074111232023-04-10/01:00:162023-05-06/01:30:12
21634677993774.05967.0641447406432023-05-06/01:30:122023-05-06/10:30:15

awr

sga breakdown differencez

  • ordered by pool, name
  • n/a value for begin mb or end mb indicates the size of that pool/name was insignificant, or zero in that snapshot
poolnamebegin mbend mb% diff
javafree memory896.00896.000.00
largectwr dba buffer4.794.790.00
largepx msg pool15.6315.630.00
largefree memory99.2999.290.00
largekrcc extent chunk7.927.920.00
sharedfileopenblock102.40102.400.00
sharedkglh01,316.131,308.75-0.56
sharedkglhd200.19198.73-0.73
sharedsqla1,494.701,505.730.74
shareddb_block_hash_buckets356.00356.000.00
sharedfree memory1,660.421,662.170.11
sharedgcs resources1,090.611,090.610.00
sharedgcs shadows755.04755.040.00
sharedkglsim object batch127.54127.540.00
streamsfree memory895.94895.940.00

buffer_cache37,888.0037,888.000.00

fixed_sga2.162.160.00

log_buffer39.5439.540.00

 

library cache activity

  • “pct misses” should be very low
namespaceget requestspct misspin requestspct missreloadsinvali- dations
account_status1,3920.000
00
body1,2900.001,7440.0000
cluster205.00205.0000
dblink1,4370.000
00
edition4880.009460.0000
index1,1130.001,1130.0000
queue90.003990.0000
schema5390.000
00
sql area23,17950.64166,76121.17530
sql area build11,80999.700
00
sql area stats11,80499.6911,80499.6900
table/procedure26,1430.2083,2450.27560

note:
基本上可以判断当时有新的sql执行,把执行频率少的sql age out出内存,而刚好第一次传入的变量是一个高频值,选择了全表扫,后续所有的sql继续沿用这个执行计划,直到人工执行了统计信息收集,使shared pool的该sql 失效,再次解析时传入的值为低频的变量值,执行计划改为索引扫描。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【米乐app官网下载的版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

网站地图