巡检发现某生产库 top 1等待事件为enq: cr - block range reuse ckpt,经分析定位为undo steal导致,这里主要测试重现oracle steal undo extent及enq: cr - block range reuse ckpt等待事件产生的过程。
1、创建表空间
sql> create undo tablespace undo2 datafile '/oracle/app/oracle/oradata/nocdb19c/undo2.dbf' size 20m;
tablespace created.
sql> alter system set undo_tablespace=undo2;
system altered.
sql> select retention from dba_tablespaces where tablespace_name='undo2';
retention
-----------
noguarantee
sql> @p undo_retention
name value
---------------------------------------- ----------------------------------------
undo_retention 900
2、登录数据库创建测试表,查找对应server spid,gdb设置断点
create table dbmt.tobj as select * from dba_objects;
在oracle undo steal相关函数上设置断点
(gdb) break ktusm_stealext
breakpoint 2 at 0x1c48910
(gdb) c
3、更新dbmt.tobj表,产生undo占用
72634 rows updated.
sql> @trans
sid serial# username taddr ses_addr used_ublk used_urec 0xflag status start_date xidusn xidslot xidsqn xid prv_xid ptx_xid
---------- ---------- --------------------------- ---------------- ---------- ---------- --------- ----------------------------- ------------------ ---------- ---------- ---------- ---------------- ---------------- ----------------
1 49117 sys 0000000073733698 000000007856c6f0 1026 76885 e03 active 30-nov-21 11 15 5 0b000f0005000000 0000000000000000 0000000000000000
commit;sql> @df
tablespace_name totalmb usedmb freemb % used ext used
------------------------------ ---------- ---------- ---------- ------ --- ----------------------
ilmtsdemo_lowcost 50 9 41 18% yes |#### |
ilm_source 12 1 11 9% yes |## |
sysaux 700 661 39 95% yes |################### |
system 980 971 9 100% yes |####################|
temp 111 111 0 100% yes |####################|
undo2 20 12 8 60% no |############ | <<<<
undotbs1 645 300 345 47% yes |########## |
users 43 40 3 94% yes |################### |
colsegment_namefora30select b.us#,segment_name,round(bytes/1024/1024,2)mb,b.status$ from dba_segments a,undo$ b where tablespace_name='undo2' and a.segment_name=b.name;sql> sql>
us# segment_name mb status$
---------- ------------------------------ ---------- ----------
11 _syssmu11_218934645$ 9 3 <<<<
12 _syssmu12_1212566812$ .13 3
13 _syssmu13_1508681309$ .13 3
14 _syssmu14_1297497020$ .13 3
15 _syssmu15_3396771108$ .13 3
16 _syssmu16_1123823253$ .13 3
17 _syssmu17_1247462769$ .13 3
18 _syssmu18_74029735$ .13 3
19 _syssmu19_3285312421$ .13 3
20 _syssmu20_2211160347$ .13 3
10 rows selected.
4、再次更新dbmt.tobj表,一次少更新几行,确定事务未分配到刚刚使用的undo segment上
98 rows updated.
sql>
sql>@trans
sid serial# username taddr ses_addr used_ublk used_urec 0xflag status start_date xidusn xidslot xidsqn xid prv_xid ptx_xid
---------- ---------- ------------ ---------------- ---------------- ---------- ---------- --------- ----------------------------- ------------------ ---------- ---------- ---------- ---------------- ---------------- ----------------
1 49117 sys 0000000073733698 000000007856c6f0 1 6 e03 active 30-nov-21 19 16 5 1300100005000000 0000000000000000 0000000000000000
xidusn 19
colsegment_namefora30
select b.us#,segment_name,round(bytes/1024/1024,2)mb,b.status$ from dba_segments a,undo$ b where tablespace_name='undo2' and a.segment_name=b.name;
sql>
us# segment_name mb status$
---------- ------------------------------ ---------- ----------
11 _syssmu11_218934645$ 9 3
12 _syssmu12_1212566812$ .13 3
13 _syssmu13_1508681309$ .13 3
14 _syssmu14_1297497020$ .13 3
15 _syssmu15_3396771108$ .13 3
16 _syssmu16_1123823253$ .13 3
17 _syssmu17_1247462769$ .13 3
18 _syssmu18_74029735$ .13 3
19 _syssmu19_3285312421$ .13 3 <<<<
20 _syssmu20_2211160347$ .13 3
10 rows selected.
5、gdb 继续调试,观察undo segment会产生steal,但未观察到活动会话中的等待事务
sql>updatedbmt.tobjsetobject_name=object_namewhereobject_id<100000;
sql> @df
tablespace_name totalmb usedmb freemb % used ext used
------------------------------ ---------- ---------- ---------- ------ --- ----------------------
ilmtsdemo_lowcost 50 9 41 18% yes |#### |
ilm_source 12 1 11 9% yes |## |
sysaux 700 661 39 95% yes |################### |
system 980 971 9 100% yes |####################|
temp 111 111 0 100% yes |####################|
undo2 20 20 0 100% no |####################|
undotbs1 645 300 345 47% yes |########## |
users 43 40 3 94% yes |################### |
8 rows selected.
us# segment_name mb status$
---------- ------------------------------ ---------- ----------
11 _syssmu11_218934645$ 9 3
12 _syssmu12_1212566812$ .13 3
13 _syssmu13_1508681309$ .13 3
14 _syssmu14_1297497020$ .13 3
15 _syssmu15_3396771108$ .13 3
16 _syssmu16_1123823253$ .13 3
17 _syssmu17_1247462769$ .13 3
18 _syssmu18_74029735$ .13 3
19 _syssmu19_3285312421$ 9 3
20 _syssmu20_2211160347$ .13 3
10 rows selected.
username sid event machine module status last_et sql_dt sql_id plobj wai_secinw row_obj sqltext bs osuser
----------- ------ -------------------- ---------- -------------------- -------- ------- ---------- --------------- -------- ---------- -------- ------------------------------ ---------- ----------
sys 1 on cpu / runqueue rac1 sql*plus active 128 128 7gn43xth7dhkz 3855:166 -1 update dbmt.tobj set object_n : oracle
breakpoint 2, 0x0000000001c48910 in ktusm_stealext ()
(gdb) bt
#0 0x0000000001c48910 in ktusm_stealext ()
#1 0x0000000001c461cf in ktusmasp1r ()
#2 0x0000000001c4fc27 in ktrsexecexecuteinexcephdlr ()
#3 0x0000000001c4ece1 in ktrsexec ()
#4 0x0000000001bf38db in ktuaspresumableinpdb ()
#5 0x0000000001c45505 in ktusmasp ()
#6 0x00000000126e5bbf in ktuchg2 ()
#7 0x00000000127177e7 in ktbchg2 ()
#8 0x00000000039d4ad1 in kdu_array_flush_retry ()
#9 0x00000000039d407c in kdu_array_flush1 ()
#10 0x0000000012b60510 in qerupupdrow ()
#11 0x0000000012b5f023 in qeruproprowsets ()
#12 0x0000000003b3897f in kdstf110110100001000km ()
#13 0x0000000012ac19f7 in kdsttgr ()
#14 0x0000000012b43054 in qertbfetch ()
#15 0x0000000012b5f378 in qerupfetch ()
#16 0x0000000012a053c8 in updaul ()
#17 0x0000000012a041a5 in updthreephaseexe ()
#18 0x0000000012a0176b in updexe ()
#19 0x00000000129267fd in opiexe ()
#20 0x0000000012991a52 in kpoal8 ()
#21 0x000000001291fe22 in opiodr ()
#22 0x0000000012c849e7 in ttcpip ()
#23 0x000000000293e9c9 in opitsk ()
#24 0x00000000029432c8 in opiino ()
#25 0x000000001291fe22 in opiodr ()
#26 0x000000000293a746 in opidrv ()
#27 0x0000000003507375 in sou2o ()
#28 0x0000000000dbe406 in opimai_real ()
#29 0x0000000003513421 in ssthrdmain ()
#30 0x0000000000dbe230 in main ()
(gdb)
6、gdb 继续调试,观察undo segment会产生steal
(gdb) n
single stepping until exit from function ktusm_stealext,
which has no line number information.
0x0000000001c461cf in ktusmasp1r ()
(gdb) n
single stepping until exit from function ktusmasp1r,
which has no line number information.
breakpoint 1, 0x0000000001e77a90 in kcbrbr_int ()
(gdb) n
single stepping until exit from function kcbrbr_int,
which has no line number information.
0x0000000001e77a6f in kcbrbr ()
(gdb) n
single stepping until exit from function kcbrbr,
which has no line number information.
0x000000000202f51d in kteopdelete1 ()
(gdb) n
single stepping until exit from function kteopdelete1,
which has no line number information.
0x000000000208e469 in ktsxr_delete ()
(gdb) n
single stepping until exit from function ktsxr_delete,
which has no line number information.
0x0000000001c146b6 in ktusp_delextent ()
(gdb) n
single stepping until exit from function ktusp_delextent,
which has no line number information.
0x0000000001c12daa in ktushrinkundoseg ()
(gdb) n
single stepping until exit from function ktushrinkundoseg,
which has no line number information.
0x0000000001c48ec4 in ktusmrsn ()
(gdb) n
single stepping until exit from function ktusmrsn,
which has no line number information.
0x0000000001c470c0 in ktusmasp1r ()
(gdb) bt
#0 0x0000000001c470c0 in ktusmasp1r ()
#1 0x0000000001c4fc27 in ktrsexecexecuteinexcephdlr ()
#2 0x0000000001c4ece1 in ktrsexec ()
#3 0x0000000001bf38db in ktuaspresumableinpdb ()
#4 0x0000000001c45505 in ktusmasp ()
#5 0x00000000126e5bbf in ktuchg2 ()
#6 0x00000000127177e7 in ktbchg2 ()
#7 0x00000000039d4ad1 in kdu_array_flush_retry ()
#8 0x00000000039d407c in kdu_array_flush1 ()
#9 0x0000000012b60510 in qerupupdrow ()
#10 0x0000000012b5f023 in qeruproprowsets ()
#11 0x0000000003b3897f in kdstf110110100001000km ()
#12 0x0000000012ac19f7 in kdsttgr ()
#13 0x0000000012b43054 in qertbfetch ()
#14 0x0000000012b5f378 in qerupfetch ()
#15 0x0000000012a053c8 in updaul ()
#16 0x0000000012a041a5 in updthreephaseexe ()
#17 0x0000000012a0176b in updexe ()
#18 0x00000000129267fd in opiexe ()
#19 0x0000000012991a52 in kpoal8 ()
#20 0x000000001291fe22 in opiodr ()
#21 0x0000000012c849e7 in ttcpip ()
#22 0x000000000293e9c9 in opitsk ()
#23 0x00000000029432c8 in opiino ()
#24 0x000000001291fe22 in opiodr ()
#25 0x000000000293a746 in opidrv ()
#26 0x0000000003507375 in sou2o ()
#27 0x0000000000dbe406 in opimai_real ()
#28 0x0000000003513421 in ssthrdmain ()
#29 0x0000000000dbe230 in main ()
us# segment_name mb status$
---------- ------------------------------ ---------- ----------
11 _syssmu11_218934645$ 8.88 3 <<<<被偷窃,从9m缩小到8.88
12 _syssmu12_1212566812$ .13 3
13 _syssmu13_1508681309$ .13 3
14 _syssmu14_1297497020$ .13 3
15 _syssmu15_3396771108$ .13 3
16 _syssmu16_1123823253$ .13 3
17 _syssmu17_1247462769$ .13 3
18 _syssmu18_74029735$ .13 3
19 _syssmu19_3285312421$ 9.13 3<<<<活动事务使用,增大
20 _syssmu20_2211160347$ .13 3
10 rows selected.
前面通过gdb加断点测试,捕捉到了ktusm_stealext函数调用,但观察不到等待enq: cr - block range reuse ckpt,猜想这个等待是毫秒级的,如果加断点的位置并不是加wait event之后,解除之前应该是看不到的,通过10026再观察。
先尝试steal,如果不成功,才能使用free extent
通过trace 观察到了enq: cr - block range reuse ckpt
nocdb19c_ora_2279.trc
行 558: wait #139934742178240: nam='enq: cr - block range reuse ckpt' ela= 1446 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126482933
行 581: wait #139934742178240: nam='enq: cr - block range reuse ckpt' ela= 597 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126486780
行 604: wait #139934742178240: nam='enq: cr - block range reuse ckpt' ela= 1652 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126491195
并且在等待enq: cr - block range reuse ckpt之前的等待事件为reliable message
wait #139934742178240: nam='reliable message' ela= 301 channel context=1989491688 channel handle=1989402640 broadcast message=1991183576 obj#=-1 tim=1126481460
wait #139934742178240: nam='enq: cr - block range reuse ckpt' ela= 1446 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126482933
等待是发生成update执行阶段,并非解析sql阶段
parsing in cursor #139934742178240 len=65 dep=0 uid=0 oct=6 lid=0 tim=1126068945 hv=149629389 ad='69cba1c0' sqlid='cucft6n4fqafd'
update dbmt.tobj2 set object_name=object_name where object_id>100
等待在以下递归sql之后,看起来就是递归undo相关操作。
parsing in cursor #139934736487328 len=85 dep=1 uid=0 oct=3 lid=0 tim=1126297347 hv=3573688918 ad='69475af8' sqlid='86708bvah4akq'
select name from undo$ where file#=:1 and block#=:2 and ts#=:3 and status$ != 1