一、客户环境描述
架构:2节点rac 版本: rehat 7.9 oracle rac 19.19
二、客户问题反馈
客户反馈,节点2 undo表空间使用率过高,且使用率呈上升趋势,但应用使用目前反馈正常。
三、处理过程
1. 查看表空间使用率
结论:undotbs1 使用率正常,undotbs2 使用率过高。
临时解决方法:
考虑到使用率已经超过90%,且剩余空间不是很大,先通过增加数据文件方式快速降低使用率,避免undo空间不足,导致更严重的问题。
alter tablespace undotbs2 add datafile ' data' size 30g;
2. 查看undo 使用具体情况
select a.tablespace_name,
round(c.active_undo,2) "active_undo(mb)",
round(a.unexpired_undo,2) "unexpired_undo(mb)",
b.total_undo "total_undo(mb)",
trunc(active_undo / total_undo * 100, 2) || '%' active_undo_pct,
trunc(unexpired_undo / total_undo * 100, 2) || '%' unexpired_undo_pct
from (select nvl(sum(bytes / 1024 / 1024), 0) unexpired_undo, tablespace_name
from dba_undo_extents
where status = 'active' or status='unexpired'
group by tablespace_name) a,
(select tablespace_name, sum(bytes / 1024 / 1024) total_undo
from dba_data_files
group by tablespace_name) b,
(select nvl(sum(bytes / 1024 / 1024), 0) active_undo, tablespace_name
from dba_undo_extents
where status = 'active'
group by tablespace_name) c
where a.tablespace_name = b.tablespace_name
and a.tablespace_name=c.tablespace_name( )
order by tablespace_name ;
结论:undotbs1 、undotbs2 active 部分都不高,说明undotbs2 当前使用率高,不是active undo部分造成的,但是可以看到undotbs2 unexpired 部分比undotbs1明显多很多,说明导致undotbs2使用率高的原因是undotbs2 存在较多未过期部分,undo不释放。
3. 查看v$undostat 视图
select a.inst_id,a.begin_time,a.end_time,a.activeblks,a.expiredblks,a.unexpiredblks,a.tuned_undoretention
from gv$undostat a where inst_id=1;
select a.inst_id,a.begin_time,a.end_time,a.activeblks,a.expiredblks,a.unexpiredblks,a.tuned_undoretention
from gv$undostat a where inst_id=2;
结论:undotbs1 、undotbs2 tuned_undoretention 都是10800,说明导致undotbs2 不释放并不是因为tuned_undoretention 变大导致的。但是可以看到,undotbs2 中的activeblks、expiredblks、unexpiredblks 部分都比undotbs1 大很多,说明undotbs2 真实使用的undo 确实就是比undotbs1 多,说明undotbs2中可能存在消耗undo的sql语句,但是节点1不存在。
4. 分析awr报告
– 节点1
–节点2
sql> select table_name,index_name from dba_indexes where index_name='i_spuv_flag_update';
table_name index_name
---------------------------------------- ---------
flex_smt_param_uploadvalue i_spuv_flag_update
结论:通过segments by db blocks changes 对比可以看到,变化量最多的块都是flex_smt_param_uploadvalue表,但是节点2变化量明显比节点1 大很多,且节点2 还存在flex_fox_json 表变化量大。
5. 查看归档日志切换频率
select thread#,substr(to_char(first_time, 'mm/dd/rr hh:mi:ss'),1,5) day,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'00',1,0)) h00,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'01',1,0)) h01,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'02',1,0)) h02,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'03',1,0)) h03,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'04',1,0)) h04,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'05',1,0)) h05,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'06',1,0)) h06,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'07',1,0)) h07,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'08',1,0)) h08,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'09',1,0)) h09,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'10',1,0)) h10,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'11',1,0)) h11,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'12',1,0)) h12,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'13',1,0)) h13,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'14',1,0)) h14,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'15',1,0)) h15,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'16',1,0)) h16,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'17',1,0)) h17,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'18',1,0)) h18,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'19',1,0)) h19,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'20',1,0)) h20,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'21',1,0)) h21,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'22',1,0)) h22 ,
sum(decode(substr(to_char(first_time, 'mm/dd/rr hh24:mi:ss'),10,2),'23',1,0)) h23,
count(*) total
from v$log_history a
where first_time>=to_char(sysdate-7)
group by thread#,substr(to_char(first_time, 'mm/dd/rr hh:mi:ss'),1,5)
order by thread#,substr(to_char(first_time, 'mm/dd/rr hh:mi:ss'),1,5) desc;
分析:可以看到节点1 日志切换频率相差变大,节点2从10/16号8点开始,切换频率比平时多出很多,初步判断节点2 undo使用率比节点1高和此相关。
6. 查找相关语句
delete from flex_smt_param_uploadvalue where update_time < sysdate - 30 and rownum<1000000;
delete from flex_fox_json where update_time < sysdate - 30 and rownum <1000000;
结论:通过awr报告,找到以上相关的2个delete语句,且是job执行的,根据和应用沟通这2个job 是刚启用的。
综上:导致节点2 undotbs2 使用率过高的原因就是2个delete 语句造成的,和应用沟通,先暂停该job。