背景:存储节点发生存储,数据文件状态不支持,计算节点2产生trace过多导致u01使用率不断增高
基础环境:oracle x7一体机 数据库版本:12.2.0.1.180717
以下是分析和处理过程:
1、数据库日志
…
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833384 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl18_107783.trc:
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833398 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl11_267126.trc:
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833460 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl07_6783.trc:
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833478 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl17_163546.trc:
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833498 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl32_372326.trc:
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833511 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl00_372255.trc:
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ‘ datac1/skedcdb/datafile/undotbs2.982.1066835405’
2023-02-15t16:00:58.833515 08:00
errors in file /u01/app/oracle/diag/rdbms/skedcdb/skedcdb2/trace/skedcdb2_cl24_54355.trc:
ora-00376: file 7 cannot be read at this time
…
数据库日志忠报错为ora-00376,ora-0110,从日志看读取undo表空间的’ datac1/skedcdb/datafile/undotbs2.982.1066835405’数据文件没有成功,产生多个trace。
2、trace日志
*** 2023-02-15t15:25:46.303098 08:00 (cdb$root(1))
kssxdl signal 376: 0xbcd85410 = transaction (78) @ false_loc 0
----------------------------------------
so: 0xbcd85410, type: 78, owner: 0x79849860, flag: init/-/-/-/0x00 if: 0x3 c: 0x3
proc=0xe81373f0, name=transaction, file=ktccts.h line:451, pg=0, conuid=1
sga version=(1,0)
ora-00376: file 7 cannot be read at this time
ora-01110: data file 7: ' datac1/skedcdb/datafile/undotbs2.982.1066835405'
kssxdl success: so 0x3fb9b5a00 = session
pmon waiting for 1000 csecs
trace日志的内容和alert一致,读取数据文件失败,pmon进程有等待。
## 3、asm日志
2023-02-06t01:05:00.756765 08:00
exadata cell: o/..... is no longer accessible. i/o errors to disks on this might get suppressed
2023-02-06t01:05:00.868575 08:00
some dde async actions failed or were cancelled
note: failed resync of disk group 1 disks
datac1_cd_02_crmpceladm03 (4)
datac1_cd_00_crmpceladm03 (8)
datac1_cd_03_crmpceladm03 (10)
datac1_cd_04_crmpceladm03 (16)
datac1_cd_09_crmpceladm03 (22)
datac1_cd_10_crmpceladm03 (33)
datac1_cd_05_crmpceladm03 (39)
datac1_cd_11_crmpceladm03 (40)
ora-15032: not all alterations performed
ora-15080: synchronous i/o operation failed to write block 0 of disk 22 in disk group datac1
ora-15080: synchronous i/o operation failed to write block 0 of disk 4 in disk group datac1
ora-15080: synchronous i/o operation failed to write block 0 of disk 33 in disk group datac1
ora-15080: synchronous i/o operation failed to write block 0 of disk 39 in disk group datac1
2023-02-06t01:05:01.106577 08:00
error: /* exadata auto mgmt: online asm disk */
alter diskgroup datac1 online disk datac1_cd_04_crmpceladm03
, datac1_cd_02_crmpceladm03
, datac1_cd_05_crmpceladm03
, datac1_cd_03_crmpceladm03
, datac1_cd_00_crmpceladm03
, datac1_cd_10_crmpceladm03
, datac1_cd_11_crmpceladm03
, datac1_cd_09_crmpceladm03
nowait
warning: exadata auto management: os pid: 331356 operation id: 911602: online disk recoc1_cd_04_crmpceladm03 in diskgroup recoc1 failed
asm日志中有i/o错误,自动添加磁盘没有成功。 ora-15080 synchronous i/o operation failed with exadata error:'hard check failed' (doc id 2412871.1)
对于该错误按官方建议可以打补丁,也可以通过以下方式
workaround:-
关闭asm实例的磁盘检测
```sql
sql>alter diskgroup recoc1 set attribute 'hard_check.enabled' = 'false';
alter diskgroup recoc1 rebalance power 32;
完成rebalance后再开启磁盘检测
sql>alter diskgroup recoc1 set attribute 'hard_check.enabled' = 'true';
4、处理过程
(1)对数据库进行rman全备。
(2)检测数据文件状态
sql> select file#,name,status,enabled from v$datafile where status='recover';
file# name status enabled
------ -------------------------------- --------- ---------
3 datac1/skedcdb/datafile/sysaux.1002.1066835393 recover read write
7 datac1/skedcdb/datafile/undotbs2.982.1066835405 recover read write
10 datac1/skedcdb/pdb/datafile/sysaux.975.1066838165 recover read write
有3个数据文件状态为recover状态,其中cdb中2个sysaux和udnotbs2表空间各1个,pdb中sysaux表空间1个数据文件。
无法查询表空间使用率
sql> select a.tablespace_name,
round(a.s,2) "current_total(mb)",
round(a.s - f.s,2) "used(mb)",
f.s "free(mb)",
round((a.s-f.s)/ a.s * 100, 2) "used%",
g.autoextensible,
round(a.ms,2) "max_total(mb)",
round((a.s-f.s)/ a.ms * 100, 2) "max used%"
from (select d.tablespace_name,
sum(bytes / 1024 / 1024) s,
sum(decode(maxbytes, 0, bytes, maxbytes) / 1024 / 1024) ms
from dba_data_files d
group by d.tablespace_name) a,
(select f.tablespace_name, sum(f.bytes / 1024 / 1024) s
from dba_free_space f
group by f.tablespace_name) f,
(select distinct tablespace_name, autoextensible
from dba_data_files
where autoextensible = 'yes'
union
select distinct tablespace_name, autoextensible
from dba_data_files
where autoextensible = 'no'
and tablespace_name not in
(select distinct tablespace_name
from dba_data_files
where autoextensible = 'yes')) g
where a.tablespace_name = f.tablespace_name
and g.tablespace_name = f.tablespace_name order by "max used%";
from dba_data_files d
*
error at line 12:
ora-00376: file 3 cannot be read at this time
ora-01110: data file 3: ‘ datac1/skedcdb/datafile/sysaux.1002.1066835393’
(3)重建undo表空间
sql> create undo tablespace undotbs3 datafile ' datac1/skedcdb/datafile/undotbs3_01.dbf' size 5g autoextend on;
(4)修复数据文件
rman target /
recovery manager: release 12.2.0.1.0 - production on mon feb 20 22:22:50 2023
米乐app官网下载 copyright (c) 1982, 2017, oracle and/or its affiliates. all rights reserved.
connected to target database: planning (dbid=2741129465)
rman>
rman> recover datafile 3;
starting recover at 20-feb-23
using target database control file instead of recovery catalog
allocated channel: ora_disk_1
channel ora_disk_1: sid=2185 instance=skedcdb1 device type=disk
starting media recovery
archived log for thread 1 with sequence 14340 is already on disk as file datac1/skedcdb/archivelog/2023_02_20/thread_1_seq_14340.794.1129321103
...
通过rman修复3个数据文件,然后将数据文件onlin
在cdb中进行3号数据文件online
sql> alter database datafile 3 online;
sql> alter database datafile 10 online;
alter database datafile 10 online
*
error at line 1:
ora-01516: nonexistent log file, data file, or temporary file "10" in the
current container
pdb的数据文件需要在pdb中进行online
(5)检查数据文件状态
sql> select file#,name,status,enabled from v$datafile where status='recover';
file# name status enabled
------ -------------------------------- --------- ---------
3 datac1/skedcdb/datafile/sysaux.1002.1066835393 online read write
7 datac1/skedcdb/datafile/undotbs2.982.1066835405 online read write
10 datac1/skedcdb/pdb/datafile/sysaux.975.1066838165 online read write
数据文件状态正常
(6)删除undotbs2表空间
sql> drop tablespace undotbs2 including contents and datafiles;
drop tablespace undotbs2 including contents and datafiles
*
error at line 1:
ora-01548: active rollback segment '_syssmu11_759994052$' found, terminate
dropping tablespace
sql> select segment_name, status, tablespace_name
2 from dba_rollback_segs
3 where status not in ('online', 'offline');
segment_name status tablespace_name
------------------------------ ---------------- ------------------------------
_syssmu11_759994052$ needs recovery undotbs2
_syssmu12_3581412579$ needs recovery undotbs2
_syssmu13_398964664$ needs recovery undotbs2
_syssmu14_2753826883$ needs recovery undotbs2
_syssmu15_1616211766$ needs recovery undotbs2
_syssmu16_2927909258$ needs recovery undotbs2
_syssmu17_4161369910$ needs recovery undotbs2
_syssmu18_2116035987$ needs recovery undotbs2
_syssmu19_2637726138$ needs recovery undotbs2
_syssmu20_3386602257$ needs recovery undotbs2
10 rows selected.
参数文件加入以下内容
_corrupted_rollback_segments=(_syssmu11_759994052,_syssmu13_398964664,_syssmu15_1616211766,_syssmu17_4161369910,_syssmu19_2637726138)
使用pfile重新启动数据库
sql> startup pfile='/home/oracle/pfile_skedcdb_0220.ora';
ora-32006: sec_case_sensitive_logon initialization parameter has been deprecated
oracle instance started.
total system global area 2.6844e 10 bytes
fixed size 26502656 bytes
variable size 1.2176e 10 bytes
database buffers 1.4563e 10 bytes
redo buffers 77959168 bytes
database mounted.
database opened.
sql>
sql>
sql> show pdbs
con_id con_name open mode restricted
---------- ------------------------------ ---------- ----------
2 pdb$seed read only no
3 skedcdb mounted
删除undotbs2表空间
sql> drop tablespace undotbs2 including contents and datafiles;
tablespace dropped.
sql>
sql>
重启数据库
sql> shutdown immediate
database closed.
database dismounted.
oracle instance shut down.
sql>
sql>
sql>
sql>
sql>
sql>
sql> startup
ora-32004: obsolete or deprecated parameter(s) specified for rdbms instance
ora-29707: inconsistent value 0 for initialization parameter _lm_use_tx_tsn with other instances
undo参数2个节点不一致导致该报错
sql>
sql>
sql> create pfile='/tmp/pfile.ora' from spfile;
create pfile='/tmp/pfile.ora' from spfile
*
error at line 1:
ora-01565: error in identifying file '?=/dbs/spfile@.ora'
ora-27037: unable to obtain file status
linux-x86_64 error: 2: no such file or directory
additional information: 7
sql> startup pfile='/home/oracle/pfileskedcdb_0220.ora';
ora-32006: sec_case_sensitive_logon initialization parameter has been deprecated
oracle instance started.
total system global area 2.6844e 10 bytes
fixed size 26502656 bytes
variable size 1.2176e 10 bytes
database buffers 1.4563e 10 bytes
redo buffers 77959168 bytes
database mounted.
database opened.
sql>
【小结】数据文件状态修复后,数据库alert日志不在产生读取数据文件失败相关的trace,u01目录使用率正常;建议生产环境数据库日志和数据文件状态等关键指标应该加强监控和巡检,发现问题及时处理。
-the end-