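Contents
- Symptoms
- Analysis
- Conclusion
- Resolution
- Retrospective
Symptoms
After our company's product was installed, clients were suddenly unable to connect to the customer's database. The alert log contained a large number of ORA-27300 errors. The first occurrence of ORA-27300 is shown below: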
Thu Dec 31 15:49:14 2020
Archived Log entry 1248193 added for thread 1 sequence 1248219 ID 0x348d2cd1 dest 1:
Thu Dec 31 15:49:39 2020
Errors in file /odata/oracle/app/diag/rdbms/wind/WIND/trace/WIND_j000_125990.trc:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
Thu Dec 31 15:49:40 2020
The Red Hat 7 system log for the same period:
Dec 31 15:40:01 db1-gss systemd: Starting Session 63477 of user root.
Dec 31 15:49:20 db1-gss systemd: Reloading.
Dec 31 15:49:20 db1-gss systemd-sysv-generator[125498]: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Reloading.
Dec 31 15:49:20 db1-gss systemd-sysv-generator[125515]: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Started dbackup3 agent daemon.
Dec 31 15:49:20 db1-gss systemd: Starting dbackup3 agent daemon...
Dec 31 15:49:21 db1-gss systemd: Stopping dbackup3 agent daemon...
Dec 31 15:49:21 db1-gss systemd: Started dbackup3 agent daemon.
Further entries from the database alert log:
: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
Process J000 died, see its trace file
Thu Dec 31 16:36:55 2020
kkjcre1p: unable to spawn jobq slave process
Thu Dec 31 16:36:55 2020
Errors in file :
Thu Dec 31 16:36:56 2020
Errors in file /odata/oracle/app/diag/rdbms/wind/WIND/trace/WIND_j000_348517.trc:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
Process J000 died, see its trace file
Thu Dec 31 16:36:57 2020
Contents of the trace file:
Trace file /odata/oracle/app/diag/rdbms/wind/WIND/trace/WIND_j000_348517.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
ORACLE_HOME = /odata/oracle/app/oracle/product/12.1.0/dbhome_1
System name: Linux
Node name: db1-gss
Release: 3.10.0-693.el7.x86_64
Version: #1 SMP Thu Jul 6 19:56:57 EDT 2017
Machine: x86_64
Instance name: WIND
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 348517, image:
*** 2020-12-31 16:36:56.427
Unexpected error 27140 in job slave process
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
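The ORA-27303 line carries the two group IDs that matter for the rest of the investigation. As a minimal sketch, they can be pulled out of the log with sed. The function name parse_egid_mismatch is made up for illustration, and the pattern assumes the exact message layout shown in the trace above.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): extract the startup and current
# egid from an ORA-27303 line laid out as in the trace above.
# Prints "<startup_egid> <current_egid>", or nothing if the line does not match.
parse_egid_mismatch() {
    printf '%s\n' "$1" | sed -n \
        's/.*startup egid = \([0-9][0-9]*\).*current egid = \([0-9][0-9]*\).*/\1 \2/p'
}

# Sample line copied from the trace file.
line='ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)'
parse_egid_mismatch "$line"
```

If the two numbers differ, the process that logged the error is running under a different effective group than the one the instance was started with.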
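Analysis
On the face of it, the customer's database failed at exactly the moment our product was first installed, so the customer suspected our software. The database logs, however, show that the instance was last started on Thu Mar 05 22:27:32 2020, so it had been running for nine months without a restart.
According to the trace records, when the oracle processes were started nine months earlier the egid (effective group ID) was 1000 (dba), whereas it is now 1001 (oinstall).
Check the attributes of the oracle executable: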
[oracle@db1-gss trace]$ ls -l /odata/oracle/app/oracle/product/12.1.0/dbhome_1/bin/oracle
-rwsr-s--x. 1 oracle oinstall 323649840 Dec 27 2019 /odata/oracle/app/oracle/product/12.1.0/dbhome_1/bin/oracl
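The file's group is oinstall, and the setgid bit is set (the s in the group permissions), so any process newly started from this file runs with effective group oinstall.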
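The detail to look for in the ls output above is the s in the group-execute position of the mode string: it marks the setgid bit, which makes every process started from the file take the file's group as its effective group. A small sketch of that check follows. The function name mode_has_setgid is hypothetical, and it only inspects a mode string as printed by ls -l.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an "ls -l" mode string has the setgid bit.
# Character 7 of the string is the group-execute slot; 's' or 'S' there
# means setgid is set.
mode_has_setgid() {
    case "$1" in
        ??????[sS]*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Mode string copied from the ls output above.
if mode_has_setgid '-rwsr-s--x.'; then
    echo "setgid is set: new processes take the file's group as their egid"
fi
```

On a live system the mode string could be obtained with something like ls -ld "$ORACLE_HOME/bin/oracle" | awk '{print $1}', assuming ORACLE_HOME points at the installation in question.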
Next, check the oracle processes that are currently running. Their group ID is 1000:
[root@db1-gss ~]# ps -eo pid,stat,pri,uid,gid,cmd |grep oracle
56474 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
72953 S 19 0 0 su - oracle
97310 S+ 19 0 0 grep --color=auto oracle
198434 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
198452 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
311705 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
313106 Ssl 19 1000 1000 /odata/oracle/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
313475 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
327504 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
327637 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
331933 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
338102 Ss 19 1000 1000 oracleWIND (LOCAL=NO)
Check the oracle user. Its primary group is 1000 (dba):
# id oracle
uid=1000(oracle) gid=1000(dba) groups=1000(dba),1001(oinstall)
[root@db1-gss ~]#
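To see at a glance whether all processes of an instance share one effective group, the ps output can be reduced to the distinct group IDs. The sketch below is illustrative only: distinct_egids is a made-up name, and it expects lines of the form "egid command ..." on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read "egid command args..." lines on stdin and print
# the distinct egids of the lines whose command starts with the given prefix.
distinct_egids() {
    awk -v prefix="$1" '
        index($2, prefix) == 1 { seen[$1] = 1 }
        END { for (g in seen) print g }
    ' | sort -n
}

# Sample data modelled on the ps output above, plus one deliberately
# mismatched line (egid 1001) to show what a conflict would look like.
printf '%s\n' \
    '1000 oracleWIND (LOCAL=NO)' \
    '0 su - oracle' \
    '1001 oracleWIND (LOCAL=NO)' |
    distinct_egids oracleWIND
```

With procps on Linux the real input could come from ps -eo egid= -o args= piped into distinct_egids oracleWIND. More than one line of output means that processes of the same instance are running under different effective groups.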
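Conclusion
The oracle executable on disk belongs to group oinstall, while the oracle user's primary group is dba, which is also the group the instance was started under. When our software was installed, a new slave process had to be spawned (spawn jobq slave process). That process came up with effective group oinstall, which differs from the dba group recorded when the instance started, and this mismatch caused the conflict.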
Resolution
Shut down the database and change the oracle user's primary group from dba to oinstall:
[root@oracle18 orcl]# id oracle
uid=54321(oracle) gid=54322(dba) groups=54322(dba),54321(oinstall),54323(oper),54324(backupdba),54325(dgdba),54326(kmdba),54330(racdba)
[root@oracle18 orcl]# usermod -g oinstall oracle
[root@oracle18 orcl]# id oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54323(oper),54324(backupdba),54325(dgdba),54326(kmdba),54330(racdba)
[root@oracle18 orcl]#
As it turned out, the database could not be shut down:
SQL> shutdown immediate;
ERROR:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid =
1001 (oinstall)
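The shutdown failed. Only after the oracle user and the group of the oracle executable were set back to the "wrong" group ID (the one the running instance had been started with) could the database be shut down.
The group ID was then changed to the correct one and the database was started again, successfully!
Afterwards clients still could not connect. The listener process turned out to have the same problem, a wrong group ID, and restarting it resolved the issue.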
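The order of operations described above can be laid out as a dry run. The sketch below only prints commands and executes nothing. Apart from usermod -g, the exact invocations are assumptions rather than the commands actually typed during the incident: chgrp, chmod 6751 (which corresponds to the -rwsr-s--x mode shown earlier), sqlplus and lsnrctl are filled in for illustration, and print_regroup_plan is a made-up name.

```shell
#!/bin/sh
# Dry run only: prints a command plan, executes nothing.
# Assumption: every command except "usermod -g" is illustrative and was not
# quoted in the original write-up. Review before running anything for real.
print_regroup_plan() {
    startup_group=$1   # group the running instance was started with
    target_group=$2    # group the installation is supposed to use
    oracle_bin=$3      # path to the oracle executable

    # 1. Match the running instance again so that shutdown can attach to it.
    printf '%s\n' "root# usermod -g $startup_group oracle"
    printf '%s\n' "root# chgrp $startup_group $oracle_bin"
    # chgrp can clear the setuid/setgid bits on Linux, so restore the mode.
    printf '%s\n' "root# chmod 6751 $oracle_bin"
    printf '%s\n' 'oracle$ echo "shutdown immediate" | sqlplus -S / as sysdba'

    # 2. Switch to the intended group, then start the database.
    printf '%s\n' "root# usermod -g $target_group oracle"
    printf '%s\n' "root# chgrp $target_group $oracle_bin"
    printf '%s\n' "root# chmod 6751 $oracle_bin"
    printf '%s\n' 'oracle$ echo "startup" | sqlplus -S / as sysdba'

    # 3. Restart the listener so it also runs under the intended group.
    printf '%s\n' 'oracle$ lsnrctl stop'
    printf '%s\n' 'oracle$ lsnrctl start'
}

# The path is printed literally; substitute the real ORACLE_HOME when reading.
print_regroup_plan dba oinstall '$ORACLE_HOME/bin/oracle'
```

Printing the plan first makes it easy to review the order, and in particular to confirm that the shutdown happens while the groups still match the running instance.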
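Retrospective
Once again we took the blame for the customer. They had changed the group ID without realizing it, and the problem surfaced the first time our product caused a slave process to be started!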
