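A customer runs a two-node 11.2.0.1.0 RAC on AIX 6. During a routine health check, ASM on node 2 was found not running, and the engineer doing the check tried to start and recover it without success. When I took over, I found that the root cause of ASM failing to start was that the clusterware stack itself could not start.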
SQL> startup nomount
ORA-01078: Message 1078 not found; No message file for product=RDBMS, facility=ORA
ORA-29701: Message 29701 not found; No message file for product=RDBMS, facility=ORA
SQL> exit
-bash-4.1# ./crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
-bash-4.1# ./crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with error(s).
CRS-4000: Command Stop failed, or completed with errors.
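Analysis showed that the hosts file on node 2 was wrong: the private IP entry pointed at an address that did not exist. I corrected it to match node 1's hosts file and the IP addresses actually configured on both hosts, but the problem persisted. I then tried to rebuild HAS by rerunning roothas.pl, which eventually led to an error about a missing libskgxns.a file.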
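Because node 2's private-interconnect entry had drifted away from node 1's, a cross-node comparison of the hosts files would have surfaced the problem early. The Python sketch below is a hypothetical helper, not an Oracle tool: `parse_hosts` and `diff_hosts` are names made up for illustration, and it assumes you have already copied each node's /etc/hosts into a string. The hostnames and addresses in the example are invented, not taken from this system.

```python
from __future__ import annotations


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname and alias in hosts-file text to its IP address."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no name
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping[name] = ip  # a later line wins if a name repeats
    return mapping


def diff_hosts(a: str, b: str) -> dict[str, tuple[str | None, str | None]]:
    """Names whose address differs between two hosts files (None = absent)."""
    ma, mb = parse_hosts(a), parse_hosts(b)
    return {
        name: (ma.get(name), mb.get(name))
        for name in sorted(set(ma) | set(mb))
        if ma.get(name) != mb.get(name)
    }


if __name__ == "__main__":
    # Made-up example data: hostnames and addresses are NOT from this system.
    node1 = """
127.0.0.1      localhost
10.0.0.11      yunsuan1
10.0.0.12      yunsuan2
192.168.0.11   yunsuan1-priv
192.168.0.12   yunsuan2-priv
"""
    node2 = node1.replace("192.168.0.12", "192.168.0.99")
    print(diff_hosts(node1, node2))
    # {'yunsuan2-priv': ('192.168.0.12', '192.168.0.99')}
```

This only compares the two files with each other. It does not prove that an address is actually configured on a network interface; for that, check the result against `ifconfig -a` on each node.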
-bash-4.1# ./roothas.pl
2015-01-27 16:05:35: Checking for super user privileges
2015-01-27 16:05:35: User has super user privileges
2015-01-27 16:05:35: Parsing the host name
Using configuration parameter file: ./crsconfig_params
The oracle binary is currently linked with RAC enabled.
Please execute the following steps to relink oracle binary
and rerun the command with RAC disabled:
cd <crshome>
setenv ORACLE_HOME `pwd`
cd rdbms/lib
make -f ins_rdbms.mk rac_off ioracle
Running make -f ins_rdbms.mk rac_off ioracle as instructed above failed:
-bash-4.1# pwd
/opt/app/11.2.0/grid
-bash-4.1# export ORACLE_HOME=/opt/app/11.2.0/grid
-bash-4.1# cd rdbms/lib/
-bash-4.1# make -f ins_rdbms.mk rac_off ioracle
rm -f /opt/app/11.2.0/grid/lib/libskgxp11.so
cp /opt/app/11.2.0/grid/lib//libskgxpg.so /opt/app/11.2.0/grid/lib/libskgxp11.so
rm -f /opt/app/11.2.0/grid/lib/libskgxn2.a
cp /opt/app/11.2.0/grid/lib//libskgxnr.a /opt/app/11.2.0/grid/lib/libskgxn2.a
rm -f /opt/app/11.2.0/grid/lib/libskgxn2.a
cp /opt/app/11.2.0/grid/lib//libskgxns.a /opt/app/11.2.0/grid/lib/libskgxn2.a
cp: /opt/app/11.2.0/grid/lib//libskgxns.a: A file or directory in the path name does not exist.
make: 1254-004 The error code from the last command is 1.
The file libskgxns.a indeed does not exist under /opt/app/11.2.0/grid/lib:
-bash-4.1$ ls -l /opt/app/11.2.0/grid/lib/libskgxns.a
ls: 0653-341 The file /opt/app/11.2.0/grid/lib/libskgxns.a does not exist.
A search on MOS shows this is a known bug (Bug 9777859); see RAC Turned off and relink with missing libskgxns.a file (Doc ID 1290438.1). The fix is simply to copy the file of the same name from $GRID_HOME/rdbms/lib into $GRID_HOME/lib:
-bash-4.1$ cd /opt/app/11.2.0/grid/rdbms/lib
-bash-4.1$ cp libskgxns.a /opt/app/11.2.0/grid/lib
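The same workaround can be wrapped into a small pre-flight check before relinking. This Python sketch is illustrative only: the library names are read off the `rac_off` make output in this post, the helper names `missing_libs` and `restore_from_rdbms_lib` are hypothetical, and the MOS note only documents copying libskgxns.a, so treat restoring any other file from rdbms/lib as an assumption to verify first.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Files that `make -f ins_rdbms.mk rac_off` copies from inside
# $GRID_HOME/lib, as seen in the make output of this post.
RAC_OFF_SOURCES = ("libskgxpg.so", "libskgxnr.a", "libskgxns.a")


def missing_libs(grid_home: str | Path) -> list[str]:
    """Return the rac_off source libraries absent from $GRID_HOME/lib."""
    lib_dir = Path(grid_home) / "lib"
    return [name for name in RAC_OFF_SOURCES if not (lib_dir / name).is_file()]


def restore_from_rdbms_lib(grid_home: str | Path, names: list[str]) -> list[str]:
    """Copy each named file from $GRID_HOME/rdbms/lib to $GRID_HOME/lib.

    Names not present in rdbms/lib either are skipped; the copied names
    are returned so the caller can see what was fixed.
    """
    home = Path(grid_home)
    copied = []
    for name in names:
        src = home / "rdbms" / "lib" / name
        if src.is_file():
            # copy2 keeps permission bits and timestamps (not ownership).
            shutil.copy2(src, home / "lib" / name)
            copied.append(name)
    return copied


if __name__ == "__main__":
    grid_home = "/opt/app/11.2.0/grid"  # the Grid home used in this post
    gaps = missing_libs(grid_home)
    print("missing:", gaps)
    print("restored:", restore_from_rdbms_lib(grid_home, gaps))
```

Because `shutil.copy2` does not preserve file ownership, run the script as the same user you would use for the manual `cp`.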
After that, make -f ins_rdbms.mk rac_off ioracle completes successfully:
-bash-4.1# make -f ins_rdbms.mk rac_off ioracle
rm -f /opt/app/11.2.0/grid/lib/libskgxp11.so
cp /opt/app/11.2.0/grid/lib//libskgxpg.so /opt/app/11.2.0/grid/lib/libskgxp11.so
rm -f /opt/app/11.2.0/grid/lib/libskgxn2.a
cp /opt/app/11.2.0/grid/lib//libskgxnr.a /opt/app/11.2.0/grid/lib/libskgxn2.a
rm -f /opt/app/11.2.0/grid/lib/libskgxn2.a
cp /opt/app/11.2.0/grid/lib//libskgxns.a /opt/app/11.2.0/grid/lib/libskgxn2.a
/bin/ar -X64 d /opt/app/11.2.0/grid/rdbms/lib/libknlopt.a kcsm.o
/bin/ar -X64 cr /opt/app/11.2.0/grid/rdbms/lib/libknlopt.a /opt/app/11.2.0/grid/rdbms/lib/ksnkcs.o
Target "rac_off" is up to date.
chmod 755 /opt/app/11.2.0/grid/bin
- Linking Oracle
rm -f /opt/app/11.2.0/grid/rdbms/lib/oracle
... ...
Then rerun roothas.pl to rebuild HAS:
-bash-4.1# ./roothas.pl
2015-01-27 16:42:23: Checking for super user privileges
2015-01-27 16:42:23: User has super user privileges
2015-01-27 16:42:23: Parsing the host name
Using configuration parameter file: ./crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'dba'..
Operation successful.
CRS-4664: Node yunsuan2 successfully pinned.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
yunsuan2 2015/01/27 16:43:03 /opt/app/11.2.0/grid/cdata/yunsuan2/backup_20150127_164303.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
HAS was rebuilt successfully, and the clusterware stack can now start again:
-bash-4.1# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE
ora.diskmon    ora....on.type OFFLINE   OFFLINE
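When the stack state has to be checked repeatedly, or across several nodes, the `crs_stat -t` table can be parsed instead of eyeballed. The Python sketch below is a hypothetical helper (`parse_crs_stat` and `not_online` are invented names) that reads the five-column layout shown above. Note that `crs_stat -t` abbreviates long names and types with `....` (as in `ora....on.type`) and that crs_stat is deprecated in 11.2; `crsctl status resource -t` gives full names, but its layout differs, so this parser would need adapting for it.

```python
from __future__ import annotations


def parse_crs_stat(text: str) -> list[dict[str, str]]:
    """Turn `crs_stat -t` output into one dict per resource row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator line.
        if not fields or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        if len(fields) < 4:
            continue  # not a resource row this sketch understands
        name, rtype, target, state = fields[:4]
        # Host is blank when the resource is not running on any node.
        host = fields[4] if len(fields) > 4 else ""
        rows.append({"name": name, "type": rtype, "target": target,
                     "state": state, "host": host})
    return rows


def not_online(rows: list[dict[str, str]]) -> list[str]:
    """Names of resources whose current State is anything but ONLINE."""
    return [r["name"] for r in rows if r["state"] != "ONLINE"]


# The crs_stat -t output from this post.
SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE
ora.diskmon    ora....on.type OFFLINE   OFFLINE
"""

if __name__ == "__main__":
    print(not_online(parse_crs_stat(SAMPLE)))  # ['ora.cssd', 'ora.diskmon']
```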