How to Diagnose Oracle RAC Problems
The vast majority of RAC issues I've encountered have been caused by one or more of the following:
- Incorrect network configuration. Remember, the public IP addresses, VIPs and SCAN IPs must all be on the same public network. The private IPs must be on a different network to the public network. The public IPs and the private IPs must all be pingable prior to the installation, whereas the VIPs and SCAN IPs must not be in use until the clusterware brings them online.
- Incorrect shared disk configuration. The voting disk and OCR location, as well as all the database files, need to be on shared storage for RAC to function properly. Any problems with the shared disk configuration will cause RAC to fail.
- Missing prerequisites. There are a lot of prerequisites that must be completed before you can start a RAC installation. It may be tempting to miss steps out, but this will invariably cause problems. Make sure all prerequisites are met before starting the installation.
- Insufficient available resources. This is especially common with virtual RAC installations. The minimum requirements of 11gR2 RAC are quite significant. Without some clever tricks to free up memory, you will need at least 4 GB of RAM per node to complete a fairly basic installation. Trying to install RAC on under-specced hardware can lead to some rather unpredictable results.
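Two of the checks above are easy to script before you start. The sketch below assumes a Linux host with a POSIX shell and awk, and that you know the prefix length of each network (the /24 used in the usage example is an assumption, not a RAC requirement). The hypothetical `same_subnet` function reports whether two IPv4 addresses fall in the same network, which you can use to confirm that the public, VIP and SCAN addresses share one network while the private addresses do not. The hypothetical `has_min_mem_gb` function compares MemTotal from /proc/meminfo with a threshold in GB. Neither is an Oracle tool.

```shell
# Hypothetical pre-installation helpers (not part of any Oracle tool).

# same_subnet IP1 IP2 PREFIX
# Returns 0 if both IPv4 addresses are in the same network for the given
# prefix length (e.g. 24 for a 255.255.255.0 netmask), 1 otherwise.
same_subnet() {
  awk -v a="$1" -v b="$2" -v p="$3" '
    # Convert dotted-quad notation to a single integer.
    function to_int(ip,    o) {
      split(ip, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    BEGIN {
      size = 2 ^ (32 - p)   # number of addresses in one network of this prefix
      # Two addresses share a network when they fall in the same block of "size".
      exit (int(to_int(a) / size) == int(to_int(b) / size) ? 0 : 1)
    }'
}

# has_min_mem_gb MIN_GB [MEMINFO_FILE]
# Returns 0 if MemTotal in /proc/meminfo (or the given file) is at least
# MIN_GB gigabytes, 1 otherwise.
has_min_mem_gb() {
  awk -v min_gb="$1" '
    # MemTotal is reported in kB.
    $1 == "MemTotal:" { found = 1; ok = ($2 >= min_gb * 1024 * 1024) }
    END { exit (found && ok ? 0 : 1) }
  ' "${2:-/proc/meminfo}"
}
```

For example, `same_subnet 192.168.56.101 192.168.1.101 24 || echo "different networks"` or `has_min_mem_gb 4 || echo "less than 4 GB"`. Bear in mind that MemTotal is usually slightly lower than the installed RAM, so a VM given exactly 4 GB may fall just short of a strict 4 GB check.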
Assuming none of the above are the cause of your problem, you may find the utilities in this article useful for diagnosing your RAC issues.
crsctl
Amongst other things, the crsctl command allows you to check the health of the cluster. The following command displays the top-level view of the cluster.
# cd /u01/app/11.2.0/grid/bin
# ./crsctl check cluster -all
**************************************************************
ol6-112-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ol6-112-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
#
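On a cluster with more nodes it is easy to miss a single bad line. As a sketch, assuming the message format shown above (one CRS-nnnn message per line, with each node's block introduced by a line ending in a colon), the hypothetical function below filters the output down to the messages that do not say "is online".

```shell
# check_cluster_output: hypothetical filter for "crsctl check cluster -all".
# Reads the command's output on stdin, prints "<node>: <message>" for every
# CRS-nnnn message that does not report "is online", and returns 1 if it
# printed anything, 0 otherwise.
check_cluster_output() {
  awk '
    # Node header lines look like "ol6-112-rac1:".
    /^[^ \t]+:[ \t]*$/ { node = $1; next }
    # Any CRS message that is not an "is online" status is worth a look.
    /CRS-[0-9]+:/ && !/is online/ { print node, $0; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage: `./crsctl check cluster -all | check_cluster_output`. A healthy cluster produces no output and a zero exit status, which also makes the check easy to call from a monitoring script.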
The following command gives information about the individual resources.
# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE ol6-112-rac1
ONLINE ONLINE ol6-112-rac2
ora.LISTENER.lsnr
ONLINE ONLINE ol6-112-rac1
ONLINE ONLINE ol6-112-rac2
ora.asm
ONLINE ONLINE ol6-112-rac1 Started
ONLINE ONLINE ol6-112-rac2 Started
ora.gsd
OFFLINE OFFLINE ol6-112-rac1
OFFLINE OFFLINE ol6-112-rac2
ora.net1.network
ONLINE ONLINE ol6-112-rac1
ONLINE ONLINE ol6-112-rac2
ora.ons
ONLINE ONLINE ol6-112-rac1
ONLINE ONLINE ol6-112-rac2
ora.registry.acfs
ONLINE ONLINE ol6-112-rac1
ONLINE ONLINE ol6-112-rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE ol6-112-rac1
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE ol6-112-rac2
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE ol6-112-rac2
ora.cvu
1 ONLINE ONLINE ol6-112-rac2
ora.oc4j
1 ONLINE ONLINE ol6-112-rac2
ora.ol6-112-rac1.vip
1 ONLINE ONLINE ol6-112-rac1
ora.ol6-112-rac2.vip
1 ONLINE ONLINE ol6-112-rac2
ora.rac.db
1 ONLINE ONLINE ol6-112-rac1 Open
2 ONLINE ONLINE ol6-112-rac2 Open
ora.scan1.vip
1 ONLINE ONLINE ol6-112-rac1
ora.scan2.vip
1 ONLINE ONLINE ol6-112-rac2
ora.scan3.vip
1 ONLINE ONLINE ol6-112-rac2
#
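The thing to look for in this listing is any resource whose STATE differs from its TARGET. A resource whose target is OFFLINE, such as ora.gsd here, is not a problem. The hypothetical awk-based filter below is a sketch that assumes the `-t` layout shown above: resource names beginning with "ora." on their own line, local resources listed as "TARGET STATE SERVER" rows, and cluster resources with a leading instance number.

```shell
# report_resource_mismatches: hypothetical filter for "crsctl stat res -t".
# Reads the table on stdin and prints "<resource> <TARGET> <STATE> <SERVER>"
# for every row where STATE differs from TARGET. Returns 1 if any were found.
report_resource_mismatches() {
  awk '
    # Resource name lines, e.g. "ora.DATA.dg".
    /^ora\./ { name = $1; next }
    # Cluster resources: "<instance> <TARGET> <STATE> <SERVER> ..."
    $1 ~ /^[0-9]+$/ && NF >= 3 {
      if ($2 != $3) { print name, $2, $3, $4; bad = 1 }
      next
    }
    # Local resources: "<TARGET> <STATE> <SERVER> ..."
    ($1 == "ONLINE" || $1 == "OFFLINE") && NF >= 2 {
      if ($1 != $2) { print name, $1, $2, $3; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage: `./crsctl stat res -t | report_resource_mismatches`. Against the listing above it would print nothing, because every resource is in its target state.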
olsnodes
Run the olsnodes command on all cluster nodes and check that it returns a list of all the nodes in each case.
# cd /u01/app/11.2.0/grid/bin
# ./olsnodes
ol6-112-rac1
ol6-112-rac2
#
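Rather than logging on to each node in turn, you could compare the lists over ssh. This is a sketch that assumes the ssh user equivalency the installation already requires and the Grid home path used in the example above; both function names are hypothetical.

```shell
# same_node_set LIST1 LIST2
# Returns 0 if two newline-separated node lists contain the same names,
# ignoring order and duplicates; 1 otherwise.
same_node_set() {
  sns_a=$(printf '%s\n' "$1" | sort -u)
  sns_b=$(printf '%s\n' "$2" | sort -u)
  [ "$sns_a" = "$sns_b" ]
}

# check_olsnodes_all [GRID_BIN_DIR]
# Hypothetical wrapper: runs olsnodes locally, then on every node it lists
# (over ssh), and reports any node whose list differs from the local one.
check_olsnodes_all() {
  grid_bin=${1:-/u01/app/11.2.0/grid/bin}   # Grid home from the example above
  ref=$("$grid_bin/olsnodes") || return 1
  rc=0
  for node in $ref; do
    if ! remote=$(ssh "$node" "$grid_bin/olsnodes"); then
      echo "$node: could not run olsnodes over ssh"
      rc=1
    elif ! same_node_set "$ref" "$remote"; then
      echo "$node: reports a different node list"
      rc=1
    fi
  done
  return $rc
}
```

Running `check_olsnodes_all` on any one node should print nothing and return 0 when every node agrees on the cluster membership.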
cluvfy
You have probably run the runcluvfy.sh utility from the installation media before installing the clusterware software. Once the Oracle software is installed, the cluvfy utility is available to provide useful post-installation information. Use the "-help" flag for usage information.
$ cluvfy stage -help
USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options> [-verbose]
SYNTAX (for Stages):
cluvfy stage -pre cfs -n <node_list> -s <storageID_list> [-verbose]
cluvfy stage -pre
crsinst -file <config_file> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
crsinst -upgrade [-n <node_list>] [-rolling] -src_crshome <src_crshome> -dest_crshome <dest_crshome>
-dest_version <dest_version> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
crsinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}]
[-c <ocr_location_list>] [-q <voting_disk_list>]
[-osdba <osdba_group>] [-orainv <orainventory_group>]
[-asm [-asmgrp <asmadmin_group>] [-asmdev <asm_device_list>]] [-crshome <crs_home>]
[-fixup [-fixupdir <fixup_dir>]] [-networks <network_list>]
[-verbose]
cluvfy stage -pre acfscfg -n <node_list> [-asmdev <asm_device_list>] [-verbose]
cluvfy stage -pre
dbinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}] [-osdba <osdba_group>] [-d <oracle_home>]
[-fixup [-fixupdir <fixup_dir>]] [-verbose]
dbinst -upgrade -src_dbhome <src_dbhome> [-dbname <dbname-list>] -dest_dbhome <dest_dbhome> -dest_version <dest_version>
[-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre dbcfg -n <node_list> -d <oracle_home> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre hacfg [-osdba <osdba_group>] [-orainv <orainventory_group>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre nodeadd -n <node_list> [-vip <vip_list>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -post hwos -n <node_list> [-s <storageID_list>] [-verbose]
cluvfy stage -post cfs -n <node_list> -f <file_system> [-verbose]
cluvfy stage -post crsinst -n <node_list> [-verbose]
cluvfy stage -post acfscfg -n <node_list> [-verbose]
cluvfy stage -post hacfg [-verbose]
cluvfy stage -post nodeadd -n <node_list> [-verbose]
cluvfy stage -post nodedel -n <node_list> [-verbose]
$
Two examples are shown below.
$ cluvfy stage -post crsinst -n ol6-112-rac1,ol6-112-rac2
$ cluvfy stage -pre dbcfg -n ol6-112-rac1,ol6-112-rac2 -d /u01/app/oracle/product/11.2.0/db_1
In all cases, check through the output and correct any errors produced.
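Two small helpers can make this easier. The sketch below is hypothetical: `node_list_csv` builds the comma-separated node list that the -n option expects from one-name-per-line input such as olsnodes output, and `cluvfy_failures` picks out lines containing "failed" or "unsuccessful". The assumption that cluvfy reports problems using those words is a heuristic, so it is no substitute for reading the full output.

```shell
# node_list_csv: hypothetical helper that turns one-node-per-line input
# (such as olsnodes output) into the comma-separated form cluvfy's -n expects.
node_list_csv() {
  paste -s -d, -
}

# cluvfy_failures: hypothetical filter for cluvfy output on stdin.
# Prints lines containing "failed" or "unsuccessful" (any case) and returns 1
# if it found any, 0 otherwise.
cluvfy_failures() {
  awk '
    tolower($0) ~ /failed|unsuccessful/ { print; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

With the Grid bin directory on your PATH, the post-installation check could then be run as `cluvfy stage -post crsinst -n "$(olsnodes | node_list_csv)" -verbose | cluvfy_failures`.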
ORAchk
Oracle provide the ORAchk tool to audit the configuration of RAC, CRS, ASM, GI etc. It supports database versions from 10.2-12.1, making it a useful starting point for most analysis. Information about ORAchk is available from this MOS note.
Download the zip file and install it by simply unzipping.
$ mkdir orachk
$ unzip -d orachk orachk.zip
$ cd orachk
$ ./orachk
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.1.0.1/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node ol6-121-rac2 is configured for ssh user equivalency for oracle user
Searching for running databases . . . . .
. .
List of running databases registered in OCR
1. cdbrac
2. None of above
Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].1
. .
Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-------------------------------------------------------------------------------------------------------
Oracle Stack Status
-------------------------------------------------------------------------------------------------------
Host Name CRS Installed ASM HOME RDBMS Installed CRS UP ASM UP RDBMS UP DB Instance Name
-------------------------------------------------------------------------------------------------------
ol6-121-rac1 Yes N/A Yes Yes Yes Yes cdbrac1
ol6-121-rac2 Yes N/A Yes Yes Yes Yes cdbrac2
-------------------------------------------------------------------------------------------------------
.
.
[output truncated for this article]
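The "Oracle Stack Status" table is a quick sanity check before you read the detailed report. As a sketch, assuming the column layout in the sample output above (host name followed by six Yes/No/N/A columns), the hypothetical filter below prints any host with a "No" in one of those columns. The same table appears in the RACcheck output, so it should apply there too if the layout matches.

```shell
# stack_status_problems: hypothetical filter for the "Oracle Stack Status"
# table printed by ORAchk/RACcheck. Reads the output on stdin and prints the
# host name of every row with "No" in any of the six status columns.
# Returns 1 if any such host was found, 0 otherwise.
stack_status_problems() {
  awk '
    NF >= 7 {
      row = 1; bad_row = 0
      # A table row has only Yes/No/N/A in fields 2 to 7.
      for (i = 2; i <= 7; i++) {
        if ($i != "Yes" && $i != "No" && $i != "N/A") { row = 0; break }
        if ($i == "No") bad_row = 1
      }
      if (row && bad_row) { print $1; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example, save the console output with `./orachk | tee orachk.out` and then run `stack_status_problems < orachk.out`.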
RACcheck
Oracle provide the RACcheck tool (MOS [ID 1268927.1]) to audit the configuration of RAC, CRS, ASM, GI etc. It supports database versions from 10.2-11.2, making it a useful starting point for most analysis. The MOS note includes the download and setup details. If you are using 11.2.0.4 or later, RACcheck is installed by default.
$ unzip raccheck.zip
$ cd raccheck
$ chmod 755 raccheck
$ ./raccheck -a
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/11.2.0/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node ol6-112-rac2 is configured for ssh user equivalency for oracle user
Searching for running databases . . . . .
.
List of running databases registered in OCR
1. RAC
2. None
Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].
. .
Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-------------------------------------------------------------------------------------------------------
Oracle Stack Status
-------------------------------------------------------------------------------------------------------
Host Name CRS Installed ASM HOME RDBMS Installed CRS UP ASM UP RDBMS UP DB Instance Name
-------------------------------------------------------------------------------------------------------
ol6-112-rac1 Yes Yes Yes Yes Yes Yes RAC1
ol6-112-rac2 Yes Yes Yes Yes Yes Yes RAC2
-------------------------------------------------------------------------------------------------------
.
.
[output truncated for this article]