How to Diagnose Oracle RAC Problems
The vast majority of RAC issues I've encountered have been caused by one or more of the following:
- Incorrect network configuration. Remember, the public IP addresses, VIPs and SCAN IPs must all be on the same public network. The private IPs must be on a different network to the public network. The public IPs and the private IPs must all be pingable prior to the installation.
- Incorrect shared disk configuration. The voting disk and OCR location, as well as all the database files, need to be on shared storage for RAC to function properly. Any problems with the shared disk configuration will cause RAC to fail.
- Missing prerequisites. There are a lot of prerequisites that must be completed before you can start a RAC installation. It may be tempting to miss steps out, but this will invariably cause problems. Make sure all prerequisites are met before starting the installation.
- Insufficient available resources. This is especially true for people doing virtual RAC installations. The minimum requirements of 11gR2 RAC are quite significant. Without some clever tricks to free up memory, you are going to need at least 4GB of RAM per node to complete a fairly basic installation. Trying to install RAC on under-specced hardware can lead to some rather unpredictable results.
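The network rule in the first item is easy to check mechanically before you start. The following is a minimal sketch, not an Oracle tool. The function names are hypothetical, and it assumes IPv4 addresses written without leading zeros and a single prefix length (for example /24) for all the checks. It confirms that every VIP and SCAN address shares the public address's subnet, and that the private address does not.

```shell
# Minimal sketch: check the RAC address layout before installation.
# Hypothetical helper names; IPv4 only, octets written without leading zeros.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  # Replace the dots with spaces and load the four octets into $1..$4.
  set -- $(printf '%s\n' "$1" | tr '.' ' ')
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network address (as an integer) of IP $1 with prefix length $2.
network_of() {
  echo $(( $(ip_to_int "$1") & ((0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF) ))
}

# Succeed if addresses $1 and $2 are on the same network for prefix length $3.
same_subnet() {
  [ "$(network_of "$1" "$3")" -eq "$(network_of "$2" "$3")" ]
}

# Usage: check_rac_addresses PREFIX PUBLIC_IP PRIVATE_IP [VIP_OR_SCAN_IP ...]
# Prints one ERROR line per problem and returns non-zero if any were found.
check_rac_addresses() {
  local prefix=$1 public=$2 private=$3 rc=0 addr
  shift 3
  # The private interconnect must NOT share the public network.
  if same_subnet "$public" "$private" "$prefix"; then
    echo "ERROR: private $private is on the public network of $public"
    rc=1
  fi
  # Every VIP and SCAN address MUST share the public network.
  for addr in "$@"; do
    if ! same_subnet "$public" "$addr" "$prefix"; then
      echo "ERROR: $addr is not on the public network of $public"
      rc=1
    fi
  done
  return $rc
}
```

For example, `check_rac_addresses 24 192.168.56.101 192.168.1.101 192.168.56.103 192.168.56.105` prints nothing and returns zero when the layout is correct (the addresses are illustrative only). It does not replace the ping tests, which still need to be run against the public and private addresses.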
Assuming none of the above are the cause of your problem, you may find the utilities in this article useful for diagnosing your RAC issues.
crsctl
Amongst other things, the crsctl command allows you to check the health of the cluster. The following command displays the top-level view of the cluster.
# cd /u01/app/11.2.0/grid/bin
# ./crsctl check cluster -all
**************************************************************
ol6-112-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ol6-112-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
#
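If you check the cluster regularly, or from a monitoring script, it can help to filter this output down to the messages that matter. The following function is a minimal sketch (the name is hypothetical) that assumes the layout shown above: a node name ending in a colon, followed by one CRS-nnnn message per line. It prints any message that does not end in "is online", tagged with its node.

```shell
# Minimal sketch: read `crsctl check cluster -all` output on stdin and print
# every CRS message that does not report "is online", prefixed by its node.
# Returns non-zero if anything was printed.
cluster_offline_services() {
  awk '
    # A line such as "ol6-112-rac1:" starts the section for that node.
    /^[^ *][^ ]*:$/ { node = substr($0, 1, length($0) - 1); next }
    # Any CRS message that does not end in "is online" is reported.
    /^CRS-[0-9]+:/ && !/is online[[:space:]]*$/ { print node ": " $0; bad = 1 }
    END { exit bad }
  '
}
```

Usage, from the Grid home bin directory: `./crsctl check cluster -all | cluster_offline_services || echo "Some services are not online"`.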
The following command gives information about the individual resources.
# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       ol6-112-rac1
               ONLINE  ONLINE       ol6-112-rac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       ol6-112-rac1
               ONLINE  ONLINE       ol6-112-rac2
ora.asm
               ONLINE  ONLINE       ol6-112-rac1             Started
               ONLINE  ONLINE       ol6-112-rac2             Started
ora.gsd
               OFFLINE OFFLINE      ol6-112-rac1
               OFFLINE OFFLINE      ol6-112-rac2
ora.net1.network
               ONLINE  ONLINE       ol6-112-rac1
               ONLINE  ONLINE       ol6-112-rac2
ora.ons
               ONLINE  ONLINE       ol6-112-rac1
               ONLINE  ONLINE       ol6-112-rac2
ora.registry.acfs
               ONLINE  ONLINE       ol6-112-rac1
               ONLINE  ONLINE       ol6-112-rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       ol6-112-rac1
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       ol6-112-rac2
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       ol6-112-rac2
ora.cvu
      1        ONLINE  ONLINE       ol6-112-rac2
ora.oc4j
      1        ONLINE  ONLINE       ol6-112-rac2
ora.ol6-112-rac1.vip
      1        ONLINE  ONLINE       ol6-112-rac1
ora.ol6-112-rac2.vip
      1        ONLINE  ONLINE       ol6-112-rac2
ora.rac.db
      1        ONLINE  ONLINE       ol6-112-rac1             Open
      2        ONLINE  ONLINE       ol6-112-rac2             Open
ora.scan1.vip
      1        ONLINE  ONLINE       ol6-112-rac1
ora.scan2.vip
      1        ONLINE  ONLINE       ol6-112-rac2
ora.scan3.vip
      1        ONLINE  ONLINE       ol6-112-rac2
#
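On a healthy cluster the TARGET and STATE columns match for every resource (ora.gsd is OFFLINE/OFFLINE here, which is consistent). The following is a minimal sketch for spotting mismatches, assuming the tabular layout shown above: resource names start in column one, detail lines are indented, and cluster resources carry an instance number before TARGET. The function name is hypothetical.

```shell
# Minimal sketch: read `crsctl stat res -t` output on stdin and print every
# resource instance whose STATE differs from its TARGET.
# Returns non-zero if any were found.
resources_not_at_target() {
  awk '
    # Skip separators, the column header and the section titles.
    /^-+$/ || /^NAME[[:space:]]/ || /^(Local|Cluster) Resources/ { next }
    # A line starting in column one is a resource name.
    /^[^[:space:]]/ { name = $1; next }
    NF >= 2 {
      # Cluster resources have an instance number before TARGET.
      i = ($1 ~ /^[0-9]+$/) ? 2 : 1
      target = $i; state = $(i + 1); server = $(i + 2)
      if (target != state) {
        printf "%s target=%s state=%s%s\n", name, target, state, (server != "" ? " on " server : "")
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: `./crsctl stat res -t | resources_not_at_target`. Resources that are deliberately offline (target OFFLINE) are not reported, since their target and state agree.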
olsnodes
Run the olsnodes command on all cluster nodes and see that it returns a list of all the nodes in each case.
# cd /u01/app/11.2.0/grid/bin
# ./olsnodes
ol6-112-rac1
ol6-112-rac2
#
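Running olsnodes by hand on every node gets tedious on larger clusters. The following is a minimal sketch, assuming the SSH user equivalency configured for the installation and the Grid home path used in this article (override GRID_HOME if yours differs). The function names are hypothetical.

```shell
# Minimal sketch: compare the node list reported by olsnodes on every node
# with the list reported locally. Assumes passwordless ssh between nodes.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}

# Succeed if two whitespace-separated node lists contain the same names.
compare_node_lists() {
  # $1 and $2 are deliberately unquoted so each name becomes its own line.
  [ "$(printf '%s\n' $1 | sort)" = "$(printf '%s\n' $2 | sort)" ]
}

# Run olsnodes on each node listed locally and report any disagreement.
check_olsnodes_everywhere() {
  local expected node remote rc=0
  expected=$("$GRID_HOME/bin/olsnodes")
  for node in $expected; do
    remote=$(ssh "$node" "$GRID_HOME/bin/olsnodes") || {
      echo "ERROR: cannot run olsnodes on $node"
      rc=1
      continue
    }
    if ! compare_node_lists "$expected" "$remote"; then
      echo "ERROR: $node reports a different node list: $remote"
      rc=1
    fi
  done
  return $rc
}
```

Calling `check_olsnodes_everywhere` from any one node prints nothing and returns zero when every node agrees on the membership.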
cluvfy
You have probably run the runcluvfy.sh utility from the installation media before installing the clusterware software. Once the Oracle software is installed, the cluvfy utility is available to provide useful post-installation information. Use the "-help" flag for usage information.
$ cluvfy stage -help

USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options> [-verbose]

SYNTAX (for Stages):
cluvfy stage -pre cfs -n <node_list> -s <storageID_list> [-verbose]
cluvfy stage -pre crsinst -file <config_file> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
                  crsinst -upgrade [-n <node_list>] [-rolling] -src_crshome <src_crshome> -dest_crshome <dest_crshome> -dest_version <dest_version> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
                  crsinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}] [-c <ocr_location_list>] [-q <voting_disk_list>] [-osdba <osdba_group>] [-orainv <orainventory_group>] [-asm [-asmgrp <asmadmin_group>] [-asmdev <asm_device_list>]] [-crshome <crs_home>] [-fixup [-fixupdir <fixup_dir>]] [-networks <network_list>] [-verbose]
cluvfy stage -pre acfscfg -n <node_list> [-asmdev <asm_device_list>] [-verbose]
cluvfy stage -pre dbinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}] [-osdba <osdba_group>] [-d <oracle_home>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
                  dbinst -upgrade -src_dbhome <src_dbhome> [-dbname <dbname-list>] -dest_dbhome <dest_dbhome> -dest_version <dest_version> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre dbcfg -n <node_list> -d <oracle_home> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre hacfg [-osdba <osdba_group>] [-orainv <orainventory_group>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre nodeadd -n <node_list> [-vip <vip_list>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -post hwos -n <node_list> [-s <storageID_list>] [-verbose]
cluvfy stage -post cfs -n <node_list> -f <file_system> [-verbose]
cluvfy stage -post crsinst -n <node_list> [-verbose]
cluvfy stage -post acfscfg -n <node_list> [-verbose]
cluvfy stage -post hacfg [-verbose]
cluvfy stage -post nodeadd -n <node_list> [-verbose]
cluvfy stage -post nodedel -n <node_list> [-verbose]
$
Two examples are shown below.
$ cluvfy stage -post crsinst -n ol6-112-rac1,ol6-112-rac2
$ cluvfy stage -pre dbcfg -n ol6-112-rac1,ol6-112-rac2 -d /u01/app/oracle/product/11.2.0/db_1
In all cases, check through the output and correct any errors produced.
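With -verbose the cluvfy output gets long, so it can help to keep a copy and pull out the suspicious lines. The following is a minimal sketch with hypothetical function names. Matching on the words "failed" and "unsuccessful" is an assumption based on typical cluvfy wording, so treat it as a first pass and still read the full log.

```shell
# Minimal sketch: scan cluvfy output for likely failures.
# The keyword list is an assumption based on typical cluvfy wording.

# Read cluvfy output on stdin and print lines that suggest a failed check.
# Returns 0 if any such line was found (grep semantics), 1 otherwise.
cluvfy_problems() {
  grep -iE 'failed|unsuccessful'
}

# Usage: run_cluvfy LOGFILE <cluvfy arguments...>
# Runs cluvfy (which must be on the PATH), saves all output to LOGFILE,
# then echoes any suspicious lines. Returns 1 if any were found.
run_cluvfy() {
  local log=$1
  shift
  cluvfy "$@" > "$log" 2>&1
  if cluvfy_problems < "$log"; then
    echo "Possible problems found; review $log"
    return 1
  fi
  return 0
}
```

For example: `run_cluvfy /tmp/post_crsinst.log stage -post crsinst -n ol6-112-rac1,ol6-112-rac2 -verbose`, where the log path is just an example.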
ORAchk
Oracle provide the ORAchk tool to audit the configuration of RAC, CRS, ASM, GI etc. It supports database versions from 10.2 to 12.1, making it a useful starting point for most analysis. Information about ORAchk is available from this MOS note.
Download the zip file and install it by simply unzipping.
$ mkdir orachk
$ unzip -d orachk orachk.zip
$ cd orachk
$ ./orachk

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.1.0.1/grid?[y/n][y]

Checking ssh user equivalency settings on all nodes in cluster

Node ol6-121-rac2 is configured for ssh user equivalency for oracle user

Searching for running databases . . . . . . .

List of running databases registered in OCR
1. cdbrac
2. None of above

Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].1
. .

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-------------------------------------------------------------------------------------------------------
                                                 Oracle Stack Status
-------------------------------------------------------------------------------------------------------
Host Name     CRS Installed  ASM HOME  RDBMS Installed  CRS UP  ASM UP  RDBMS UP  DB Instance Name
-------------------------------------------------------------------------------------------------------
ol6-121-rac1  Yes            N/A       Yes              Yes     Yes     Yes       cdbrac1
ol6-121-rac2  Yes            N/A       Yes              Yes     Yes     Yes       cdbrac2
-------------------------------------------------------------------------------------------------------
. .
[output truncated for this article]
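The "Oracle Stack Status" table is a quick first look before the detailed report. The following is a minimal sketch (the function name is hypothetical) that flags any host where a component is not up. It assumes the eight-column row layout shown above (host, CRS installed, ASM home, RDBMS installed, CRS up, ASM up, RDBMS up, instance) and that a component that is down is reported as "No"; both are assumptions drawn from this one sample.

```shell
# Minimal sketch: read ORAchk console output on stdin and report hosts whose
# "Oracle Stack Status" row shows CRS, ASM or RDBMS as "No".
# Assumes eight whitespace-separated fields per host row, as in the sample.
stack_status_problems() {
  awk '
    # Host rows have exactly 8 fields and Yes/No in the "CRS Installed" column.
    NF == 8 && $2 ~ /^(Yes|No)$/ {
      if ($5 == "No") { print $1 ": CRS is not up";   bad = 1 }
      if ($6 == "No") { print $1 ": ASM is not up";   bad = 1 }
      if ($7 == "No") { print $1 ": RDBMS is not up"; bad = 1 }
    }
    END { exit bad }
  '
}
```

Because ORAchk prompts interactively, the simplest way to use this is on a saved copy of the console output, for example `stack_status_problems < orachk_console.txt`, where the file name is hypothetical.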
RACcheck
Oracle provide the RACcheck tool (MOS [ID 1268927.1]) to audit the configuration of RAC, CRS, ASM, GI etc. It supports database versions from 10.2 to 11.2, making it a useful starting point for most analysis. The MOS note includes the download and setup details. If you are using 11.2.0.4 or later, you will have RACcheck by default.
$ unzip raccheck.zip
$ cd raccheck
$ chmod 755 raccheck
$ ./raccheck -a

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/11.2.0/grid?[y/n][y]

Checking ssh user equivalency settings on all nodes in cluster

Node ol6-112-rac2 is configured for ssh user equivalency for oracle user

Searching for running databases . . . . . .

List of running databases registered in OCR
1. RAC
2. None

Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].
. .

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-------------------------------------------------------------------------------------------------------
                                                 Oracle Stack Status
-------------------------------------------------------------------------------------------------------
Host Name     CRS Installed  ASM HOME  RDBMS Installed  CRS UP  ASM UP  RDBMS UP  DB Instance Name
-------------------------------------------------------------------------------------------------------
ol6-112-rac1  Yes            Yes       Yes              Yes     Yes     Yes       RAC1
ol6-112-rac2  Yes            Yes       Yes              Yes     Yes     Yes       RAC2
-------------------------------------------------------------------------------------------------------
. .
[output truncated for this article]
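Whether you need to download RACcheck depends on the clusterware version. `crsctl query crs activeversion` reports the active version in square brackets, for example "Oracle Clusterware active version on the cluster is [11.2.0.3.0]". The following is a minimal sketch with hypothetical function names that extracts that version and compares it numerically, field by field, with 11.2.0.4.

```shell
# Minimal sketch: decide whether the cluster is at 11.2.0.4 or later.

# Read `crsctl query crs activeversion` output on stdin and print the
# version found between the square brackets.
active_version() {
  sed -n 's/.*\[\([0-9.]*\)].*/\1/p'
}

# Succeed if dotted version $1 >= $2, comparing each field as a number.
# Missing trailing fields count as 0, so 11.2.0.4.0 equals 11.2.0.4.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}
```

Usage, from the Grid home bin directory: `ver=$(./crsctl query crs activeversion | active_version)`, then `version_ge "$ver" 11.2.0.4 && echo "RACcheck should already be available" || echo "Download RACcheck from the MOS note"`.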