
How to Diagnose Oracle RAC Problems

The vast majority of RAC issues I've encountered have been caused by one or more of the following:

  • Incorrect network configuration. Remember, the public IP addresses, VIPs and SCAN IPs must all be on the same public network. The private IPs must be on a different network to the public network. The public IPs and the private IPs must all be pingable prior to the installation.
  • Incorrect shared disk configuration. The voting disk and OCR location, as well as all the database files, need to be on shared storage for RAC to function properly. Any problems with the shared disk configuration will cause RAC to fail.
  • Missing prerequisites. There are a lot of prerequisites that must be completed before you can start a RAC installation. It may be tempting to miss steps out, but this will invariably cause problems. Make sure all prerequisites are met before starting the installation.
  • Insufficient available resources. This is especially true of people doing virtual RAC installations. The minimum requirements of 11gR2 RAC are quite significant. Without some clever tricks to free up memory, you are going to need at least 4G RAM per node to complete a fairly basic installation. Trying to install RAC on under-specced hardware can lead to some rather unpredictable results.
Assuming none of the above is the cause of your problem, you may find the utilities in this article useful for diagnosing your RAC issues.
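The reachability requirement from the first point can be checked with a small script before starting the installer. This is only a sketch: check_reachable is a hypothetical helper name, the "-priv" host names in the usage comment are assumed rather than taken from this article, and the ping options assume the Linux version of ping.

```shell
#!/bin/sh
# Sketch: report which hosts answer a single ping.
# check_reachable is a hypothetical helper, not an Oracle utility.
# PING_CMD can be overridden; the default assumes Linux ping
# (-c 1 = one packet, -W 2 = wait at most two seconds).
check_reachable() {
    rc=0
    for host in "$@"; do
        if ${PING_CMD:-ping -c 1 -W 2} "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return $rc
}

# Example call (the -priv names are assumptions, use your own):
# check_reachable ol6-112-rac1 ol6-112-rac2 ol6-112-rac1-priv ol6-112-rac2-priv
```

Run it from every node, since a route that works in one direction may not work in the other. Only the public and private IPs are listed, because those are the addresses that must answer before the installation.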

crsctl

Amongst other things, the crsctl command allows you to check the health of the cluster. The following command displays the top-level view of the cluster.
# cd /u01/app/11.2.0/grid/bin

# ./crsctl check cluster -all
**************************************************************
ol6-112-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
ol6-112-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
#
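On a cluster with many nodes it can be easier to filter this output than to read it. The following is a minimal sketch that assumes the output layout shown above, where a healthy service line ends with "is online". The function name check_cluster_problems is hypothetical.

```shell
#!/bin/sh
# Sketch: print every service line that is not reported as online.
# Reads "crsctl check cluster -all" output on stdin.
# check_cluster_problems is a hypothetical helper, not part of crsctl.
check_cluster_problems() {
    awk '
        BEGIN { bad = 0 }
        /^\*+$/ { next }                          # separator rows
        /^CRS-[0-9]+:/ {                          # a service status message
            if ($0 !~ /is online[ \t\r]*$/) {
                print node ": " $0
                bad = 1
            }
            next
        }
        /:[ \t\r]*$/ {                            # node header such as "ol6-112-rac1:"
            node = $1
            sub(/:$/, "", node)
        }
        END { exit bad }                          # non-zero if anything was flagged
    '
}

# Example call:
# ./crsctl check cluster -all | check_cluster_problems
```

The function prints nothing and returns 0 when every service on every node is online, so it can be used in a cron job or a monitoring wrapper.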
The following command gives information about the individual resources.
# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       ol6-112-rac1                                 
               ONLINE  ONLINE       ol6-112-rac2                                 
ora.LISTENER.lsnr
               ONLINE  ONLINE       ol6-112-rac1                                 
               ONLINE  ONLINE       ol6-112-rac2                                 
ora.asm
               ONLINE  ONLINE       ol6-112-rac1             Started             
               ONLINE  ONLINE       ol6-112-rac2             Started             
ora.gsd
               OFFLINE OFFLINE      ol6-112-rac1                                 
               OFFLINE OFFLINE      ol6-112-rac2                                 
ora.net1.network
               ONLINE  ONLINE       ol6-112-rac1                                 
               ONLINE  ONLINE       ol6-112-rac2                                 
ora.ons
               ONLINE  ONLINE       ol6-112-rac1                                 
               ONLINE  ONLINE       ol6-112-rac2                                 
ora.registry.acfs
               ONLINE  ONLINE       ol6-112-rac1                                 
               ONLINE  ONLINE       ol6-112-rac2                                 
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       ol6-112-rac1                                 
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       ol6-112-rac2                                 
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       ol6-112-rac2                                 
ora.cvu
      1        ONLINE  ONLINE       ol6-112-rac2                                 
ora.oc4j
      1        ONLINE  ONLINE       ol6-112-rac2                                 
ora.ol6-112-rac1.vip
      1        ONLINE  ONLINE       ol6-112-rac1                                 
ora.ol6-112-rac2.vip
      1        ONLINE  ONLINE       ol6-112-rac2                                 
ora.rac.db
      1        ONLINE  ONLINE       ol6-112-rac1             Open                
      2        ONLINE  ONLINE       ol6-112-rac2             Open                
ora.scan1.vip
      1        ONLINE  ONLINE       ol6-112-rac1                                 
ora.scan2.vip
      1        ONLINE  ONLINE       ol6-112-rac2                                 
ora.scan3.vip
      1        ONLINE  ONLINE       ol6-112-rac2                                 
#
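When reading this listing, the interesting rows are those where TARGET is ONLINE but STATE is something else. A row such as ora.gsd above, where both TARGET and STATE are OFFLINE, is not reported as a problem by this check. The sketch below filters for that mismatch. It assumes the tabular layout shown above and resource names beginning with "ora."; find_down_resources is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list resources whose TARGET is ONLINE but whose STATE is not.
# Reads "crsctl stat res -t" output on stdin.
# find_down_resources is a hypothetical helper, not part of crsctl.
find_down_resources() {
    awk '
        BEGIN { bad = 0 }
        /^ora\./ { res = $1; next }            # resource name row
        /^[ \t]/ {                             # indented status row
            # Cluster resources carry an instance number in column 1.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            target = $i
            state  = $(i + 1)
            server = $(i + 2)
            if (target == "ONLINE" && state != "ONLINE") {
                print res, state, server
                bad = 1
            }
        }
        END { exit bad }                       # non-zero if anything was flagged
    '
}

# Example call:
# ./crsctl stat res -t | find_down_resources
```

Each reported line gives the resource name, its current state and the server it was last seen on, which is usually enough to decide where to look next.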

olsnodes

Run the olsnodes command on all cluster nodes and see that it returns a list of all the nodes in each case.
# cd /u01/app/11.2.0/grid/bin

# ./olsnodes
ol6-112-rac1
ol6-112-rac2
#
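To avoid comparing the lists by eye, the output can be checked against the nodes you expect to see. This is a sketch only, and missing_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print every expected node that olsnodes did not list.
# Reads olsnodes output on stdin; expected node names are the arguments.
# missing_nodes is a hypothetical helper, not an Oracle utility.
missing_nodes() {
    actual=$(cat)
    rc=0
    for n in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if ! printf '%s\n' "$actual" | grep -qxF -- "$n"; then
            echo "$n"
            rc=1
        fi
    done
    return $rc
}

# Example call, to be repeated on each node:
# ./olsnodes | missing_nodes ol6-112-rac1 ol6-112-rac2
```

No output and a zero return code means the node running the command can see the full membership.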

cluvfy

You have probably run the runcluvfy.sh utility from the installation media before installing the clusterware software. Once the Oracle software is installed, the cluvfy utility is available to provide useful post-installation information. Use the "-help" flag for usage information.
$ cluvfy stage -help

USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options>  [-verbose]

SYNTAX (for Stages):
cluvfy stage -pre cfs -n <node_list> -s <storageID_list> [-verbose]
cluvfy stage -pre 
                   crsinst -file <config_file> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
                   crsinst -upgrade [-n <node_list>] [-rolling] -src_crshome <src_crshome> -dest_crshome <dest_crshome>
                           -dest_version <dest_version> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
                   crsinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}]
                           [-c <ocr_location_list>] [-q <voting_disk_list>]
                           [-osdba <osdba_group>] [-orainv <orainventory_group>]
                           [-asm [-asmgrp <asmadmin_group>] [-asmdev <asm_device_list>]] [-crshome <crs_home>]
                           [-fixup [-fixupdir <fixup_dir>]] [-networks <network_list>]
                           [-verbose]
cluvfy stage -pre acfscfg -n <node_list> [-asmdev <asm_device_list>] [-verbose]
cluvfy stage -pre 
                   dbinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}] [-osdba <osdba_group>] [-d <oracle_home>]
                          [-fixup [-fixupdir <fixup_dir>]] [-verbose]
                   dbinst -upgrade -src_dbhome <src_dbhome> [-dbname <dbname-list>] -dest_dbhome <dest_dbhome> -dest_version <dest_version>
                          [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre dbcfg -n <node_list> -d <oracle_home> [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre hacfg [-osdba <osdba_group>] [-orainv <orainventory_group>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -pre nodeadd -n <node_list> [-vip <vip_list>] [-fixup [-fixupdir <fixup_dir>]] [-verbose]
cluvfy stage -post hwos -n <node_list> [-s <storageID_list>] [-verbose]
cluvfy stage -post cfs -n <node_list> -f <file_system> [-verbose]
cluvfy stage -post crsinst -n <node_list> [-verbose]
cluvfy stage -post acfscfg -n <node_list> [-verbose]
cluvfy stage -post hacfg [-verbose]
cluvfy stage -post nodeadd -n <node_list> [-verbose]
cluvfy stage -post nodedel -n <node_list> [-verbose]

$
Two examples are shown below.
$ cluvfy stage -post crsinst -n ol6-112-rac1,ol6-112-rac2
$ cluvfy stage -pre dbcfg -n ol6-112-rac1,ol6-112-rac2 -d /u01/app/oracle/product/11.2.0/db_1
In all cases, check through the output and correct any errors produced.
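The cluvfy output can be long, so it may help to save it and pull out only the suspicious lines. The sketch below assumes that problems are marked by the words "failed" or "unsuccessful", or by a PRVF-/PRVG- message code. Confirm that assumption against your own output before relying on it. The function name cluvfy_failures and the log path are hypothetical.

```shell
#!/bin/sh
# Sketch: show numbered lines of cluvfy output that look like failures.
# Reads cluvfy output on stdin.
# cluvfy_failures is a hypothetical helper, not part of cluvfy.
# Assumption: failures contain "failed", "unsuccessful" or a PRVF-/PRVG- code.
cluvfy_failures() {
    if grep -n -i -E 'failed|unsuccessful|PRV[FG]-[0-9]+'; then
        return 1    # at least one suspicious line was printed
    fi
    return 0        # nothing matched
}

# Example call (the log path is an assumption):
# cluvfy stage -post crsinst -n ol6-112-rac1,ol6-112-rac2 -verbose 2>&1 \
#     | tee /tmp/cluvfy_post_crsinst.log | cluvfy_failures
```

The line numbers refer to the saved log, so you can open it and read the surrounding context for each reported failure.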

ORAchk

Oracle provide the ORAchk tool to audit the configuration of RAC, CRS, ASM, GI etc. It supports database versions from 10.2-12.1, making it a useful starting point for most analysis. Information about ORAchk is available from this MOS note.
Download the zip file and install it by simply unzipping.
$ mkdir orachk
$ unzip -d orachk orachk.zip
$ cd orachk
$ ./orachk

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.1.0.1/grid?[y/n][y]

Checking ssh user equivalency settings on all nodes in cluster

Node ol6-121-rac2 is configured for ssh user equivalency for oracle user
 

Searching for running databases . . . . .

. . 
List of running databases registered in OCR
1. cdbrac
2. None of above

Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].1
. . 


Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
-------------------------------------------------------------------------------------------------------
                                                 Oracle Stack Status                            
-------------------------------------------------------------------------------------------------------
Host Name  CRS Installed  ASM HOME       RDBMS Installed  CRS UP    ASM UP    RDBMS UP  DB Instance Name
-------------------------------------------------------------------------------------------------------
ol6-121-rac1 Yes             N/A             Yes             Yes        Yes      Yes      cdbrac1   
ol6-121-rac2 Yes             N/A             Yes             Yes        Yes      Yes      cdbrac2   
-------------------------------------------------------------------------------------------------------
.
.
[output truncated for this article]

RACcheck

Oracle provide the RACcheck tool (MOS [ID 1268927.1]) to audit the configuration of RAC, CRS, ASM, GI etc. It supports database versions from 10.2-11.2, making it a useful starting point for most analysis. The MOS note includes the download and setup details. If you are using 11.2.0.4 or later you will have RACcheck by default.

$ unzip raccheck.zip
$ cd raccheck
$ chmod 755 raccheck
$ ./raccheck -a

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/11.2.0/grid?[y/n][y]

Checking ssh user equivalency settings on all nodes in cluster

Node ol6-112-rac2 is configured for ssh user equivalency for oracle user
 

Searching for running databases . . . . .

. 
List of running databases registered in OCR
1. RAC
2. None

Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].
. . 


Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
-------------------------------------------------------------------------------------------------------
                                                 Oracle Stack Status                            
-------------------------------------------------------------------------------------------------------
Host Name  CRS Installed  ASM HOME       RDBMS Installed  CRS UP    ASM UP    RDBMS UP  DB Instance Name
-------------------------------------------------------------------------------------------------------
ol6-112-rac1 Yes             Yes             Yes             Yes        Yes      Yes      RAC1      
ol6-112-rac2 Yes             Yes             Yes             Yes        Yes      Yes      RAC2      
-------------------------------------------------------------------------------------------------------
.
.
[output truncated for this article]
