Oracle Clusterware Main Log Files:
- Cluster Ready Services (CRS) logs are in <Grid_Home>/log/<hostname>/crsd/. The crsd.log file is archived every 10 MB (crsd.l01, crsd.l02,...)
- Cluster Synchronization Service (CSS) logs are in <Grid_Home>/log/<hostname>/cssd/. The cssd.log file is archived every 20 MB (cssd.l01, cssd.l02,...)
- Event Manager (EVM) logs are in <Grid_Home>/log/<hostname>/evmd.
- SRVM (srvctl) and OCR (ocrdump, ocrconfig, ocrcheck) logs are in <Grid_Home>/log/<hostname>/client/ and $ORACLE_HOME/log/<hostname>/client/.
- Important Oracle Clusterware alerts can be found in alert<nodename>.log in the <Grid_Home>/log/<hostname> directory.
- Oracle Cluster Registry tools (ocrdump, ocrcheck, ocrconfig) logs can be found in <Grid_Home>/log/<hostname>/client.
- In addition, important Automatic Storage Management (ASM)-related trace and alert information can be found in the <Grid_Base>/diag/asm/+asm/+ASMn directory, specifically in the log and trace subdirectories.
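A quick way to scan these locations for recent problems is to tail the alert log and search the daemon logs for errors. A minimal sketch, assuming the Grid home /u01/app/11.2.0/grid and hostname host01 used elsewhere in this post (adjust for your environment):
$ tail -50 /u01/app/11.2.0/grid/log/host01/alerthost01.log
$ grep -i error /u01/app/11.2.0/grid/log/host01/crsd/crsd.log | tail -20
$ grep -i error /u01/app/11.2.0/grid/log/host01/cssd/ocssd.log | tail -20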
Diagnostics Collection Script:
- Use the diagcollection.pl script to collect diagnostic information from an Oracle Grid Infrastructure installation. The diagnostics provide additional information so that Oracle Support can resolve problems. This script is located in <Grid_Home>/bin.
# /u01/app/11.2.0/grid/bin/diagcollection.pl --collect
Production Copyright 2004, 2008, Oracle. All rights reserved
Cluster Ready Services (CRS) diagnostic collection tool
The following diagnostic archives will be created in the local directory.
crsData_host01_20090729_1013.tar.gz -> logs,traces and cores from CRS home. Note: core files will be packaged only with the --core option.
ocrData_host01_20090729_1013.tar.gz -> ocrdump, ocrcheck etc
coreData_host01_20090729_1013.tar.gz -> contents of CRS core files
osData_host01_20090729_1013.tar.gz -> logs from Operating System
....
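After the collection completes, you can verify and inspect the archives before sending them to Oracle Support (a sketch; the file names match the sample output above):
$ ls -lh *_host01_20090729_1013.tar.gz
$ tar -tzf crsData_host01_20090729_1013.tar.gz | head -20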
$ crsctl lsmodules css
The following are the Cluster Synchronization Services modules: CSSD COMCRS COMMNS CLSF SKGFD
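These module names are what you pass to crsctl when raising component log levels for debugging. A hedged sketch, run as root (the level 3 is arbitrary, and the exact syntax should be verified against the documentation for your release):
# crsctl get log css CSSD
# crsctl set log css "CSSD:3"
# crsctl set log css "CSSD:2"
The first command shows the current level, the second raises it for more detailed tracing, and the third returns it to a lower level (2 is commonly the default) when you are finished.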
To enable tracing for cluvfy, netca, and srvctl, set SRVM_TRACE to TRUE:
$ export SRVM_TRACE=TRUE
$ srvctl config database -d orcl > /tmp/srvctl.trc
$ cat /tmp/srvctl.trc
...
[main] [ 2009-09-16 00:58:53.197 EDT ] [CRSNativeResult.addRIAttr:139] addRIAttr: name 'ora.orcl.db 3 1', 'USR_ORA_INST_NAME@SERVERNAME(host01)':'orcl1'
[main] [ 2009-09-16 00:58:53.197 EDT ] [CRSNativeResult.addRIAttr:139] addRIAttr: name 'ora.orcl.db 3 1', 'USR_ORA_INST_NAME@SERVERNAME(host02)':'orcl2'
[main] [ 2009-09-16 00:58:53.198 EDT ] [CRSNativeResult.addRIAttr:139] addRIAttr: name 'ora.orcl.db 3 1', 'USR_ORA_INST_NAME@SERVERNAME(host03)':'orcl3'
[main] [ 2009-09-16 00:58:53.198 EDT ] [CRSNative.searchEntities:857] found 3 entities
...
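If you need tracing for only a single invocation, the variable can be set for that command alone instead of being exported for the whole session. A minimal sketch (srvctl status is used here purely as an example command; 2>&1 also captures any trace lines sent to stderr):
$ SRVM_TRACE=TRUE srvctl status database -d orcl > /tmp/srvctl_status.trc 2>&1
$ grep -c CRSNative /tmp/srvctl_status.trc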
Cluster Verify Components:
CVU supports the notion of component verification. The verifications in this category are not associated with any specific stage. A component can range from basic, such as free disk space, to complex (spanning over multiple subcomponents), such as the Oracle Clusterware stack. Availability, integrity, or any other specific behavior of a cluster component can be verified.
You can list verifiable CVU components with the cluvfy comp -list command:
$ cluvfy comp -list
nodereach - Checks node reachability
peer - Compares properties with peers
nodecon - Checks node connectivity
ha - Checks HA integrity
cfs - Checks CFS integrity
asm - Checks ASM integrity
ssa - Checks shared storage
acfs - Checks ACFS integrity
space - Checks space availability
olr - Checks OLR integrity
sys - Checks minimum requirements
gpnp - Checks GPnP integrity
clu - Checks cluster integrity
gns - Checks GNS integrity
clumgr - Checks cluster manager integrity
scan - Checks SCAN configuration
ocr - Checks OCR integrity
ohasd - Checks OHASD integrity
admprv - Checks administrative privileges
crs - Checks CRS integrity
software - Checks software distribution
vdisk - Checks Voting Disk Udev settings
clocksync - Checks clock synchronization
nodeapp - Checks node applications’ existence
Note: For manual installation, you need to install CVU on only one node. CVU deploys itself on remote nodes during executions that require access to remote nodes.
Cluster Verify Output: Example
$ cluvfy comp crs -n all -verbose
Verifying CRS integrity
Checking CRS integrity...
The Oracle clusterware is healthy on node "host03"
The Oracle clusterware is healthy on node "host02"
The Oracle clusterware is healthy on node "host01"
CRS integrity check passed
Verification of CRS integrity was successful.
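The other components listed earlier are checked the same way. For example (a sketch; -n all assumes every cluster node should be checked and -verbose prints per-node detail):
$ cluvfy comp nodecon -n all -verbose
$ cluvfy comp clocksync -n all -verbose
$ cluvfy comp ocr -n all -verbose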
Write a shell script to copy log files before they wrap:
# Script to archive log files before wrapping occurs
# Written for CSS logs. Modify for other log file types.
CSSLOGDIR=/u01/app/11.2.0/grid/log/host01/cssd
while [ 1 -ne 0 ]; do
CSSFILE=/tmp/css_`date +%m%d%y"_"%H%M`.tar
tar -cf $CSSFILE $CSSLOGDIR/*
sleep 300
done
exit
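To keep the script running after you log out, it can be launched in the background with nohup. A sketch, assuming the script was saved under the hypothetical name archive_css_logs.sh:
$ chmod +x archive_css_logs.sh
$ nohup ./archive_css_logs.sh > /tmp/archive_css_logs.out 2>&1 &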
Processes That Can Reboot Nodes:
The following processes can evict nodes from the cluster or cause a node reboot:
- hangcheck-timer: Monitors for machine hangs and pauses (not required in 11gR2, but required in 11gR1)
- oclskd: Is used by CSS to reboot a node based on requests from other nodes in the cluster
- ocssd: Monitors internode health status
Determining Which Process Caused Reboot:
Most of the time, the process writes error messages to its log file when a reboot is required.
- ocssd: /var/log/messages and <Grid_Home>/log/<hostname>/cssd/ocssd.log
- oclskd: <Grid_Home>/log/<hostname>/client/oclskd.log
- hangcheck-timer: /var/log/messages
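After an unexpected reboot, a quick first step is to search these locations for eviction and reboot messages around the time of the restart. A sketch, using the Grid home and hostname from earlier examples (the grep patterns are only illustrative):
# grep -iE "reboot|evict|fatal" /var/log/messages | tail -20
$ grep -iE "reboot|evict|fatal" /u01/app/11.2.0/grid/log/host01/cssd/ocssd.log | tail -20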
Using diagwait for Eviction Troubleshooting:
When a node is evicted on a busy system, the OS may not have had time to flush logs and trace files before reboot.
- Use the diagwait CSS attribute to allow more time.
- It does not guarantee that logs will be written.
- The recommended value is 13 seconds.
- A clusterwide outage is required to change it (Oracle Clusterware must be stopped on all nodes).
- It is not enabled by default.
- To enable:
# crsctl set css diagwait 13 -force
- To Disable:
# crsctl unset css diagwait
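Because the change requires a clusterwide outage, it is typically made with the Clusterware stack stopped on every node. A sketch of the sequence, run as root (crsctl stop crs and crsctl start crs must be repeated on each node):
# crsctl get css diagwait
# crsctl stop crs
# crsctl set css diagwait 13 -force
# crsctl start crs
The first command reports the current setting; the set command is run on one node while the stack is down everywhere.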
Using ocrdump to View Logical Contents of the OCR:
- The ocrdump utility can be used to view the OCR content for troubleshooting. The ocrdump utility enables you to view logical information by writing the contents to a file or displaying the contents to stdout in a readable format.
- To dump the OCR contents into a text file for reading:
[grid]$ ocrdump filename_with_limited_results.txt
[root]# ocrdump filename_with_full_results.txt
- To dump the OCR contents for a specific key:
# ocrdump -keyname SYSTEM.language
- To dump the OCR contents to stdout in XML format:
# ocrdump -stdout -xml
- To dump the contents of an OCR backup file:
# ocrdump -backupfile week.ocr
- To determine all the changes that have occurred in the OCR over the previous week, locate the automatic backup from the previous week and compare it to a dump of the current OCR as follows:
- If the ocrdump command is issued without any options, it writes a file with the default name OCRDUMPFILE to the current directory, provided that the directory is writable.
# ocrdump
# ocrdump -stdout -backupfile week.ocr | diff - OCRDUMPFILE
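To locate the previous week's automatic backup referred to above as week.ocr, list the backups Oracle Clusterware has taken; you can also take an on-demand backup before comparing (run as root):
# ocrconfig -showbackup
# ocrconfig -manualbackup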
Checking the Integrity of the OCR:
- Use the ocrcheck command to check OCR integrity.
$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 275980
Used space (kbytes) : 2824
Available space (kbytes) : 273156
ID : 1274772838
Device/File Name : +DATA1
Device/File integrity check succeeded
Device/File Name : +DATA2
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
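For routine monitoring, ocrcheck can be run from a script and its output tested. A minimal sketch that flags any line reporting a failed check (note that the logical corruption check shown above is performed only when ocrcheck runs as a privileged user):
# Sketch: run ocrcheck and flag any failed check in its output
ocrcheck > /tmp/ocrcheck.out 2>&1
if grep -qi "failed" /tmp/ocrcheck.out; then
echo "ocrcheck reported a problem; see /tmp/ocrcheck.out"
fi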
OCR-Related Tools for Debugging:
OCR tools:
- ocrdump
- ocrconfig
- ocrcheck
- srvctl
Logs are generated in the following directory:
<Grid_Home>/log/<hostname>/client/
Debugging is controlled through the following file:
<Grid_Home>/srvm/admin/ocrlog.ini
- These utilities create log files in <Grid_Home>/log/<hostname>/client/. To change the amount of logging, edit the <Grid_Home>/srvm/admin/ocrlog.ini file.
- The logging level is controlled by the mesg_logging_level parameter. Its default value is 0, which logs only error conditions. You can change this setting to 3 or 5 for detailed logging information.
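For example, to turn on detailed logging for these tools, raise the mesg_logging_level entry in <Grid_Home>/srvm/admin/ocrlog.ini (a sketch showing only the parameter named above; other entries in the file are left unchanged):
mesg_logging_level = 5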