Restore Loss of All Vote Disks
Contents:
_________________________________________________________________________________________________________________
0. Environment
1. Current Status of OCR/VOTE DISK
2. Backup OCR
3. Simulate VOTE DISK corruption
4. Reboot both nodes in order to see corruption << This step is not mandatory
5. Restore loss of all Voting disk
A. Stop CRS on all the nodes
B. Start CRS in exclusive mode only
C. Create New Diskgroup
D. Restore/Move/Replace Votedisk
E. Stop CRS on Node 1
F. Start CRS on both nodes
6. Check Cluster Status
_________________________________________________________________________________________________________________
0. Environment
– Two Node RAC 11.2.0.3
– OS : RHEL5
1. Current Status of OCR/VOTE DISK.
[oracle@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4156
         Available space (kbytes) :     257964
         ID                       : 1037097601
         Device/File Name         :      +DATA   <<< OCR located in ASM diskgroup DATA.
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
[oracle@rac1 ~]$
[oracle@rac1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   7a14418b50a54f9dbfda2a6b97b4f620 (/dev/oracleasm/disks/DISK5) [VOTE]   <<< voting disk /dev/oracleasm/disks/DISK5
Located 1 voting disk(s).   <<<<
[oracle@rac1 ~]$

Note: The OCR and the voting disk are now in two different diskgroups:
OCR in the DATA diskgroup
Voting disk /dev/oracleasm/disks/DISK5 in the VOTE diskgroup.
2. Backup OCR
[root@rac1 ~]# ocrconfig -manualbackup
rac2     2015/06/24 03:08:27     /u01/app/11.2.0/grid/cdata/rac-scan/backup_20150624_030827.ocr
rac1     2015/06/23 05:46:12     /u01/app/11.2.0/grid/cdata/rac-scan/backup_20150623_054612.ocr
rac1     2015/06/23 02:39:07     /u01/app/11.2.0/grid/cdata/rac-scan/backup_20150623_023907.ocr
rac1     2015/06/19 23:38:03     /u01/app/11.2.0/grid/cdata/rac-scan/backup_20150619_233803.ocr
[root@rac1 ~]#

Note: With an OCR backup we can recover the voting disk if the voting disk is lost.
3. Simulate VOTE DISK corruption
DISCLAIMER: The dd command given below is for learning purposes only and should be used only on test systems. I do not take any responsibility for any consequences or loss of data caused by this command.
Corrupt the voting disk /dev/oracleasm/disks/DISK5

dd if=/dev/zero of=/dev/oracleasm/disks/DISK5 bs=4096 count=1000000

Why a block size of 4096 bytes? Because the ASM disk header is in the first block of the first AU, and the block size is 4096 bytes. Note that with count=1000000 dd writes up to 1,000,000 such blocks (4096 x 1,000,000 = about 4.1 GB, or until the device is full), so the command above overwrites far more than the header. count=1 would zero only the first 4096-byte block.

[oracle@rac1 ~]$ kfed read /dev/oracleasm/disks/DISK5 | grep kfdhdb.blksize
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
[oracle@rac1 ~]$
[oracle@rac1 ~]$ kfed read /dev/oracleasm/disks/DISK5     <<<< KFED confirms that the disk got corrupted.
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F999709F400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
[oracle@rac1 ~]$

[root@rac1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   7a14418b50a54f9dbfda2a6b97b4f620 (/dev/oracleasm/disks/DISK5) [VOTE]   <<< Status still shows ONLINE; I do not know why
Located 1 voting disk(s).
[root@rac1 ~]#

The kfed read command failed, so the voting disk is corrupted. I waited around 1 hour, but somehow the cluster did not go down. I do not know why; I may be missing something here, so please correct me if I am wrong.

Let's bring down everything in order to see the corruption.

Note: I tried to stop CRS on both nodes at the same time. On Node 2 CRS stopped, but Node 1 restarted while shutting down CRS. In the end I rebooted both nodes.
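The kfed check above can be scripted so that the header state is reported as a single word. The following is only a sketch: header_state is a hypothetical helper name, not an Oracle tool, and it assumes that a healthy disk header reports KFBTYP_DISKHEAD on the kfbh.type line while a zeroed one reports KFBTYP_INVALID, as in the output above.

```shell
#!/usr/bin/env bash
# header_state (hypothetical helper): classify an ASM disk header from
# `kfed read` output supplied on stdin.
header_state() {
    local type
    # The block type name is the last field of the kfbh.type line.
    type=$(awk '$1 == "kfbh.type:" { print $NF; exit }')
    case "$type" in
        KFBTYP_DISKHEAD) echo "valid" ;;    # assumed value for a healthy header
        KFBTYP_INVALID)  echo "corrupt" ;;  # value seen after the dd above
        *)               echo "unknown" ;;
    esac
}
```

Assumed usage, as the GI home owner: kfed read /dev/oracleasm/disks/DISK5 | header_state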
4. Reboot both nodes in order to see corruption
After reboot cluster status on both nodes.
From RAC1
=========
[root@rac1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                                                   <<<<<<
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
[root@rac1 ~]#

From RAC2
===========
[root@rac2 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac2
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING            <<<<< It will not start because "No voting files found"
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2
[root@rac2 ~]#

alertrac1.log
==============
2015-06-25 04:25:07.002 [cssd(6313)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log
2015-06-25 04:25:22.291 [cssd(6313)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log

ocssd.log from RAC1
====================
2015-06-25 04:25:06.961: [ SKGFD][1093830976]OSS discovery with :/dev/oracleasm/disks*:
2015-06-25 04:25:06.961: [ SKGFD][1093830976]Handle 0x7fbfd8002e50 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK1:
2015-06-25 04:25:06.962: [ SKGFD][1093830976]Handle 0x7fbfd80ead10 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK2:
2015-06-25 04:25:06.962: [ SKGFD][1093830976]Handle 0x7fbfd80eb540 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK3:
2015-06-25 04:25:06.962: [ SKGFD][1093830976]Handle 0x7fbfd80e6240 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK4:   <<<<<<<<< DISK5 is missing.
2015-06-25 04:25:06.962: [ SKGFD][1093830976]Handle 0x7fbfd80e6a70 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK6:
2015-06-25 04:25:06.963: [ SKGFD][1093830976]Handle 0x7fbfd80c7d10 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK7:
..
2015-06-25 04:25:07.001: [ CSSD][1093830976]clssnmvDiskVerify: Successful discovery of 0 disks
2015-06-25 04:25:07.002: [ CSSD][1093830976]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2015-06-25 04:25:07.002: [ CSSD][1093830976]clssnmvFindInitialConfigs: No voting files found
2015-06-25 04:25:07.002: [ CSSD][1093830976](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds

alertrac2.log
==============
2015-06-25 04:25:06.999 [cssd(6539)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log
2015-06-25 04:25:22.279 [cssd(6539)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log

ocssd.log from RAC2
=====================
2015-06-25 04:25:06.573: [ SKGFD][1087797568]OSS discovery with :/dev/oracleasm/disks*:
2015-06-25 04:25:06.573: [ SKGFD][1087797568]Handle 0x19e8640 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK1:
2015-06-25 04:25:06.573: [ SKGFD][1087797568]Handle 0x1993310 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK2:
2015-06-25 04:25:06.574: [ SKGFD][1087797568]Handle 0x1a49550 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK3:
2015-06-25 04:25:06.574: [ SKGFD][1087797568]Handle 0x18aaa40 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK4:   <<<<<<<< DISK5 is missing.
2015-06-25 04:25:06.575: [ SKGFD][1087797568]Handle 0x19f6e90 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK6:
2015-06-25 04:25:06.575: [ SKGFD][1087797568]Handle 0x196cbf0 from lib :UFS:: for disk :/dev/oracleasm/disks/DISK7:
..
2015-06-25 04:25:06.999: [ CSSD][1087797568]clssnmvDiskVerify: Successful discovery of 0 disks   <<<
2015-06-25 04:25:06.999: [ CSSD][1087797568]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2015-06-25 04:25:06.999: [ CSSD][1087797568]clssnmvFindInitialConfigs: No voting files found   <<<
2015-06-25 04:25:07.000: [ CSSD][1087797568](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
5. Restore loss of all Voting disk.
A. Stop CRS on all the nodes
From RAC1
==========
[root@rac1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 ~]#

From RAC2
===========
[root@rac2 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac2 ~]#
B. Start CRS in exclusive mode only
From RAC1 as root user
Note: From 11.2.0.2 onwards we should include flag “nocrs” in exclusive CRS startup
[root@rac1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2674: Start of 'ora.drivers.acfs' on 'rac1' failed
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
[root@rac1 ~]#

[oracle@rac1 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started             <<
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac1
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        OFFLINE OFFLINE                                                   <<<< CRS was started in exclusive mode, so CSSD and ASM started but CRSD did not
ora.cssd
      1        ONLINE  ONLINE       rac1                                         <<<
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        OFFLINE OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
[oracle@rac1 ~]$
[oracle@rac1 ~]$

SQL> select NAME, STATE, VOTING_FILES from v$asm_diskgroup;

NAME                           STATE       V
------------------------------ ----------- -
DATA1                          MOUNTED     N
DATA                           MOUNTED     N     <<<<< The VOTE diskgroup is missing from this output.

SQL>
SQL> select NAME, PATH, STATE, VOTING_FILE from v$asm_disk where PATH='/dev/oracleasm/disks/DISK5';

no rows selected     << no output

SQL>

[oracle@rac1 ~]$ crsctl query css votedisk
Located 0 voting disk(s).   <<<
[oracle@rac1 ~]$
C. Create New Diskgroup
Note: If you do not have a new disk available right now but still want to resolve this issue, you can restore the voting disk into an existing ASM diskgroup. In that case you can skip this step ("Create New Diskgroup").
SQL> create diskgroup DATA2 external redundancy disk '/dev/oracleasm/disks/DISK6' attribute 'COMPATIBLE.ASM' = '11.2';
Diskgroup created.
SQL>
D. Restore/Move/Replace Votedisk.
Note: The voting disk will be restored from the OCR backup.
From Node 1 as GI HOME owner
[oracle@rac1 ~]$ crsctl replace votedisk +DATA2
Successful addition of voting disk 7ebe19bb115e4f51bfd96935eb1b92b7.
Successfully replaced voting disk group with +DATA2.
CRS-4266: Voting file(s) successfully replaced   <<<
[oracle@rac1 ~]$
[oracle@rac1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   7ebe19bb115e4f51bfd96935eb1b92b7 (/dev/oracleasm/disks/DISK6) [DATA2]   <<<
Located 1 voting disk(s).
[oracle@rac1 ~]$
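Before moving on, it can be useful to confirm from a script that the replacement really left at least one ONLINE voting file. This is a minimal sketch that only parses the crsctl query css votedisk output format shown above; count_online_votedisks is a hypothetical helper name, not part of Grid Infrastructure.

```shell
#!/usr/bin/env bash
# count_online_votedisks (hypothetical helper): count ONLINE voting files in
# `crsctl query css votedisk` output supplied on stdin.
count_online_votedisks() {
    # Data rows start with an index such as "1." followed by the state column.
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Assumed usage: crsctl query css votedisk | count_online_votedisks
A result of 0 would correspond to the "Located 0 voting disk(s)." state seen before the replace.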
E. Stop CRS on Node 1
From RAC1
As root user
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 ~]#
F. Start CRS on both nodes
From RAC1
As root
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]#
[root@rac1 ~]#

From RAC2
As root
[root@rac2 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac2 ~]#
6. Check Cluster Status
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online   <<<
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac1 ~]#
[root@rac1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac1
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  ONLINE       rac1                                         <<<
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
[root@rac1 ~]#

[root@rac2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online   <<<
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac2 ~]#
[root@rac2 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac2
ora.crf
      1        ONLINE  ONLINE       rac2
ora.crsd
      1        ONLINE  ONLINE       rac2                                         <<<
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  ONLINE       rac2                     ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rac2
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2
[root@rac2 ~]#
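The crsctl check crs verification above can also be done from a script on each node. The sketch below assumes the four "is online" lines shown in the output (CRS-4638, CRS-4537, CRS-4529, CRS-4533) are what a healthy 11.2 node prints; all_crs_online is a hypothetical helper name, and the expected count of four is taken from this transcript rather than from Oracle documentation.

```shell
#!/usr/bin/env bash
# all_crs_online (hypothetical helper): succeed only if `crsctl check crs`
# output on stdin contains exactly four "is online" lines, as in the
# healthy output shown above.
all_crs_online() {
    awk '/is online/ { n++ } END { exit !(n == 4) }'
}
```

Assumed usage, as root on each node: crsctl check crs | all_crs_online && echo "stack is up"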
Caution: Your use of any information or materials on this website is entirely at your own risk. It is provided for educational purposes only. It has been tested internally, however, we do not guarantee that it will work for you. Ensure that you run it in your test environment before using.