Simulating voting disk failures

RHEL: 6.6 x86_64
DB: 11.2.0.4.160119 RAC

[grid@SCMSDBS05 8540]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6d40ce0c73704fbabf8a2a8f73008caa (/dev/mapper/mpath03p1) [OCRDG]
 2. ONLINE   1689ccb98f974f0fbf9f174484ec1b37 (/dev/mapper/mpath04p1) [OCRDG]
 3. ONLINE   75ce7de14f784f5ebfc67dd08c52fd97 (/dev/mapper/mpath05p1) [OCRDG]
Located 3 voting disk(s).

[grid@SCMSDBS05 8540]$ dd if=/dev/zero of=/dev/mapper/mpath03p1
dd: writing to `/dev/mapper/mpath03p1': No space left on device
10474318+0 records in
10474317+0 records out
5362850304 bytes (5.4 GB) copied, 11.1357 s, 482 MB/s

ocssd.log output:
2016-03-21 09:12:25.686: [ CSSD][3536733952]clssnmvVoteDiskValidation: Voting disk /dev/mapper/mpath03p1 is corrupted
2016-03-21 09:12:25.686: [ CSSD][3536733952]clssnmvWorkerThread: disk /dev/mapper/mpath03p1 corrupted
2016-03-21 09:12:25.686: [ CSSD][3536733952]clssnmvDiskAvailabilityChange: voting file /dev/mapper/mpath03p1 now offline
2016-03-21 09:12:25.766: [ SKGFD][3539887872]Lib :UFS:: closing handle 0x7f62a008e1e0 for disk :/dev/mapper/mpath03p1:

2016-03-21 09:12:26.016: [ SKGFD][3538310912]Lib :UFS:: closing handle 0x7f62a403c3d0 for disk :/dev/mapper/mpath03p1:

2016-03-21 09:12:26.686: [ SKGFD][3536733952]Lib :UFS:: closing handle 0x7f629808e1e0 for disk :/dev/mapper/mpath03p1:

2016-03-21 09:15:25.124: [ CSSD][3530385152]clssnmvDiskCheck: (/dev/mapper/mpath03p1) No I/O completed after 90% maximum time, 200000 ms, will be considered unusable in 19670 ms

2016-03-21 09:15:45.126: [ CSSD][3530385152](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200330 ms for voting file /dev/mapper/mpath03p1)
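
The CSSD messages above can be filtered mechanically. The sketch below is a hypothetical helper (offline_voting_files is not an Oracle tool) that reads ocssd.log text on standard input and prints every voting file that clssnmvDiskAvailabilityChange reported as offline. It assumes only the message format shown in the excerpt. The timing lines are consistent with each other: 200000 - 19670 = 180330 ms had elapsed when the warning was logged, just past 90% of the 200000 ms limit, and the final message reports 200330 ms.

```shell
# Hypothetical helper: list voting files that ocssd.log reports as offline.
# Reads log text on stdin; relies only on the message format shown above.
offline_voting_files() {
  sed -n 's/.*clssnmvDiskAvailabilityChange: voting file \(.*\) now offline.*/\1/p' |
    sort -u
}

# Example (log path as reported in the CRS-1714 alert message in this post):
#   offline_voting_files < /oraapp/grid/gridhome/log/scmsdbs05/cssd/ocssd.log
```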

[grid@SCMSDBS05 8540]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  6d40ce0c73704fbabf8a2a8f73008caa (/dev/mapper/mpath03p1) [OCRDG]
 2. ONLINE   1689ccb98f974f0fbf9f174484ec1b37 (/dev/mapper/mpath04p1) [OCRDG]
 3. ONLINE   75ce7de14f784f5ebfc67dd08c52fd97 (/dev/mapper/mpath05p1) [OCRDG]
Located 3 voting disk(s).
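
A node stays in the cluster as long as it can access a strict majority of the voting files, which is why one OFFLINE disk out of three is survivable. The sketch below counts the states in the crsctl output and reports whether that majority still holds. The function name votedisk_summary is made up for this post, and the parsing assumes the column layout shown above.

```shell
# Hypothetical helper: summarise `crsctl query css votedisk` output from stdin.
# A data row starts with "1.", "2.", ... followed by the state.
votedisk_summary() {
  awk '
    $1 ~ /^[0-9]+\.$/ { total++; if ($2 == "ONLINE") online++ }
    END {
      printf "online=%d total=%d ", online, total
      # CSS needs more than half of the voting files
      if (online * 2 > total) print "majority=yes"; else print "majority=no"
    }'
}

# Example:
#   crsctl query css votedisk | votedisk_summary
```

For the listing above this prints online=2 total=3 majority=yes.
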

set lines 200 pages 200
col path for a50
select group_number, mount_status, header_status, state, os_mb, path
from v$asm_disk
order by group_number;

alter diskgroup OCRDG offline disk '/dev/mapper/mpath03p1';

# Clusterware keeps running with 2 of the 3 voting disks online
/oraapp/grid/gridhome/bin/crsctl stop crs
/oraapp/grid/gridhome/bin/crsctl start crs -excl

alter diskgroup OCRDG add disk '/dev/mapper/mpath03p1';

# Recovering when all the disks in OCRDG have failed

dd if=/dev/zero of=/dev/mapper/mpath03p1
dd if=/dev/zero of=/dev/mapper/mpath04p1
dd if=/dev/zero of=/dev/mapper/mpath05p1

# After the dd, node 1 was evicted. CRS on node 2 failed, but node 2 did not reboot.

2016-03-21 11:00:30.233:
[cssd(8403)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /oraapp/grid/gridhome/log/scmsdbs05/cssd/ocssd.log

[root@SCMSDBS05 ~]# /oraapp/grid/gridhome/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'scmsdbs05'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'scmsdbs05'
CRS-2673: Attempting to stop 'ora.crf' on 'scmsdbs05'
CRS-2677: Stop of 'ora.mdnsd' on 'scmsdbs05' succeeded
CRS-2677: Stop of 'ora.crf' on 'scmsdbs05' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'scmsdbs05'
CRS-2677: Stop of 'ora.gipcd' on 'scmsdbs05' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'scmsdbs05'
CRS-2677: Stop of 'ora.gpnpd' on 'scmsdbs05' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'scmsdbs05' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@SCMSDBS05 ~]# /oraapp/grid/gridhome/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'scmsdbs05'
CRS-2676: Start of 'ora.mdnsd' on 'scmsdbs05' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'scmsdbs05'
CRS-2676: Start of 'ora.gpnpd' on 'scmsdbs05' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'scmsdbs05'
CRS-2672: Attempting to start 'ora.gipcd' on 'scmsdbs05'
CRS-2676: Start of 'ora.cssdmonitor' on 'scmsdbs05' succeeded
CRS-2676: Start of 'ora.gipcd' on 'scmsdbs05' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'scmsdbs05'
CRS-2672: Attempting to start 'ora.diskmon' on 'scmsdbs05'
CRS-2676: Start of 'ora.diskmon' on 'scmsdbs05' succeeded
CRS-2676: Start of 'ora.cssd' on 'scmsdbs05' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'scmsdbs05'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'scmsdbs05'
CRS-2672: Attempting to start 'ora.ctssd' on 'scmsdbs05'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'scmsdbs05' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'scmsdbs05'
CRS-2676: Start of 'ora.drivers.acfs' on 'scmsdbs05' succeeded
CRS-2676: Start of 'ora.ctssd' on 'scmsdbs05' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'scmsdbs05' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'scmsdbs05'
CRS-2681: Clean of 'ora.asm' on 'scmsdbs05' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'scmsdbs05'
CRS-2676: Start of 'ora.asm' on 'scmsdbs05' succeeded
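
With this many interleaved CRS-2672 and CRS-2676 lines it is easy to miss a resource that never came up. The sketch below pairs each "Attempting to start" with its "Start of ... succeeded" and prints whatever is left over. The function name pending_starts is hypothetical, and it relies only on the message format shown above.

```shell
# Hypothetical helper: read `crsctl start crs` output on stdin and print
# resources that were attempted (CRS-2672) but never confirmed (CRS-2676).
pending_starts() {
  awk -F"'" '
    /CRS-2672: Attempting to start/ { pending[$2] = 1 }
    /CRS-2676: Start of/            { delete pending[$2] }
    END { for (r in pending) print r }' | sort
}

# Example:
#   /oraapp/grid/gridhome/bin/crsctl start crs -excl -nocrs 2>&1 | pending_starts
```

For the transcript above the output is empty, since every attempted resource, ending with ora.asm, reported success.
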
[root@SCMSDBS05 ~]# su - grid
[grid@SCMSDBS05 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Mar 21 11:13:04 2016

Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> select group_number, name, state from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE
------------ ------------------------------ -----------
           0 DATADG                         DISMOUNTED
           0 FLASHDG                        DISMOUNTED

SQL> create diskgroup OCRDG normal redundancy disk '/dev/mapper/mpath03p1','/dev/mapper/mpath04p1','/dev/mapper/mpath05p1' attribute 'compatible.rdbms' = '11.2', 'compatible.asm' = '11.2';

Diskgroup created.

SQL> select group_number, name, state, COMPATIBILITY, DATABASE_COMPATIBILITY from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE
------------ ------------------------------ -----------
           0 DATADG                         DISMOUNTED
           0 FLASHDG                        DISMOUNTED
           1 OCRDG                          MOUNTED

# Restore the OCR from the automatic daily backup (day.ocr)

[root@SCMSDBS05 ~]# /oraapp/grid/gridhome/bin/ocrconfig -restore /oraapp/grid/gridhome/cdata/scmsdbs-cluster/day.ocr
[root@SCMSDBS05 ~]# /oraapp/grid/gridhome/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3128
Available space (kbytes) : 258992
ID : 1459210354
Device/File Name : +OCRDG
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded
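
The ocrcheck report is easy to reduce to the two things that matter after a restore: how full the registry is and whether the integrity check passed. The sketch below does that. The function name ocrcheck_summary is hypothetical, and the parsing assumes the label text shown above.

```shell
# Hypothetical helper: condense `ocrcheck` output read from stdin.
ocrcheck_summary() {
  awk -F':' '
    /Total space \(kbytes\)/ { total = $2 + 0 }
    /Used space \(kbytes\)/  { used  = $2 + 0 }
    /Cluster registry integrity check succeeded/ { ok = 1 }
    END {
      if (total > 0) printf "used_pct=%.1f ", used * 100 / total
      print (ok ? "integrity=ok" : "integrity=unknown")
    }'
}

# Example (as root):
#   /oraapp/grid/gridhome/bin/ocrcheck | ocrcheck_summary
```

With the figures above (3128 of 262120 kbytes used) this prints used_pct=1.2 integrity=ok.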

[grid@SCMSDBS05 ~]$ crsctl query css votedisk
Located 0 voting disk(s).

[grid@SCMSDBS05 ~]$ crsctl replace votedisk +OCRDG
CRS-4602: Failed 27 to add voting file 23f42b1f16fd4fb4bfc3f8e6279fea33.
CRS-4602: Failed 27 to add voting file ab3559bf40c74fe9bfb32aaa86c8434c.
CRS-4602: Failed 27 to add voting file a3c736e836064f75bfc4f547f3357ecd.
Failed to replace voting disk group with +OCRDG.
CRS-4000: Command Replace failed, or completed with errors.

# We need to set the asm_diskstring parameter so that the devices can be discovered

SQL> alter system set asm_diskstring='/dev/mapper/mpath*';

System altered.

SQL> create pfile from memory;

File created.

SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area 1135747072 bytes
Fixed Size 2260728 bytes
Variable Size 1108320520 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
ASM diskgroups volume enabled
SQL> select status from v$instance;

STATUS
------------
STARTED

SQL> show parameter pfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string
SQL> show parameter diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/mapper/mpath*

[grid@SCMSDBS05 ~]$ crsctl replace votedisk +OCRDG
Successful addition of voting disk 557a99e08ae74fb8bf8690d6090b045e.
Successful addition of voting disk 329368d830784fedbfc11d254f1248ef.
Successful addition of voting disk 014fad46de954fbdbf794261d49f5f84.
Successfully replaced voting disk group with +OCRDG.
CRS-4266: Voting file(s) successfully replaced
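
The replace went through only after asm_diskstring covered the multipath devices. As a rough illustration of what "covered" means, the sketch below checks a device path against a comma-separated list of glob patterns. It approximates the idea with plain shell pattern matching and is not ASM's actual discovery logic. The function name matches_diskstring is hypothetical.

```shell
# Hypothetical helper: succeed if PATH matches any comma-separated glob in
# DISKSTRING. Runs in a subshell so that `set -f` and IFS do not leak out.
matches_diskstring() (
  diskstring=$1
  path=$2
  set -f            # keep the globs literal while splitting on commas
  IFS=,
  for pattern in $diskstring; do
    case $path in
      $pattern) exit 0 ;;
    esac
  done
  exit 1
)

if matches_diskstring '/dev/mapper/mpath*' /dev/mapper/mpath03p1; then
  echo "covered"
else
  echo "not covered"
fi
```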

[root@SCMSDBS05 ~]# /oraapp/grid/gridhome/bin/crsctl stop crs
[root@SCMSDBS06 ~]# /oraapp/grid/gridhome/bin/crsctl stop crs -f

[root@SCMSDBS05 ~]# /oraapp/grid/gridhome/bin/crsctl start crs
[root@SCMSDBS06 ~]# /oraapp/grid/gridhome/bin/crsctl start crs
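
For reference, the whole-diskgroup recovery above condenses to the checklist printed by the sketch below. It only prints text and executes nothing. It assumes the CRS-4602 failure was caused solely by the unset asm_diskstring, so that step is listed before the voting disk replace. The GRID_HOME default and the function name print_recovery_plan are conveniences for this post, not Oracle tooling.

```shell
# Assumption: GRID_HOME defaults to the Grid home used in this post.
GRID_HOME=${GRID_HOME:-/oraapp/grid/gridhome}

# Print the recovery steps in the order that worked above. Nothing is executed.
print_recovery_plan() {
  cat <<EOF
1. root, one node: $GRID_HOME/bin/crsctl stop crs -f
2. root, same node: $GRID_HOME/bin/crsctl start crs -excl -nocrs
3. grid, sqlplus / as sysasm: create diskgroup OCRDG, set asm_diskstring
4. root: $GRID_HOME/bin/ocrconfig -restore <OCR backup file>
5. grid: crsctl replace votedisk +OCRDG
6. root, every node: $GRID_HOME/bin/crsctl stop crs (add -f if needed)
7. root, every node: $GRID_HOME/bin/crsctl start crs
EOF
}

print_recovery_plan
```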

Starting with 11gR2, ASM can start without a PFILE or SPFILE. When init+ASMx.ora exists under the ORACLE_HOME/dbs directory, as it does here after create pfile from memory, that file is used.

# Default ASM alert log directory when using an spfile in ASM
/oraapp/grid/gridbase/diag/asm/+asm/+ASM1/trace

# Default ASM alert log directory when using init+ASM1.ora on the file system
/oraapp/grid/gridhome/log/diag/asm/+asm/+ASM1/trace
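
Both directories share the same layout and differ only in the base directory, presumably because the diagnostic destination resolves differently in the two cases. The sketch below rebuilds each path from a base directory and an instance name. The function name asm_trace_dir is hypothetical, and the fixed "+asm" component is taken from the two paths above.

```shell
# Hypothetical helper: build the ASM trace directory (where the alert log
# lives) from a diagnostic base directory and an instance name.
asm_trace_dir() {
  printf '%s/diag/asm/+asm/%s/trace\n' "$1" "$2"
}

asm_trace_dir /oraapp/grid/gridbase +ASM1
asm_trace_dir /oraapp/grid/gridhome/log +ASM1
```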

Reference:

How to start CRS stack when having missing disks from diskgroup storing voting disk (Doc ID 1383888.1)

Recovery methods for an 11.2 RAC that has lost the ASM diskgroup holding the OCR and voting disks
