How an ORA-01102 Error Drove Me Nuts in a 19c Environment and How It Was Fixed

How an ORA-01102 Error Drove Me Nuts in a 19c Environment and How It Was Fixed

This blog post is part rant and part on how I was able to fix an issue in the second instance that was failing with ORA-01102: cannot mount database in EXCLUSIVE mode when trying to start it up.

Yesterday a teamate mentioned to me that when he was trying to bring up the second instance, in a 2 node 19.10 RAC environment, he was not able to and it was failing with the error ORA-01102. 

[oracle@hostname1 ~]$ srvctl start database -db TESTDB
PRCR-1079 : Failed to start resource ora.testdb.db
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-01102: cannot mount database in EXCLUSIVE mode
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/hostname2/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.testdb.db' on 'hostname2' failed
CRS-2632: There are no more servers to try to place resource 'ora.testdb.db' on that would satisfy its placement policy

Immediately I asked him to verify the parameter cluster_database and make sure that the value was equal to TRUE. After verifying the value, yes this was the case.

SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2

So my next train of thought is that the instances were using a different parameter file, so I asked him to remove all the init.ora/spfiles in $ORACLE_HOME/dbs belonging to TESTDB. Once those were removed recreate the spfile from a backup I had made previously, and make sure that the srvctl config was pointing to the ASM parameter file. 

SQL>create spfile='+DATAC1/TESTDB/spfileTESTDB.ora' from pfile='/tmp/pfile.ora';

File created.

[oracle@hostname1 ~]$ srvctl modify database -db TESTDB -spfile '+DATAC1/TESTDB/spfileTESTDB.ora';

[oracle@hostname1 ~]$ srvctl config database -db TESTDB -a | grep Spfile
Spfile: +DATAC1/TESTDB/spfileTESTDB.ora

But even after doing this, the ORA-01102 error still appeared. My next trick in the bag was that there were semaphores or shared memory in the second node still present that were not allowing the instance to come up. So I asked him to check and remove any that were present.

One thing to note on this command is that I had sourced the TESTDB2 environment and it is the only one running on that ORACLE_HOME.

[oracle@hostname2 ~]$ sysresv

IPC Resources for ORACLE_SID "TESTDB2" :
Shared Memory
ID              KEY
No shared memory segments used
Semaphores:
ID              KEY
13926465        0x938ea8e8
13959234        0x938ea8e9
13992003        0x938ea8ea
Oracle Instance not alive for sid "TESTDB2"

[oracle@hostname2 ~]$ sysresv -i
Remove ipc resources for sid "TESTDB2" (y/n)y
Done removing ipc resources for sid "TESTDB2"

[oracle@hostname2 ~]$ sysresv

IPC Resources for ORACLE_SID "TESTDB2" :
Shared Memory
ID KEY
No shared memory segments used
Semaphores:
ID KEY
No semaphore resources used
Oracle Instance not alive for sid "TESTDB2"

[oracle@hostname2 ~]$ srvctl start instance -d TESTDB -i TESTDB2
PRCR-1013 : Failed to start resource ora.testdb.db
PRCR-1064 : Failed to start resource ora.testdb.db on node hostname2
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-01102: cannot mount database in EXCLUSIVE mode
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/hostname2/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.testdb.db' on 'hostname2' failed

I Tried to start the instance and still no luck. By now, this was driving me crazy. So now what I did was shutdown the database (not the instances) and try to startup node 2 first and then node 1. I was able to start node 2, but node 1 failed with the same error, so it really was something with the cluster_database parameter, but WHAT?! As it was set correctly. Even going to the trace of the LMON, it indicated the difference between the 2 instances with the cluster_database parameter.

Trace file /u02/app/oracle/diag/rdbms/hostname1/TESTDB1/trace/TESTDB1_lmon_159463.trc
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
...
* kjfcqiora: query MULTIPLE LMD ENQUEUE INFO of inst 1 = 2@*@*, vallen 5, strlen 5
* kjfcqiora: parsing of string complete
kjfcqiora: skipping namespace 14, not in use anymore
* My cluster_database = 1, instance 2's cluster_database = 0

Error: This instance attempts to mount shared but instance 2 has already mounted exclusive.
ORA-1102 signalled during global resource directory mount

For reference, cluster_database = 1 means that the value is TRUE and cluster_database = 0 means that the value is FALSE. But how can that be as I was using srvctl to start the database/instances and both instances were using the same spfile.

Going to the DB log, it mentions that the parameter is set to TRUE!

Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.10.0.0.0.
ORACLE_HOME:    /u02/app/oracle/product/19.0.0.0/dbhome_1
System name:    Linux
Node name:      hostname1.local.com
Release:        4.1.12-124.30.1.el7uek.x86_64
Version:        #2 SMP Thu Aug 8 01:38:50 GMT 2019
Machine:        x86_64
VM name:        Xen Version: 4.4 (HVM)
Storage:        Exadata
Using parameter settings in server-side spfile +DATAC1/TESTDB/PARAMETERFILE/spfile.5884.1070813179
System parameters with non-default values:
...
  cluster_database         = TRUE
...
  enable_pluggable_database= TRUE
  cdb_cluster              = TRUE
2021-08-03T11:17:58.790325-03:00
Cluster Communication is configured to use IPs from: GPnP
IP: 100.107.2.3  Subnet: 100.107.2.0
IP: 100.107.2.4  Subnet: 100.107.2.0
Process termination requested for pid 302927 [source = rdbms], [info = 0] [request issued by pid: 300458, uid: 1001]
USER (ospid: 300458): terminating the instance due to ORA error 1102
2021-08-03T11:18:03.989254-03:00
Instance terminated by USER, pid = 300458

But one thing that stood out in the log is the following:

System parameters with non-default values:

cdb_cluster = TRUE

Searching for the parameter cdb_cluster , there isn’t a lot of information for this parameter. The only thing is what Mike Dietrich wrote, that it was introduced in 12.2 and it was undocumented. Searching for more information on the parameter, found that the default value is suppossed to be false. 

The view called parameter_view, you can download it from git .

SQL>select * from parameter_view where name='cdb_cluster';

NOT_O Status   ISDEP ISBAS NAME                                          TYPE_       DEFAULT_VALUE
----- -------- ----- ----- --------------------------------------------- ----------- --------------------------------------------
DESCRIPTION                                                                                                                                                                                                                                 
------------------------------------------------------------------------------------------------------------------------------------ --------- ----- -----
TRUE  OK       FALSE FALSE cdb_cluster                                   boolean     FALSE
if TRUE startup in CDB Cluster mode                                                                                                                                                                                                         
...
1 row selected.

What does it mean that “if TRUE startup in CDB Cluster mode”?

How does this parameter relate to cluster_database that I was seeing in the LMON trace?

Arghhh!!! What a pain!

So what I did was change the value of cdb_cluster to false (which is the default value) and as soon as that change went in, all of the instances came up. In the LMON trace file, now I was seeing things as expected.

SQL>alter system set cdb_cluster=FALSE sid='*' scope=spfile;

System altered.

SQL>exit
[oracle@hostname2 ~]$ srvctl stop database -db TESTDB
[oracle@hostname2 ~]$ srvctl start instance -d TESTDB -i TESTDB2
[oracle@hostname2 ~]$ srvctl status database -db TESTDB
Instance TESTDB1 is not running on node hostname1
Instance TESTDB2 is not running on node hostname2
[oracle@hostname2 ~]$ srvctl start database -db TESTDB

[oracle@hostname2 ~]$ srvctl status database -db TESTDB
Instance TESTDB1 is running on node hostname1
Instance TESTDB2 is running on node hostname2
[oracle@hostname2 ~]$ more /u02/app/oracle/diag/rdbms/testdb/TESTDB2/trace/TESTDB2_lmon_380312.trc
Trace file /u02/app/oracle/diag/rdbms/testdb/TESTDB2/trace/TESTDB2_lmon_380312.trc
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
...
*** 2021-08-04T15:28:50.319294-03:00 (CDB$ROOT(1))
     Name Service frozen
kjxgmcs: Setting state to 0 1.
ksxp_update_peer_data: still not ready
kjxgrssvote: reconfig bitmap chksum 0x3e76b10c cnt 2 master 1 ret 0
ksirConvertFromGlobalData flags_ksirgdp 0kjxggreg_dball: register to group DAALL_DB_
* kjrnlocalres: returns 36011 (pqo 969, p 2048, dl 13656, t 3414, er 14700, df 1024 [psi 2, pms 320, txns 3414, ses 3104, procs 2048])
* kjrnress: return nress 59417 (path 1, localres 36011, rescachesz 0)
kjfcpiora: published my fusion master weight 672543
kjfcpiora: published my enqueue weight 59417
kjfcpiora: publish my flogb 18
kjfcpiora: published my cluster_database parameter=1
kjfcpiora: published my cdb_cluster parameter=0
...
*** 2021-08-04T15:29:49.160047-03:00 (CDB$ROOT(1))
kjpmLMSPriorityDecision: Automatic priority change info:
     Member number       7
     Node number         1
     DB rank             6
     Start time          681416930260525
     Tiebreaker (rand)   3198786163
     Total LMS           3
     RT LMS (at startup) 3
     Automatic priority changes are enabled (param)
kjpmThresholdCalc: threshold is 32
kjpmLMSPriorityDecision: 3 of 3 LMSs are running at elevated priority

My rant is that whoever decided to put this parameter in 12.2, and not make it a hidden parameter, and not document what is suppossed to do deserves a smack on the top of their head for giving me a headache trying to solve this! Hope that this blog post helps you if you are ever in the same situation.

Update from

Thanks for bringing this to our attention @rene_ace. I understand your frustration. I’ve followed up internally to either document the parameter or get it removed. Can’t explain what it’s meant to do here, but it’s not supposed to give you such a headache. Sorry for that! Thanks.

Comments

comments

Tags:
Rene Antunez
antunez.rene@gmail.com
No Comments

Leave a Reply