How an ORA-01102 Error Drove Me Nuts in a 19c Environment and How It Was Fixed
This blog post is part rant and part write-up on how I fixed an issue where the second instance of a database was failing with ORA-01102: cannot mount database in EXCLUSIVE mode whenever we tried to start it up.
Yesterday a teammate mentioned to me that he was not able to bring up the second instance in a 2-node 19.10 RAC environment; it kept failing with ORA-01102.
[oracle@hostname1 ~]$ srvctl start database -db TESTDB
PRCR-1079 : Failed to start resource ora.testdb.db
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-01102: cannot mount database in EXCLUSIVE mode
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/hostname2/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.testdb.db' on 'hostname2' failed
CRS-2632: There are no more servers to try to place resource 'ora.testdb.db' on that would satisfy its placement policy
Immediately I asked him to verify the parameter cluster_database and make sure that its value was TRUE. He verified it, and yes, that was the case.
SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
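Something else worth checking in this kind of situation (just a sketch of a sanity check, not something that turned out to be the cause here) is whether the spfile contains a SID-specific override of cluster_database, since an entry such as sid='TESTDB2' set to FALSE would produce exactly this sort of mismatch. The v$spparameter view lists the per-SID entries stored in the spfile:

SQL> -- any row with a specific SID and VALUE = FALSE would override the '*' setting
SQL> select sid, name, value
       from v$spparameter
      where name = 'cluster_database';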
So my next train of thought was that the instances were using different parameter files, so I asked him to remove all the init.ora/spfile files belonging to TESTDB in $ORACLE_HOME/dbs. Once those were removed, we recreated the spfile from a pfile backup I had made previously and made sure that the srvctl configuration pointed to the parameter file in ASM.
SQL> create spfile='+DATAC1/TESTDB/spfileTESTDB.ora' from pfile='/tmp/pfile.ora';

File created.

[oracle@hostname1 ~]$ srvctl modify database -db TESTDB -spfile '+DATAC1/TESTDB/spfileTESTDB.ora'
[oracle@hostname1 ~]$ srvctl config database -db TESTDB -a | grep Spfile
Spfile: +DATAC1/TESTDB/spfileTESTDB.ora
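For completeness, this is roughly how to double-check on each node that nothing stale is left behind under $ORACLE_HOME/dbs and that the local init file is only a pointer to the spfile in ASM; on a setup like this, initTESTDB2.ora should contain nothing but a single SPFILE='+DATAC1/TESTDB/spfileTESTDB.ora' line (a sketch, file names depend on your SID):

[oracle@hostname2 ~]$ ls -l $ORACLE_HOME/dbs | grep -i testdb
[oracle@hostname2 ~]$ cat $ORACLE_HOME/dbs/initTESTDB2.ora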
But even after doing this, the ORA-01102 error still appeared. My next suspicion was that leftover semaphores or shared memory segments on the second node were preventing the instance from coming up, so I asked him to check for any and remove them.
One thing to note about the commands below: I had sourced the TESTDB2 environment, and it is the only database running out of that ORACLE_HOME.
[oracle@hostname2 ~]$ sysresv

IPC Resources for ORACLE_SID "TESTDB2" :
Shared Memory
ID              KEY
No shared memory segments used
Semaphores:
ID              KEY
13926465        0x938ea8e8
13959234        0x938ea8e9
13992003        0x938ea8ea
Oracle Instance not alive for sid "TESTDB2"

[oracle@hostname2 ~]$ sysresv -i

Remove ipc resources for sid "TESTDB2" (y/n)y
Done removing ipc resources for sid "TESTDB2"

[oracle@hostname2 ~]$ sysresv

IPC Resources for ORACLE_SID "TESTDB2" :
Shared Memory
ID              KEY
No shared memory segments used
Semaphores:
ID              KEY
No semaphore resources used
Oracle Instance not alive for sid "TESTDB2"

[oracle@hostname2 ~]$ srvctl start instance -d TESTDB -i TESTDB2
PRCR-1013 : Failed to start resource ora.testdb.db
PRCR-1064 : Failed to start resource ora.testdb.db on node hostname2
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-01102: cannot mount database in EXCLUSIVE mode
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/hostname2/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.testdb.db' on 'hostname2' failed
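As an aside, if sysresv is not available for some reason, plain Linux ipcs/ipcrm will show and remove the same IPC resources. This is only a sketch (the semaphore ID is just the first one from the sysresv output above), and you should only remove IDs that clearly belong to the instance you are cleaning up:

[oracle@hostname2 ~]$ ipcs -m            # list shared memory segments
[oracle@hostname2 ~]$ ipcs -s            # list semaphore arrays
[oracle@hostname2 ~]$ ipcrm -s 13926465  # remove a single semaphore array by ID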
I tried to start the instance and still had no luck. By now this was driving me crazy, so what I did next was shut down the database (not just the instances) and try to start node 2 first and then node 1. Node 2 came up, but node 1 failed with the same error, so it really was something related to the cluster_database parameter. But WHAT?! It was set correctly. Even the LMON trace pointed to a difference in cluster_database between the two instances.
Trace file /u02/app/oracle/diag/rdbms/hostname1/TESTDB1/trace/TESTDB1_lmon_159463.trc
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
...
* kjfcqiora: query MULTIPLE LMD ENQUEUE INFO of inst 1 = 2@*@*, vallen 5, strlen 5
* kjfcqiora: parsing of string complete
kjfcqiora: skipping namespace 14, not in use anymore
* My cluster_database = 1, instance 2's cluster_database = 0
Error: This instance attempts to mount shared but instance 2 has already mounted exclusive.
ORA-1102 signalled during global resource directory mount
For reference, cluster_database = 1 means the value is TRUE and cluster_database = 0 means it is FALSE. But how could that be, when I was using srvctl to start the database/instances and both instances were using the same spfile?
Going to the database alert log, it shows that the parameter is set to TRUE!
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
ORACLE_HOME:    /u02/app/oracle/product/19.0.0.0/dbhome_1
System name:    Linux
Node name:      hostname1.local.com
Release:        4.1.12-124.30.1.el7uek.x86_64
Version:        #2 SMP Thu Aug 8 01:38:50 GMT 2019
Machine:        x86_64
VM name:        Xen Version: 4.4 (HVM)
Storage:        Exadata
Using parameter settings in server-side spfile +DATAC1/TESTDB/PARAMETERFILE/spfile.5884.1070813179
System parameters with non-default values:
...
  cluster_database         = TRUE
...
  enable_pluggable_database= TRUE
  cdb_cluster              = TRUE
2021-08-03T11:17:58.790325-03:00
Cluster Communication is configured to use IPs from: GPnP
 IP: 100.107.2.3 Subnet: 100.107.2.0
 IP: 100.107.2.4 Subnet: 100.107.2.0
Process termination requested for pid 302927 [source = rdbms], [info = 0] [request issued by pid: 300458, uid: 1001]
USER (ospid: 300458): terminating the instance due to ORA error 1102
2021-08-03T11:18:03.989254-03:00
Instance terminated by USER, pid = 300458
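(As a side note, if you want to pull the tail of the alert log quickly from the command line, an adrci one-liner like the one below works; the homepath here is only an assumption based on this environment's diag layout, so run show homes first to find yours.)

[oracle@hostname1 ~]$ adrci exec="show homes"
[oracle@hostname1 ~]$ adrci exec="set homepath diag/rdbms/testdb/TESTDB1; show alert -tail 100"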
But one thing that stood out in the log was the following:
System parameters with non-default values:
…
cdb_cluster = TRUE
Searching for the parameter cdb_cluster, there isn't a lot of information out there. The only thing I found is what Mike Dietrich wrote: that it was introduced in 12.2 and is undocumented. Digging a bit more, I found that its default value is supposed to be FALSE.
I verified that with a view called parameter_view, which you can download from git.
SQL> select * from parameter_view where name='cdb_cluster';

NOT_O Status   ISDEP ISBAS NAME         TYPE_    DEFAULT_VALUE  DESCRIPTION
----- -------- ----- ----- ------------ -------- -------------- ------------------------------------
TRUE  OK       FALSE FALSE cdb_cluster  boolean  FALSE          if TRUE startup in CDB Cluster mode
...

1 row selected.
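If you don't have that view handy, a similar check can be done directly against the underlying x$ fixed tables; this is just a sketch of the classic x$ksppi/x$ksppcv query and has to be run as SYS:

SQL> -- ksppstdf tells you whether the current value is still the default
SQL> select i.ksppinm  name,
            c.ksppstvl value,
            c.ksppstdf is_default,
            i.ksppdesc description
       from x$ksppi i, x$ksppcv c
      where i.indx = c.indx
        and i.ksppinm = 'cdb_cluster';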
What does "if TRUE startup in CDB Cluster mode" actually mean?
And how does this parameter relate to the cluster_database value I was seeing in the LMON trace?
Arghhh!!! What a pain!
So what I did was change the value of cdb_cluster to FALSE (which is the default value), and as soon as that change went in, all of the instances came up. In the LMON trace file, I was now seeing what I expected.
SQL> alter system set cdb_cluster=FALSE sid='*' scope=spfile;

System altered.

SQL> exit

[oracle@hostname2 ~]$ srvctl stop database -db TESTDB
[oracle@hostname2 ~]$ srvctl start instance -d TESTDB -i TESTDB2
[oracle@hostname2 ~]$ srvctl status database -db TESTDB
Instance TESTDB1 is not running on node hostname1
Instance TESTDB2 is not running on node hostname2
[oracle@hostname2 ~]$ srvctl start database -db TESTDB
[oracle@hostname2 ~]$ srvctl status database -db TESTDB
Instance TESTDB1 is running on node hostname1
Instance TESTDB2 is running on node hostname2

[oracle@hostname2 ~]$ more /u02/app/oracle/diag/rdbms/testdb/TESTDB2/trace/TESTDB2_lmon_380312.trc
Trace file /u02/app/oracle/diag/rdbms/testdb/TESTDB2/trace/TESTDB2_lmon_380312.trc
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
...
*** 2021-08-04T15:28:50.319294-03:00 (CDB$ROOT(1))
Name Service frozen
kjxgmcs: Setting state to 0 1.
ksxp_update_peer_data: still not ready
kjxgrssvote: reconfig bitmap chksum 0x3e76b10c cnt 2 master 1 ret 0
ksirConvertFromGlobalData flags_ksirgdp 0kjxggreg_dball: register to group DAALL_DB_
* kjrnlocalres: returns 36011 (pqo 969, p 2048, dl 13656, t 3414, er 14700, df 1024 [psi 2, pms 320, txns 3414, ses 3104, procs 2048])
* kjrnress: return nress 59417 (path 1, localres 36011, rescachesz 0)
kjfcpiora: published my fusion master weight 672543
kjfcpiora: published my enqueue weight 59417
kjfcpiora: publish my flogb 18
kjfcpiora: published my cluster_database parameter=1
kjfcpiora: published my cdb_cluster parameter=0
...
*** 2021-08-04T15:29:49.160047-03:00 (CDB$ROOT(1))
kjpmLMSPriorityDecision: Automatic priority change info:
  Member number 7
  Node number 1
  DB rank 6
  Start time 681416930260525
  Tiebreaker (rand) 3198786163
  Total LMS 3
  RT LMS (at startup) 3
  Automatic priority changes are enabled (param)
kjpmThresholdCalc: threshold is 32
kjpmLMSPriorityDecision: 3 of 3 LMSs are running at elevated priority
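Just to double-check after the restart, a quick query against gv$parameter (a sketch) confirms that both instances now agree on the two parameters involved:

SQL> -- expect cluster_database = TRUE and cdb_cluster = FALSE on both instances
SQL> select inst_id, name, value
       from gv$parameter
      where name in ('cluster_database', 'cdb_cluster')
      order by inst_id, name;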
My rant is this: whoever decided to introduce this parameter in 12.2 without making it a hidden parameter and without documenting what it is supposed to do deserves a smack on the top of the head for the headache it gave me trying to solve this! I hope this blog post helps you if you ever find yourself in the same situation.
Update from Markus Michalewicz (@KnownAsMarkus):
Thanks for bringing this to our attention @rene_ace. I understand your frustration. I’ve followed up internally to either document the parameter or get it removed. Can’t explain what it’s meant to do here, but it’s not supposed to give you such a headache. Sorry for that! Thanks.