Upgrade Oracle GI from 12.1 to 18.5 fails and leaves CRS with status of Upgrade Final

A couple of weeks ago I was working on a two-node Oracle Grid Infrastructure upgrade from 12.1 to 18.5. Everything was going great: both rootupgrade.sh scripts ran correctly, and the only step left was the gridSetup.sh -executeConfigTools command. But that failed in the rhprepos upgradeSchema section:

[oracle@node1 /u01/app/18.5.0/grid ]$ ./gridSetup.sh -executeConfigTools -responseFile /tmp/gridresponse.rsp -silent 

########################################
# From the upgrade log file :
########################################
INFO: [Apr 9, 2019 3:24:08 PM] Starting 'Upgrading RHP Repository' 
INFO: [Apr 9, 2019 3:24:08 PM] Starting 'Upgrading RHP Repository' 
INFO: [Apr 9, 2019 3:24:08 PM] Executing RHPUPGRADE 
INFO: [Apr 9, 2019 3:24:08 PM] Command /u01/app/18.5.0/grid/bin/rhprepos upgradeSchema -fromversion 12.1.0.2.0 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn.handleProcess() entered. 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn: getting configAssistantParmas. 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn: checking secretArguments. 
INFO: [Apr 9, 2019 3:24:08 PM] No arguments to pass to stdin 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn: starting read loop. 
INFO: [Apr 9, 2019 3:24:11 PM] Completed Plugin named: rhpupgrade 
INFO: [Apr 9, 2019 3:24:11 PM] ConfigClient.saveSession method called 
INFO: [Apr 9, 2019 3:24:11 PM] Upgrading RHP Repository failed. 
INFO: [Apr 9, 2019 3:24:11 PM] Upgrading RHP Repository failed. 
INFO: [Apr 9, 2019 3:24:11 PM] ConfigClient.executeSelectedToolsInAggregate action performed 
...
INFO: [Apr 9, 2019 3:24:11 PM] Validating state <setup> 
WARNING: [Apr 9, 2019 3:24:11 PM] [WARNING] [INS-43080] Some of the configuration assistants failed, were cancelled or skipped 

[oracle@node1 ~]$ crsctl query crs activeversion -f 
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [UPGRADE FINAL]. The cluster active patch level is [2532936542].

After searching MOS, I couldn't find much that pointed to a solution, just a lot of bugs related to the RHP repository.

The main problem was that during the upgrade process the MGMTDB was not upgraded to 18.5 and stayed at version 12.1, so when the RHP schema migration tried to execute, it failed.

I was lucky enough to get on a call with a good friend (@_rickgonzalez), who is the PM of RHP, and we were able to work through it. Below is what I did to solve the issue.

The first thing to do is bring up the MGMTDB from the 12.1 GI_HOME.
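
Note that srvctl here has to come from the old home. A minimal way to set that up, assuming the 12.1 home path used later in this post:

[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[oracle@node1 ~]$ export PATH=$ORACLE_HOME/bin:$PATH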

[oracle@node1 ~]$ srvctl start mgmtdb
PRCR-1079 : Failed to start resource ora.mgmtdb
CRS-2501: Resource 'ora.mgmtdb' is disabled
[oracle@node1 ~]$ srvctl enable mgmtdb
[oracle@node1 ~]$ srvctl start mgmtdb 
[oracle@node1 ~]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node node2
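
At this point you can also confirm that the GIMR really is still at 12.1 by connecting to it on the node where it is running and checking v$instance (a quick sanity check, assuming the standard -MGMTDB SID; the version should still come back as 12.1.0.2.0):

[oracle@node2 ~]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[oracle@node2 ~]$ export ORACLE_SID=-MGMTDB
[oracle@node2 ~]$ $ORACLE_HOME/bin/sqlplus / as sysdba
SQL> select version from v$instance;

VERSION
-----------------
12.1.0.2.0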

Once the MGMTDB is up and running, you need to drop the RHP service that was created during the rootupgrade process. This has to be done from the 18.5 GI_HOME:

[root@node2 ~]$ env | grep ORA
ORACLE_SID=+ASM2
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[root@node2 ~]$ srvctl remove rhpserver
PRCT-1470 : failed to reset the Rapid Home Provisioning (RHP) repository
 PRCT-1011 : Failed to run "mgmtca". Detailed error: [MGTCA-1005 : Could not connect to the GIMR. 
 ORA-01034: ORACLE not available
 ORA-27101: shared memory realm does not exist
 Linux-x86_64 Error: 2: No such file or directory
 Additional information: 4150
 Additional information: -1526109961
 ]
[root@node2 ~]$ srvctl remove rhpserver -f
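
If the forced removal worked, the cluster should no longer know about the resource. A quick way to confirm is to ask crsctl for it, which should now fail with CRS-2613:

[root@node2 ~]$ crsctl stat res ora.rhpserver
CRS-2613: Could not find resource 'ora.rhpserver'.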

Now that the RHP service has been removed, we need to remove the 12.1 MGMTDB.

This should be done from the first node. It's not that it can't be done from another node, but Oracle highly recommended doing it from the first one, so if the MGMTDB is running anywhere else, relocate it to node 1 first.

########################################
# As root user in BOTH nodes
########################################
#Node 1
[root@node1 ~]$  export ORACLE_HOME=/u01/app/12.1.0.2/grid
[root@node1 ~]$  export PATH=$PATH:$ORACLE_HOME/bin
[root@node1 ~]$ crsctl stop res ora.crf -init
[root@node1 ~]$ crsctl modify res ora.crf -attr ENABLED=0 -init

#Node 2
[root@node2 ~]$  export ORACLE_HOME=/u01/app/12.1.0.2/grid
[root@node2 ~]$  export PATH=$PATH:$ORACLE_HOME/bin
[root@node2 ~]$ crsctl stop res ora.crf -init
[root@node2 ~]$ crsctl modify res ora.crf -attr ENABLED=0 -init
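
Before touching the repository, it's worth confirming on each node that the resource really is down (the default output shows TARGET and STATE; add -p if you also want to see the ENABLED flag):

[root@node1 ~]$ crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE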

########################################
# As oracle User on Node 1
########################################
[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[oracle@node1 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[oracle@node1 ~]$ srvctl relocate mgmtdb -node node1                                                          
[oracle@node1 ~]$ srvctl stop mgmtdb
[oracle@node1 ~]$ srvctl stop mgmtlsnr
[oracle@node1 ~]$ srvctl remove mgmtdb -force
Remove the database _mgmtdb? (y/[n]) y
########################################
##### Manually remove the MGMTDB files
##### Verify that the files for MGMTDB match your environment before deleting them
########################################
ASMCMD> cd DBFS_DG/_MGMTDB/DATAFILE
ASMCMD> ls
SYSAUX.257.879563483
SYSTEM.258.879563493
UNDOTBS1.259.879563509
ASMCMD> rm system.258.879563493
ASMCMD> rm sysaux.257.879563483
ASMCMD> rm undotbs1.259.879563509
ASMCMD> cd ../PARAMETERFILE
ASMCMD> rm spfile.268.879563627
ASMCMD> cd ../TEMPFILE
ASMCMD> rm TEMP.264.879563553
ASMCMD> cd ../ONLINELOG
ASMCMD> rm group_1.261.879563549
ASMCMD> rm group_2.262.879563549
ASMCMD> rm group_3.263.879563549
ASMCMD> cd ../CONTROLFILE
ASMCMD> rm Current.260.879563547
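
Before moving on, it doesn't hurt to list the _MGMTDB directory one more time and confirm nothing was left behind:

ASMCMD> ls +DBFS_DG/_MGMTDB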

Once the MGMTDB is deleted, we run mdbutil.pl (which you can grab from MOS Doc 2065175.1) to recreate the MGMTDB in the 18.5 GI_HOME.
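
After downloading the script, place it where the grid owner can run it and make it executable (I'm assuming it was saved to the oracle home directory, matching the ./mdbutil.pl call below):

[oracle@node1 ~]$ chmod +x mdbutil.pl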

########################################
# As oracle User on Node 1
########################################
[oracle@node1 ~]$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[oracle@node1 ~]$ ./mdbutil.pl --addmdb --target=+DBFS_DG
mdbutil.pl version : 1.95
2019-04-14 19:11:48: I Starting To Configure MGMTDB at +DBFS_DG...
2019-04-14 19:11:53: I Container database creation in progress... for GI 18.0.0.0.0
2019-04-14 19:20:29: I Plugable database creation in progress...
2019-04-14 19:22:25: I Executing "/tmp/mdbutil.pl --addchm" on node1 as root to configure CHM.
root@node1's password:
2019-04-14 19:23:08: W Not able to execute "/tmp/mdbutil.pl --addchm" on node1 as root to configure CHM.
2019-04-14 19:23:08: I Executing "/tmp/mdbutil.pl --addchm" on node2 as root to configure CHM.
root@node2's password:
2019-04-14 19:23:27: W Not able to execute "/tmp/mdbutil.pl --addchm" on node2 as root to configure CHM.
2019-04-14 19:23:27: I MGMTDB & CHM configuration done!

########################################
# As root user in BOTH nodes
########################################
[root@node1 ~]$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[root@node1 ~]$ /tmp/mdbutil.pl --addchm  ## Only needed if this step failed during the mdbutil.pl run
[root@node1 ~]$ crsctl modify res ora.crf -attr ENABLED=1 -init
[root@node1 ~]$ crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'node1'
CRS-2676: Start of 'ora.crf' on 'node1' succeeded

[root@node2 ~]$ env | grep ORA
ORACLE_SID=+ASM2
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[root@node2 ~]$ /tmp/mdbutil.pl --addchm  ## Only needed if this step failed during the mdbutil.pl run
[root@node2 ~]$ crsctl modify res ora.crf -attr ENABLED=1 -init
[root@node2 ~]$ crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'node2'
CRS-2676: Start of 'ora.crf' on 'node2' succeeded

########################################
# As oracle User on Node 1
########################################
[oracle@node1 ~]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node node1
[oracle@node1 ~]$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): node1
[oracle@node1 ~]$ srvctl config mgmtdb
Database unique name: _mgmtdb
Database name: 
Oracle home: <CRS home>
Oracle user: oracle
Spfile: +DBFS_DG/_MGMTDB/PARAMETERFILE/spfile.282.1005320705
Password file: 
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Type: Management
PDB name: GIMR_DSCREP_10
PDB service: GIMR_DSCREP_10
Cluster name: test-clu
Database instance: -MGMTDB
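
As a final sanity check, you can connect to the recreated GIMR and confirm both the new version and that the GIMR PDB is open (the PDB name comes from the srvctl config output above):

[oracle@node1 ~]$ export ORACLE_SID=-MGMTDB
[oracle@node1 ~]$ $ORACLE_HOME/bin/sqlplus / as sysdba
SQL> select version from v$instance;

VERSION
-----------------
18.0.0.0.0

SQL> select name, open_mode from v$pdbs where name = 'GIMR_DSCREP_10';

NAME             OPEN_MODE
---------------- ----------
GIMR_DSCREP_10   READ WRITE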

Once the MGMTDB has been recreated, we rerun the gridSetup.sh -executeConfigTools command, and we can see that the cluster upgrade state is now NORMAL and everything is running as expected on version 18.5.

[oracle@node1 ~]$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[oracle@node1 ~]$ /u01/app/18.5.0/grid/gridSetup.sh -executeConfigTools -responseFile /tmp/gridresponse.rsp -silent 
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the logs of this session at:
/u01/app/oraInventory/logs/GridSetupActions2019-04-11_04-07-18PM

Successfully Configured Software.

[oracle@node1 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2532936542].

[oracle@node1 ~]$ crsctl check cluster -all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

I hope this blog post helps you if you ever face this problem. A quick note: we were not using the Rapid Home Provisioning feature, so deleting the GIMR database had no impact on the environment. If you are using RHP, I highly recommend that you contact Oracle before running these steps, as you will lose the RHP repository if you follow them.

Oracle also confirmed that this is a bug in the 18.x upgrade process, so hopefully it will be fixed soon.
