How I Finished a GI OOP Patching From 19.6 to 19.8 After Facing cluutil: No Such File or Directory and clsrsc-740 Errors

This past weekend I was doing a production Grid Infrastructure (GI) out-of-place (OOP) patching from 19.6 to 19.8 for a client, and along the way I hit two bugs (20785766 and 27554103).

This blog post describes how I solved them, and I hope it saves you a lot of time if you ever face these issues.

As I have already blogged about how to do a GI OOP, I won’t go into the details of the process; I will just mention what I was doing.

I did the switchGridHome from 19.6 to 19.8 without any issues and successfully ran root.sh on node1.

[grid@hostname1 grid]$ ./gridSetup.sh -switchGridHome -silent
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2020-11-20_09-10-17PM.log


As a root user, execute the following script(s):
        1. /u01/app/19.8.0.0/grid/root.sh

Execute /u01/app/19.8.0.0/grid/root.sh on the following nodes:
[hostname1, hostname2]

Run the scripts on the local node first. After successful completion, run the scripts in sequence on all other nodes.

Successfully Setup Software.
...
[root@hostname1 ~]# /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/root_oracle-db01-s01_2020-11-20_21-13-24-032842094.log for the output of root script

When I ran root.sh on node2, I hit the error The CRS executable file ‘clsecho’ does not exist. I checked, and indeed the file didn’t exist in GI_HOME/bin. Comparing node1 and node2, there was a difference of about 100 files in this directory.

[root@hostname2 ~]$ /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log for the output of root script

[root@hostname2 ~]$ tail /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log
2020-11-20 21:42:27: The 'ROOTCRS_PREPATCH' is either in START/FAILED state
2020-11-20 21:42:27:  The CRS executable file /u01/app/19.8.0.0/grid/bin/cluutil either does not exist or is not executable
2020-11-20 21:42:27: Invoking "/u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status"
2020-11-20 21:42:27: trace file=/u01/app/oracle/crsdata/hostname2/crsconfig/cluutil3.log
2020-11-20 21:42:27: Running as user grid: /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
2020-11-20 21:42:27: Removing file /tmp/X9bxqSWx3c
2020-11-20 21:42:27: Successfully removed file: /tmp/X9bxqSWx3c
2020-11-20 21:42:27: pipe exit code: 32512
2020-11-20 21:42:27: /bin/su exited with rc=127

2020-11-20 21:42:27: bash: /u01/app/19.8.0.0/grid/bin/cluutil: No such file or directory

2020-11-20 21:42:27:  The CRS executable file /u01/app/19.8.0.0/grid/bin/clsecho either does not exist or is not executable
2020-11-20 21:42:27: The CRS executable file 'clsecho' does not exist.
2020-11-20 21:42:27: ###### Begin DIE Stack Trace ######
2020-11-20 21:42:27:     Package         File                 Line Calling
2020-11-20 21:42:27:     --------------- -------------------- ---- ----------
2020-11-20 21:42:27:  1: main            rootcrs.pl            357 crsutils::dietrap
2020-11-20 21:42:27:  2: crspatch        crspatch.pm          2815 main::__ANON__
2020-11-20 21:42:27:  3: crspatch        crspatch.pm          2203 crspatch::postPatchRerunCheck
2020-11-20 21:42:27:  4: crspatch        crspatch.pm          2015 crspatch::crsPostPatchCkpts
2020-11-20 21:42:27:  5: crspatch        crspatch.pm           394 crspatch::crsPostPatch
2020-11-20 21:42:27:  6: main            rootcrs.pl            370 crspatch::new
2020-11-20 21:42:27: ####### End DIE Stack Trace #######

2020-11-20 21:42:27:  checkpoint has failed
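
The two numbers in the log above, pipe exit code 32512 and rc=127, appear to be the same failure seen two ways. A minimal sketch of the arithmetic, assuming the first value is a raw wait status of the kind Perl's system() returns (rootcrs.pl is a Perl script); the function name decode_wait_status is hypothetical:

```shell
# Decode a raw wait status into the exit code of the child process.
# Assumption: the exit code is stored in the high byte of the status,
# so shifting right by 8 bits recovers it.
decode_wait_status() {
    echo $(( $1 >> 8 ))
}

decode_wait_status 32512   # prints 127
```

Exit code 127 is what the shell reports when the command it was asked to run cannot be found, which matches the "No such file or directory" message for cluutil.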

########################################################################
## Difference of Number of files between node1 and node2
########################################################################
[root@hostname1 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
405
[root@hostname2 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
303
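
A count only shows that files are missing, not which ones. The following is a rough sketch, not what I ran that night, for listing the names present in one directory listing and absent from another. It assumes each node's listing has first been saved to a text file, one file name per line; the function name and the listing file names are hypothetical.

```shell
# Print the names that appear in the first listing but not in the second.
# $1: listing taken on the node where the installer ran (one name per line)
# $2: listing taken on the node that is missing files
missing_from_listing() {
    # -F fixed strings, -x whole-line match, -v invert: keep lines of $1
    # that match no line of $2. The result is sorted for stable output.
    grep -Fxv -f "$2" "$1" | LC_ALL=C sort
}
```

For example, after saving the output of ls -1A /u01/app/19.8.0.0/grid/bin on each node as node1.lst and node2.lst, calling missing_from_listing node1.lst node2.lst would print the files that node2 lacks.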

The first thing I did after the failure was check the status of the cluster with Fred Denis’s rac_status script. Everything was up, and the cluster upgrade state was ROLLING PATCH. CRS was running the 19.8 version on node1 and the 19.6 version on node2.

[grid@hostname1 antunez]$ ./rac_status.sh -a

                Cluster rene-ace-cluster

        Type      |      Name      |      hostname1    |      hostname2      |
  ---------------------------------------------------------------------------
   MGMTLSNR       | MGMTLSNR       |       Online       |          -         |
   asm            | asm            |       Online       |       Online       |
   asmnetwork     | asmnet1        |       Online       |       Online       |
   chad           | chad           |       Online       |       Online       |
   cvu            | cvu            |          -         |       Online       |
   dg             | ORAARCH        |       Online       |       Online       |
   dg             | ORACRS         |       Online       |       Online       |
   dg             | ORADATA        |       Online       |       Online       |
   dg             | ORAFLASHBACK   |       Online       |       Online       |
   dg             | ORAREDO        |       Online       |       Online       |
   network        | net1           |       Online       |       Online       |
   ons            | ons            |       Online       |       Online       |
   qosmserver     | qosmserver     |          -         |       Online       |
   vip            | hostname1      |       Online       |          -         |
   vip            | hostname2      |          -         |       Online       |
   vip            | scan1          |       Online       |          -         |
   vip            | scan2          |          -         |       Online       |
   vip            | scan3          |          -         |       Online       |
  ---------------------------------------------------------------------------
    x  : Resource is disabled
       : Has been restarted less than 24 hours ago
       : STATUS and TARGET are different

      Listener    |      Port      |     hostname1      |      hostname2     |     Type     |
  ------------------------------------------------------------------------------------------
   ASMNET1LSNR_ASM| TCP:1526       |       Online       |       Online       |   Listener   |
   LISTENER       | TCP:1521,1525  |       Online       |       Online       |   Listener   |
   LISTENER_SCAN1 | TCP:1521,1525  |       Online       |          -         |     SCAN     |
   LISTENER_SCAN2 | TCP:1521,1525  |          -         |       Online       |     SCAN     |
   LISTENER_SCAN3 | TCP:1521,1525  |          -         |       Online       |     SCAN     |
  ------------------------------------------------------------------------------------------
       : Has been restarted less than 24 hours ago

         DB       |     Version    |      hostname1     |      hostname2     |    DB Type   |
  ------------------------------------------------------------------------------------------
   mgm            |            (2) |        Open        |         -          |  MGMTDB (P)  |
   prod           | 12.1.0     (1) |        Open        |        Open        |    RAC (P)   |
  ------------------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column ("''" means "same as above")

         1 : /u01/app/oracle/product/12.1.0/db_1        oracle oinstall
         2 : %CRS_HOME%                                 grid      ''

       : Has been restarted less than 24 hours ago
       : STATUS and TARGET are different

[grid@hostname1 antunez]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2701864972].
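
When checking the state repeatedly during a patching window, it can help to pull just the upgrade state out of that line. A small sketch, assuming the output format shown above; the function name upgrade_state is hypothetical:

```shell
# Read "crsctl query crs activeversion -f" output on stdin and print only
# the text between the brackets after "cluster upgrade state is".
upgrade_state() {
    sed -n 's/.*cluster upgrade state is \[\([^]]*\)\].*/\1/p'
}

# Demonstration using the exact line from this post:
echo "Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2701864972]." | upgrade_state
```

On a live cluster the same function could be fed directly with crsctl query crs activeversion -f | upgrade_state.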

I found MOS note Grid Infrastructure root script (root.sh etc) fails as remote node missing binaries (Doc ID 1991928.1), which describes a bug (20785766) in the 12.1 GI installer that leaves files missing from GI_HOME/bin and/or GI_HOME/lib. Even though the document mentions 12.1, I hit it with version 19.8 and it applied to my issue, so I did what the note says:

the workaround is to manually copy missing files from the node where installer was started and re-run root script

I excluded the soft link lbuilder, as it had already been created on the second node, and I changed the ownership of the GI_HOME/bin files on node2 to root:oinstall.

########################################################################
## From node2
########################################################################
[root@hostname2 bin]# ls -al | grep "lbuilder"
lrwxrwxrwx.  1 grid oinstall        24 Nov 20 21:10 lbuilder -> ../nls/lbuilder/lbuilder

########################################################################
## From node1
########################################################################
[root@hostname1 ~]$ cd /u01/app/19.8.0.0/grid/bin
[root@hostname1 bin]$ find . ! -name "lbuilder" | xargs -i scp {} hostname2:/u01/app/19.8.0.0/grid/bin

########################################################################
## Difference of Number of files between node1 and node2
########################################################################
[root@hostname1 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
405
[root@hostname2 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
405

########################################################################
## Changed the ownership to root:oinstall in hostname2
########################################################################
[root@hostname2 ~]$ cd /u01/app/19.8.0.0/grid/bin 
[root@hostname2 bin]$ chown root:oinstall ./*

Now that I had copied the files, I relinked the GI_HOME on node2, following the documented relink procedure, as the setuid permission bits were lost with the scp.

A few notes on the relink in this situation:

  1. As the active GI binaries on node2 were still from the 19.6 GI_HOME, I didn’t need to run rootcrs.sh -unlock.
  2. I didn’t run rootadd_rdbms.sh, as this runs as part of the /u01/app/19.8.0.0/grid/root.sh that I was going to rerun after the fix above.
  3. Similar to point 1, I didn’t run rootcrs.sh -lock.

[grid@hostname2 ~]$ export ORACLE_HOME=/u01/app/19.8.0.0/grid
[grid@hostname2 ~]$ $ORACLE_HOME/bin/relink
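
Since the concern here was special permission bits lost during the copy, one way to see what differs is to list the files that carry the setuid bit on each node and compare the two lists. This is only a sketch under that assumption; the function name list_setuid_files is hypothetical and it is not a step from the MOS note.

```shell
# Print the regular files under directory $1 that have the setuid bit set,
# sorted so that the output from two nodes can be compared line by line.
list_setuid_files() {
    # -perm -4000 matches files whose mode includes the setuid bit
    find "$1" -type f -perm -4000 | LC_ALL=C sort
}
```

Running list_setuid_files /u01/app/19.8.0.0/grid/bin on both nodes and comparing the results would show which binaries kept the bit on node1 but not on node2.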

After the relink, I reran /u01/app/19.8.0.0/grid/root.sh on node2, but now I got a new error: CLSRSC-740: inconsistent options specified to the postpatch command.

[root@hostname2 ~]$ /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/crs_postpatch_hostname2_2020-11-20_11-39-26PM.log for the output of root script

[root@hostname2 ~]$ tail /u01/app/19.8.0.0/grid/install/crs_postpatch_hostname2_2020-11-20_11-39-26PM.log

2020-11-20 23:39:28: NONROLLING=0

2020-11-20 23:39:28: Succeeded to get property value:NONROLLING=0

2020-11-20 23:39:28: Executing cmd: /u01/app/19.8.0.0/grid/bin/clsecho -p has -f clsrsc -m 740
2020-11-20 23:39:28: Executing cmd: /u01/app/19.8.0.0/grid/bin/clsecho -p has -f clsrsc -m 740
2020-11-20 23:39:28: Command output:
>  CLSRSC-740: inconsistent options specified to the postpatch command
>End Command output
2020-11-20 23:39:28: CLSRSC-740: inconsistent options specified to the postpatch command
2020-11-20 23:39:28: ###### Begin DIE Stack Trace ######
2020-11-20 23:39:28:     Package         File                 Line Calling
2020-11-20 23:39:28:     --------------- -------------------- ---- ----------
2020-11-20 23:39:28:  1: main            rootcrs.pl            357 crsutils::dietrap
2020-11-20 23:39:28:  2: crspatch        crspatch.pm          2212 main::__ANON__
2020-11-20 23:39:28:  3: crspatch        crspatch.pm          2015 crspatch::crsPostPatchCkpts
2020-11-20 23:39:28:  4: crspatch        crspatch.pm           394 crspatch::crsPostPatch
2020-11-20 23:39:28:  5: main            rootcrs.pl            370 crspatch::new
2020-11-20 23:39:28: ####### End DIE Stack Trace #######

2020-11-20 23:39:28:  checkpoint has failed

After investigating, I saw that the ROOTCRS_PREPATCH checkpoint had been marked as successful by the previous failed run of root.sh.

[grid@hostname2 ~]$ /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
SUCCESS

Continuing to investigate, I found that this error is part of bug 27554103. I solved it by setting the ROOTCRS_PREPATCH checkpoint to the state START and rerunning /u01/app/19.8.0.0/grid/root.sh on node2.

[root@hostname2 ~]# /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_PREPATCH -state START

[root@hostname2 ~]# /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
START

[root@hostname2 ~]# /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log for the output of root script
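
The condition that led me to reset the checkpoint can be written down as a small guard. This is a sketch of my own reasoning in this incident, not an Oracle-documented rule: it assumes the checkpoint is suspect when it reports SUCCESS while the cluster upgrade state is still ROLLING PATCH. The function name is hypothetical, and the two arguments are the strings printed by the cluutil -chkckpt and crsctl query commands shown earlier.

```shell
# Return 0 (true) when ROOTCRS_PREPATCH reports SUCCESS although the
# cluster is still in the middle of a rolling patch.
# $1: checkpoint status, e.g. SUCCESS or START
# $2: cluster upgrade state, e.g. "ROLLING PATCH" or NORMAL
prepatch_ckpt_looks_stale() {
    [ "$1" = "SUCCESS" ] && [ "$2" = "ROLLING PATCH" ]
}

# Demonstration with the values seen on node2 in this post:
if prepatch_ckpt_looks_stale "SUCCESS" "ROLLING PATCH"; then
    echo "checkpoint may be stale; review before rerunning root.sh"
fi
```

A guard like this only flags the situation; the decision to rewrite a checkpoint on a production cluster should still be made by a person, ideally with Oracle Support involved.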

After doing the steps above, I saw that everything was as it should be on both nodes and the cluster upgrade state was NORMAL.

[grid@hostname1 antunez]$ ./rac_status.sh -a

                Cluster rene-ace-cluster

        Type      |      Name      |      hostname1    |      hostname2      |
  ---------------------------------------------------------------------------
   MGMTLSNR       | MGMTLSNR       |       Online       |          -         |
   asm            | asm            |       Online       |       Online       |
   asmnetwork     | asmnet1        |       Online       |       Online       |
   chad           | chad           |       Online       |       Online       |
   cvu            | cvu            |          -         |       Online       |
   dg             | ORAARCH        |       Online       |       Online       |
   dg             | ORACRS         |       Online       |       Online       |
   dg             | ORADATA        |       Online       |       Online       |
   dg             | ORAFLASHBACK   |       Online       |       Online       |
   dg             | ORAREDO        |       Online       |       Online       |
   network        | net1           |       Online       |       Online       |
   ons            | ons            |       Online       |       Online       |
   qosmserver     | qosmserver     |          -         |       Online       |
   vip            | hostname1      |       Online       |          -         |
   vip            | hostname2      |          -         |       Online       |
   vip            | scan1          |       Online       |          -         |
   vip            | scan2          |          -         |       Online       |
   vip            | scan3          |          -         |       Online       |
  ---------------------------------------------------------------------------
    x  : Resource is disabled
       : Has been restarted less than 24 hours ago
       : STATUS and TARGET are different

      Listener    |      Port      |     hostname1      |      hostname2     |     Type     |
  ------------------------------------------------------------------------------------------
   ASMNET1LSNR_ASM| TCP:1526       |       Online       |       Online       |   Listener   |
   LISTENER       | TCP:1521,1525  |       Online       |       Online       |   Listener   |
   LISTENER_SCAN1 | TCP:1521,1525  |       Online       |          -         |     SCAN     |
   LISTENER_SCAN2 | TCP:1521,1525  |          -         |       Online       |     SCAN     |
   LISTENER_SCAN3 | TCP:1521,1525  |          -         |       Online       |     SCAN     |
  ------------------------------------------------------------------------------------------
       : Has been restarted less than 24 hours ago

         DB       |     Version    |      hostname1     |      hostname2     |    DB Type   |
  ------------------------------------------------------------------------------------------
   mgm            |            (2) |        Open        |         -          |  MGMTDB (P)  |
   prod           | 12.1.0     (1) |        Open        |        Open        |    RAC (P)   |
  ------------------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column ("''" means "same as above")

         1 : /u01/app/oracle/product/12.1.0/db_1        oracle oinstall
         2 : %CRS_HOME%                                 grid      ''

       : Has been restarted less than 24 hours ago
       : STATUS and TARGET are different

[grid@hostname1 antunez]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [441346801].

Hopefully this blog post saves you a few headaches and long overnight hours if you ever hit these two bugs while doing an OOP for your 19.x GI.

Rene Antunez
antunez.rene@gmail.com