How I Finished a GI OOP Patching From 19.6 to 19.8 After Facing "cluutil: No Such File or Directory" and CLSRSC-740 Errors
This past weekend I was doing a production Grid Infrastructure (GI) Out-of-Place Patching (OOP) from 19.6 to 19.8 for a client, and along the way I hit several bugs (Bugs 20785766 and 27554103).
This blog post describes how I solved them; I hope it saves you a lot of time if you ever face these issues.
As I have already blogged in the past about how to do a GI OOP, I won't go into all the details of the process; I will just mention what I was doing.
I did the switchGridHome from 19.6 to 19.8 without any issues and successfully ran root.sh on node 1.
[grid@hostname1 grid]$ ./gridSetup.sh -switchGridHome -silent
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2020-11-20_09-10-17PM.log

As a root user, execute the following script(s):
        1. /u01/app/19.8.0.0/grid/root.sh

Execute /u01/app/19.8.0.0/grid/root.sh on the following nodes:
[hostname1, hostname2]

Run the scripts on the local node first. After successful completion, run the scripts in sequence on all other nodes.

Successfully Setup Software.
...

[root@hostname1 ~]# /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/root_oracle-db01-s01_2020-11-20_21-13-24-032842094.log for the output of root script
When I ran root.sh on node 2, I ran into the error "The CRS executable file 'clsecho' does not exist". I went and checked, and indeed the file did not exist in GI_HOME/bin. Comparing node 1 and node 2, there was a difference of about 100 files in this directory.
[root@hostname2 ~]$ /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log for the output of root script

[root@hostname2 ~]$ tail /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log
2020-11-20 21:42:27: The 'ROOTCRS_PREPATCH' is either in START/FAILED state
2020-11-20 21:42:27: The CRS executable file /u01/app/19.8.0.0/grid/bin/cluutil either does not exist or is not executable
2020-11-20 21:42:27: Invoking "/u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status"
2020-11-20 21:42:27: trace file=/u01/app/oracle/crsdata/hostname2/crsconfig/cluutil3.log
2020-11-20 21:42:27: Running as user grid: /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
2020-11-20 21:42:27: Removing file /tmp/X9bxqSWx3c
2020-11-20 21:42:27: Successfully removed file: /tmp/X9bxqSWx3c
2020-11-20 21:42:27: pipe exit code: 32512
2020-11-20 21:42:27: /bin/su exited with rc=127
2020-11-20 21:42:27: bash: /u01/app/19.8.0.0/grid/bin/cluutil: No such file or directory
2020-11-20 21:42:27: The CRS executable file /u01/app/19.8.0.0/grid/bin/clsecho either does not exist or is not executable
2020-11-20 21:42:27: The CRS executable file 'clsecho' does not exist.
2020-11-20 21:42:27: ###### Begin DIE Stack Trace ######
2020-11-20 21:42:27:     Package         File                 Line Calling
2020-11-20 21:42:27:     --------------- -------------------- ---- ----------
2020-11-20 21:42:27:      1: main         rootcrs.pl            357 crsutils::dietrap
2020-11-20 21:42:27:      2: crspatch     crspatch.pm          2815 main::__ANON__
2020-11-20 21:42:27:      3: crspatch     crspatch.pm          2203 crspatch::postPatchRerunCheck
2020-11-20 21:42:27:      4: crspatch     crspatch.pm          2015 crspatch::crsPostPatchCkpts
2020-11-20 21:42:27:      5: crspatch     crspatch.pm           394 crspatch::crsPostPatch
2020-11-20 21:42:27:      6: main         rootcrs.pl            370 crspatch::new
2020-11-20 21:42:27: ####### End DIE Stack Trace #######
2020-11-20 21:42:27: checkpoint has failed

########################################################################
## Difference of Number of files between node1 and node2
########################################################################
[root@hostname1 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
405

[root@hostname2 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
303
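The output above only shows the raw file counts. If you want to see exactly which binaries are missing on node 2, you can compare the directory listings between the two nodes. This is a minimal sketch (not part of the original run), assuming passwordless SSH from node 1 to node 2:

# Hedged sketch: show files present in node1's 19.8 GI bin directory but missing on node2
# (run from node1; comm -23 prints lines that only appear in the first listing)
comm -23 <(ls -1 /u01/app/19.8.0.0/grid/bin | sort) \
         <(ssh hostname2 'ls -1 /u01/app/19.8.0.0/grid/bin' | sort)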
The first thing I did after the failure was to check the status of the cluster with Fred Denis's rac_status.sh script. Everything was up, and the cluster upgrade state was ROLLING PATCH. CRS was running the 19.8 version on node 1 and the 19.6 version on node 2.
[grid@hostname1 antunez]$ ./rac_status.sh -a

                Cluster rene-ace-cluster

      Type      |      Name      |   hostname1    |   hostname2    |
---------------------------------------------------------------------------
  MGMTLSNR      | MGMTLSNR       |     Online     |        -       |
  asm           | asm            |     Online     |     Online     |
  asmnetwork    | asmnet1        |     Online     |     Online     |
  chad          | chad           |     Online     |     Online     |
  cvu           | cvu            |        -       |     Online     |
  dg            | ORAARCH        |     Online     |     Online     |
  dg            | ORACRS         |     Online     |     Online     |
  dg            | ORADATA        |     Online     |     Online     |
  dg            | ORAFLASHBACK   |     Online     |     Online     |
  dg            | ORAREDO        |     Online     |     Online     |
  network       | net1           |     Online     |     Online     |
  ons           | ons            |     Online     |     Online     |
  qosmserver    | qosmserver     |        -       |     Online     |
  vip           | hostname1      |     Online     |        -       |
  vip           | hostname2      |        -       |     Online     |
  vip           | scan1          |     Online     |        -       |
  vip           | scan2          |        -       |     Online     |
  vip           | scan3          |        -       |     Online     |
---------------------------------------------------------------------------
      x   : Resource is disabled
          : Has been restarted less than 24 hours ago
          : STATUS and TARGET are different

     Listener    |      Port       |   hostname1    |   hostname2    |     Type     |
------------------------------------------------------------------------------------------
 ASMNET1LSNR_ASM | TCP:1526        |     Online     |     Online     |   Listener   |
 LISTENER        | TCP:1521,1525   |     Online     |     Online     |   Listener   |
 LISTENER_SCAN1  | TCP:1521,1525   |     Online     |        -       |     SCAN     |
 LISTENER_SCAN2  | TCP:1521,1525   |        -       |     Online     |     SCAN     |
 LISTENER_SCAN3  | TCP:1521,1525   |        -       |     Online     |     SCAN     |
------------------------------------------------------------------------------------------
          : Has been restarted less than 24 hours ago

       DB        |    Version      |   hostname1    |   hostname2    |    DB Type   |
------------------------------------------------------------------------------------------
 mgm             |       (2)       |      Open      |        -       |  MGMTDB (P)  |
 prod            |   12.1.0 (1)    |      Open      |      Open      |    RAC (P)   |
------------------------------------------------------------------------------------------
ORACLE_HOME references listed in the Version column ("''" means "same as above")

                1 : /u01/app/oracle/product/12.1.0/db_1   oracle oinstall
                2 : %CRS_HOME%                            grid
               '' : Has been restarted less than 24 hours ago
                  : STATUS and TARGET are different

[grid@hostname1 antunez]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2701864972].
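If you want to confirm per node which patch level the running stack actually has (rather than inferring it from the resource listing), crsctl can report the software patch level for each node. A minimal sketch (not from the original run), executed from the 19.8 home on node 1:

# Hedged sketch: query the Clusterware software patch level of each node
/u01/app/19.8.0.0/grid/bin/crsctl query crs softwarepatch hostname1
/u01/app/19.8.0.0/grid/bin/crsctl query crs softwarepatch hostname2
# and the cluster-wide active version and patch level, as shown above
/u01/app/19.8.0.0/grid/bin/crsctl query crs activeversion -f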
I found MOS note Grid Infrastructure root script (root.sh etc) fails as remote node missing binaries (Doc ID 1991928.1), which describes a bug (20785766) in the 12.1 GI installer that leaves files missing from GI_HOME/bin and/or GI_HOME/lib on remote nodes. Even though the document mentions 12.1, I hit it with 19.8 and it applied to my issue, so I did what the note says:
the workaround is to manually copy missing files from the node where installer was started and re-run root script
I excluded the soft link lbuilder, as it had already been created on the second node, and I changed the ownership of the GI_HOME/bin files on node 2 to root:oinstall.
########################################################################
## From node2
########################################################################
[root@hostname2 bin]# ls -al | grep "lbuilder"
lrwxrwxrwx. 1 grid oinstall 24 Nov 20 21:10 lbuilder -> ../nls/lbuilder/lbuilder

########################################################################
## From node1
########################################################################
[root@hostname1 ~]$ cd /u01/app/19.8.0.0/grid/bin
[root@hostname1 ~]$ find . ! -name "lbuilder" | xargs -i scp {} hostname2:/u01/app/19.8.0.0/grid/bin

########################################################################
## Difference of Number of files between node1 and node2
########################################################################
[root@hostname1 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
405

[root@hostname2 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
405

########################################################################
## Changed the ownership to root:oinstall in hostname2
########################################################################
[root@hostname2 ~]$ cd /u01/app/19.8.0.0/grid/bin
[root@hostname2 bin]$ chown root:oinstall ./*
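After the copy and the chown, it is worth spot-checking that ownership and permissions on node 2 now match node 1 for the binaries the root script complained about. A minimal sketch (my own sanity check, not part of the MOS workaround), run from node 2 and assuming passwordless SSH to node 1; the binary names are just examples:

# Hedged sketch: compare owner, group and permission bits for a few of the affected binaries
for f in cluutil clsecho crsctl; do
  echo "== $f =="
  stat -c '%U:%G %a %n' /u01/app/19.8.0.0/grid/bin/$f
  ssh hostname1 "stat -c '%U:%G %a %n' /u01/app/19.8.0.0/grid/bin/$f"
done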
Now that I had copied the files, I did a relink of the GI_HOME on node 2, following the relink documentation note, since the sticky bits were lost with the scp.
A few notes on the relink in this situation:
- As the active GI binaries on node 2 were still from the 19.6 GI_HOME, I didn't need to run rootcrs.sh -unlock.
- I didn't run rootadd_rdbms.sh, as it runs as part of the /u01/app/19.8.0.0/grid/root.sh that I was going to rerun after the fix above.
- For the same reason as the first point, I didn't run rootcrs.sh -lock.
[grid@hostname2 ~]$ export ORACLE_HOME=/u01/app/19.8.0.0/grid
[grid@hostname2 ~]$ $ORACLE_HOME/bin/relink
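Since the special permission bits were reported lost with the scp, one quick sanity check after the relink is to list the setuid binaries under the new home on both nodes and compare them. A minimal sketch (not a required step), run as root on node 2 and assuming passwordless SSH to node 1:

# Hedged sketch: list setuid binaries under the 19.8 home on node2 and on node1
find /u01/app/19.8.0.0/grid/bin -perm -4000 -ls
ssh hostname1 'find /u01/app/19.8.0.0/grid/bin -perm -4000 -ls'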
After the relink, I reran /u01/app/19.8.0.0/grid/root.sh on node 2, but this time I got a new error: CLSRSC-740: inconsistent options specified to the postpatch command.
[root@hostname2 ~]$ /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/crs_postpatch_hostname2_2020-11-20_11-39-26PM.log for the output of root script

[root@hostname2 ~]$ tail /u01/app/19.8.0.0/grid/install/crs_postpatch_hostname2_2020-11-20_11-39-26PM.log
2020-11-20 23:39:28: NONROLLING=0
2020-11-20 23:39:28: Succeeded to get property value:NONROLLING=0
2020-11-20 23:39:28: Executing cmd: /u01/app/19.8.0.0/grid/bin/clsecho -p has -f clsrsc -m 740
2020-11-20 23:39:28: Executing cmd: /u01/app/19.8.0.0/grid/bin/clsecho -p has -f clsrsc -m 740
2020-11-20 23:39:28: Command output:
>  CLSRSC-740: inconsistent options specified to the postpatch command
>End Command output
2020-11-20 23:39:28: CLSRSC-740: inconsistent options specified to the postpatch command
2020-11-20 23:39:28: ###### Begin DIE Stack Trace ######
2020-11-20 23:39:28:     Package         File                 Line Calling
2020-11-20 23:39:28:     --------------- -------------------- ---- ----------
2020-11-20 23:39:28:      1: main         rootcrs.pl            357 crsutils::dietrap
2020-11-20 23:39:28:      2: crspatch     crspatch.pm          2212 main::__ANON__
2020-11-20 23:39:28:      3: crspatch     crspatch.pm          2015 crspatch::crsPostPatchCkpts
2020-11-20 23:39:28:      4: crspatch     crspatch.pm           394 crspatch::crsPostPatch
2020-11-20 23:39:28:      5: main         rootcrs.pl            370 crspatch::new
2020-11-20 23:39:28: ####### End DIE Stack Trace #######
2020-11-20 23:39:28: checkpoint has failed
After investigating, I saw that the ROOTCRS_PREPATCH checkpoint was marked as SUCCESS from the previous failed run of root.sh.
[grid@hostname2 ~]$ /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
SUCCESS
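For context on what cluutil is reading here: the checkpoint state is typically stored in an XML file under $ORACLE_BASE/crsdata/<hostname>/crsconfig/ (the exact file name can vary by version), so it can also be inspected directly. A hedged sketch assuming the common ckptGridHA_<hostname>.xml name:

# Hedged: inspect the checkpoint state on disk (path and file name assumed, may vary by version)
grep -i "ROOTCRS_PREPATCH" /u01/app/oracle/crsdata/hostname2/crsconfig/ckptGridHA_hostname2.xml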
Continuing to investigate, I found that this error was part of bug 27554103. I solved it by setting the ROOTCRS_PREPATCH checkpoint status back to START and rerunning /u01/app/19.8.0.0/grid/root.sh on node 2.
[root@hostname2 ~]# /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_PREPATCH -state START

[root@hostname2 ~]# /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
START

[root@hostname2 ~]# /u01/app/19.8.0.0/grid/root.sh
Check /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log for the output of root script
After doing the steps above, everything was as it should be on both nodes and the cluster upgrade state was NORMAL.
[grid@hostname1 antunez]$ ./rac_status.sh -a

                Cluster rene-ace-cluster

      Type      |      Name      |   hostname1    |   hostname2    |
---------------------------------------------------------------------------
  MGMTLSNR      | MGMTLSNR       |     Online     |        -       |
  asm           | asm            |     Online     |     Online     |
  asmnetwork    | asmnet1        |     Online     |     Online     |
  chad          | chad           |     Online     |     Online     |
  cvu           | cvu            |        -       |     Online     |
  dg            | ORAARCH        |     Online     |     Online     |
  dg            | ORACRS         |     Online     |     Online     |
  dg            | ORADATA        |     Online     |     Online     |
  dg            | ORAFLASHBACK   |     Online     |     Online     |
  dg            | ORAREDO        |     Online     |     Online     |
  network       | net1           |     Online     |     Online     |
  ons           | ons            |     Online     |     Online     |
  qosmserver    | qosmserver     |        -       |     Online     |
  vip           | hostname1      |     Online     |        -       |
  vip           | hostname2      |        -       |     Online     |
  vip           | scan1          |     Online     |        -       |
  vip           | scan2          |        -       |     Online     |
  vip           | scan3          |        -       |     Online     |
---------------------------------------------------------------------------
      x   : Resource is disabled
          : Has been restarted less than 24 hours ago
          : STATUS and TARGET are different

     Listener    |      Port       |   hostname1    |   hostname2    |     Type     |
------------------------------------------------------------------------------------------
 ASMNET1LSNR_ASM | TCP:1526        |     Online     |     Online     |   Listener   |
 LISTENER        | TCP:1521,1525   |     Online     |     Online     |   Listener   |
 LISTENER_SCAN1  | TCP:1521,1525   |     Online     |        -       |     SCAN     |
 LISTENER_SCAN2  | TCP:1521,1525   |        -       |     Online     |     SCAN     |
 LISTENER_SCAN3  | TCP:1521,1525   |        -       |     Online     |     SCAN     |
------------------------------------------------------------------------------------------
          : Has been restarted less than 24 hours ago

       DB        |    Version      |   hostname1    |   hostname2    |    DB Type   |
------------------------------------------------------------------------------------------
 mgm             |       (2)       |      Open      |        -       |  MGMTDB (P)  |
 prod            |   12.1.0 (1)    |      Open      |      Open      |    RAC (P)   |
------------------------------------------------------------------------------------------
ORACLE_HOME references listed in the Version column ("''" means "same as above")

                1 : /u01/app/oracle/product/12.1.0/db_1   oracle oinstall
                2 : %CRS_HOME%                            grid
               '' : Has been restarted less than 24 hours ago
                  : STATUS and TARGET are different

[grid@hostname1 antunez]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [441346801].
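As an extra verification of the final state, you can also check the release patch registered with Clusterware and the patches recorded in the new home. A minimal sketch (not part of the original run):

# Hedged sketch: confirm the release patch level and the patches inventoried in the 19.8 home
/u01/app/19.8.0.0/grid/bin/crsctl query crs releasepatch
/u01/app/19.8.0.0/grid/OPatch/opatch lspatches -oh /u01/app/19.8.0.0/grid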
Hopefully this blog post saves you a few headaches and long overnight hours if you ever hit these two bugs while doing an OOP of your 19.x GI.