ORA-00600 [kpdbHashTableIns:dup] When Creating PDB in 19c

ORA-00600 [kpdbHashTableIns:dup] When Creating PDB in 19c

This week I was involved in an issue with a QA pluggable database refresh that is critical to business, where after the PDB clone to the QA container caused an ORA-00600 [KPDBHASHTABLEINS:DUP] error and then the container crashed and it was not possible to start it, failing with the same error.

I will explore in this blog post how I brought the container up, though it goes to say that what I did should not be done in a production environment without the consent of Oracle Support.

The environment for this excercise is as below ( the names are redacted)

  • Source Container : CDBSTAGE
  • Target Container : CDBQA
  • PDB : TESTPDB
  • Oracle Binaries Version : 19.10

One of my teamates tried to do the unplug/clone/plug of the TESTPDB into the QA container with the following command and it failed with ORA-01210. 

SQL> create pluggable database TESTPDB as clone using '/oracle/dba/exports/refresh/cdbstage/CDBSTAGE_TESTDB_unplug.xml';
create pluggable database TESTPDB as clone using '/oracle/dba/exports/refresh/cdbstage/CDBSTAGE_TESTDB_unplug.xml'
*
ERROR at line 1:
ORA-01210: data file header is media corrupt
ORA-01110: data file : '+DATAC1/CDBQA/C67E90899BD0FE11E0530AFE18ACC0AF/DATAFILE/system.6379.1077225883'

This error was weird as prior of the unplug of TESTPDB in CDBSTAGE, the PDB was functioning fine and the data file was not corrupt.

After troubleshooting, we were able to find out that the diskgroup DATAC1 had ran out of space for this clone to take place. So we proceeded to free up space and we executed again the clone/plug of TESTPDB into the QA container.

This time around the command failed with ORA-00600 [KPDBHASHTABLEINS:DUP] and ORA-03113: end-of-file on communication channel.

SQL> create pluggable database TESTPDB as clone using '/oracle/dba/exports/refresh/cdbstage/CDBSTAGE_TESTDB_unplug.xml';
create pluggable database TESTPDB as clone using '/oracle/dba/exports/refresh/cdbstage/CDBSTAGE_TESTDB_unplug.xml'
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 250554
Session ID: 1685 Serial number: 13178
ORA-00600: internal error code, arguments: [kpdbHashTableIns:dup],
[0x350D00090], [0x350CFF330], [1], [], [], [], [], [], [], [], []

What happened right after the error above, was that CDBQA crashed. My next “logical” move was to try and restart CDBQA but it failed with the same error.

SQL> startup 
ORACLE instance started.

Total System Global Area  118255568 bytes
Fixed Size                   282576 bytes
Variable Size              83886080 bytes
Database Buffers           33554432 bytes
Redo Buffers                 532480 bytes
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 250554
Session ID: 1685 Serial number: 13178
ORA-00600: internal error code, arguments: [kpdbHashTableIns:dup],
[0x350D00090], [0x350CFF330], [1], [], [], [], [], [], [], [], []

Looking at the trace file generated, the following is what stands out: “Duplicate Element found in PDB Hash Table #1

[TOC00001]
ORA-00600: internal error code, arguments: [kpdbHashTableIns:dup], [0x38596A620], [0x3859698C0], [1], [], [], [], [], [], [], [], []

[TOC00001-END]
[TOC00002]
========= Dump for incident 1640260 (ORA 600 [kpdbHashTableIns:dup]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
Duplicate Element found in PDB Hash Table #1
New Ele: lnk=0x38596a620 ele=0x38596a440, id=16, uid=2204782064, name=TESTPDB
Existing Ele: lnk=0x3859698c0 ele=0x3859696e0, id=12, uid=3303067252, name=TESTPDB
----- End of Customized Incident Dump(s) -----
[TOC00003-END]

What you see in the trace is that there is a duplicate PDB in the control file, one with ID=12 and the other one with ID=16.

The next thing to try was to do a dump of the control file and see what was in there, and I could do this as the CDB was “sort of” mounted. I say sort of, as when I tried to query a $ view I got the following error, but when I tried to mount it it would say that the CDB was already mounted.

The good thing is that it allowed me to dump the control file for me to troubleshoot this error.

SQL> select name,open_mode from v$database;
ERROR on line 1:

ORA-01507: database not mounted

SQL> alter database mount;
ERROR on line 1:

ORA-01100: database already mounted

SQL> oradebug setmypid
Statement processed.

SQL> oradebug dump controlf 4;
Statement processed.

Once I dumped the controlfile, I could see in the “PLUGGABLE DATABASE RECORDS” section that indeed there were 2 active records of this PDB and one inactive record (Previously dropped) in the controlfile.

You can know that a PDB is plugged in the container if the Pluggable DataBase record number is equal to the id number. If the id number is equal to 0, then it means that the PDB has been unplugged from the container.
You can reade more about this in a blog post I wrote several years ago going into more detail of the status of a PDB in the controlfile.

## The dumpfile is redacted##
...
***************************************************************************
PLUGGABLE DATABASE RECORDS
***************************************************************************
...
Pluggable DataBase record=12
  id=12
  dbid=3303067252
  uid=3303067252
  guid=C67EEA4D8C2BBC9AE0530AFE18ACDFFC
  name=TESTPDB
  first datafile link=0
  first tempfile link=0
  pdbinc=0, pdbrdi=0, status=0x00000003, flags=0x00001001
  incrcv scn scn: 0x0000000000000000, clnscn scn: 0x0000006246ab0909, crescn scn: 0x0000000000000000
  dbrls scn: 0x0000000000000000, dbrlc=0
  iscn scn: 0x0000000000000000, itime=0
  bscn scn: 0x0000000000000000, btime=0
  escn scn: 0x0000000000000000, etime=0
, rdba=0
...
Pluggable DataBase record=15
  id=0
  dbid=633098858
  uid=633098858
  guid=C67F1F588B49EA58E0530AFE18ACFD92
  name=TESTPDB
  first datafile link=0
  first tempfile link=0
  pdbinc=0, pdbrdi=0, status=0x00000001, flags=0x00000001
  incrcv scn scn: 0x0000000000000000, clnscn scn: 0x00000062470ad396, crescn scn: 0x00000062470ad396
  dbrls scn: 0x0000000000000000, dbrlc=0
  iscn scn: 0x0000000000000000, itime=0
  bscn scn: 0x0000000000000000, btime=0
  escn scn: 0x0000000000000000, etime=0
, rdba=4194824
Pluggable DataBase record=16
  id=16
  dbid=2204782064
  uid=2204782064
  guid=C67F83B94389D839E0530AFE18AC2D51
  name=TESTPDB
  first datafile link=0
  first tempfile link=0
  pdbinc=0, pdbrdi=0, status=0x00000003, flags=0x00001001
  incrcv scn scn: 0x0000000000000000, clnscn scn: 0x00000062476185ea, crescn scn: 0x0000000000000000
  dbrls scn: 0x0000000000000000, dbrlc=0
  iscn scn: 0x0000000000000000, itime=0
  bscn scn: 0x0000000000000000, btime=0
  escn scn: 0x0000000000000000, etime=0
, rdba=0

Searching in MOS I found the following Bug 28606598 – ORA-600: [KPDBHASHTABLEINS:DUP] (Doc ID 28606598.8) and it mentions the below which it is not fixed in the version we are on.

Affects:

Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 19.1 but BELOW 20.1
Versions confirmed as being affected
Platforms affected Generic (all / most platforms affected)

Fixed:

The fix for 28606598 is first included in

Description

This bug is only relevant when using Real Application Clusters (RAC)
Rediscovery Notes
ORA-600 [KPDBHASHTABLEINS:DUP]
 
This error might occur if any concurrent PDB instance is started on another
instance while the PDB is being dropped or after dropped from the current instance.
 
This error is applicable only when RAC is involved.
 
Workaround
NONE

With the information above, it either meant that we were going to restore from backup this QA container, or recreate the container and both were going to be a painful experience due to the amount of time it woud take to do this.

As in my mind I had already lost the CDB, I decided to try and dump the controlfile to trace and recreate it to remove the duplicate record of the TESTPDB. I also created a pfile so that I could modify the location of the controlfile, as I wanted to keep the original one.

SQL> alter database backup controlfile to trace;

SQL> create pfile='/tmp/pfileCDBQA.ora' from spfile  

Once I had modified the pfile to reflect the new controlfile and modified the trace to a file called control.sql, I proceeded to recreate the controlfile with the NORESETLOGS section and open the container.

SQL> !cat control.sql
CREATE CONTROLFILE REUSE DATABASE "CDBQA" NORESETLOGS FORCE LOGGING ARCHIVELOG
    MAXLOGFILES 1024
    MAXLOGMEMBERS 5
    MAXDATAFILES 32767
    MAXINSTANCES 32
    MAXLOGHISTORY 33012
...

SQL> startup nomount='/tmp/pfileCDBQA.ora'
ORACLE instance started.

Total System Global Area  118255568 bytes
Fixed Size                   282576 bytes
Variable Size              83886080 bytes
Database Buffers           33554432 bytes
Redo Buffers                 532480 bytes

SQL> @control.sql

Control file created.


SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL> alter database open;

Database altered.

I was a happy person, for about 5 mins! As right after the DB was up, it crashed again after a few mins of being up, but this time with a different error. 

PDBRENE(7):Errors in file /u02/app/oracle/diag/rdbms/cdbqa/CDBQA1/trace/CDBQA1_smon_157742.trc:
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
2021-07-07T02:00:45.273928-03:00
Dumping diagnostic data in directory=[cdmp_20210707020045], requested by (instance=1, osid=157742 (SMON)), summary=[incident=1754815].
2021-07-07T02:00:46.659053-03:00
Errors in file /u02/app/oracle/diag/rdbms/cdbqa/CDBQA1/trace/CDBQA1_smon_157742.trc (incident=1754816) (PDBNAME=PDBRENE):
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
PDBRENE(7):Incident details in: /u02/app/oracle/diag/rdbms/cdbqa/CDBQA1/incident/incdir_1754816/CDBQA1_smon_157742_i1754816.trc
PDBRENE(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2021-07-07T02:00:46.685181-03:00

I restarted the DB and it came crashing down again after 10 mins of being up, so it was troubleshooting time again.

Searching for this new error, I found the below

ERROR:

Format: ORA-600 [ktubko_1]
This error normally means that a transaction recovery (by SMON), transaction rollback (by a user process), is trying to rollback a transaction in a datafile block but the transaction is not present, no longer active, in the transaction list (ITL) of that block. Oracle is trying to apply an undo record of that transaction in a block but the transaction is not there.

The problem is probably caused by lost write / lost changes

I tried to search for pending transactions in the PDB (PDBRENE) mentioned in the log/trace and found none, so I searched for the object mentioned in the trace file and found that it was an index, so what I did was get the DDL and recreate it.

SQL> Select u.inst#            instid    ,
       u.name             useg      ,
       u.status$          status    ,
       x.ktuxeusn         usn       ,
       x.ktuxeslt         slt       ,
       x.ktuxesqn         wrp       ,
       x.ktuxesiz         undoblocks
From   x$ktuxe            x,
       undo$              u
Where  x.ktuxeusn = u.us#
And    x.ktuxesta = 'ACTIVE'
And    x.ktuxecfl like 'DEAD%' ;

no rows selected.

SQL > select owner, object_name, object_type, status
from dba_objects where object_id =842690;

OWNER OBJECT_NAME OBJECT_TYPE Status
------------------------------ ------------------------------ ------------------------------ --------
SYS I_MVREF$_RUN_STATS INDEX VALID

1 row selected.

SQL> SET LONG 2000000 PAGESIZE 0 feedback on verify off
column x format a200 WORD_WRAPPED
execute dbms_metadata.set_transform_param (DBMS_METADATA.SESSION_TRANSFORM,'SQLTERMINATOR',TRUE);
select DBMS_METADATA.GET_DDL('INDEX',u.object_name,u.owner) x
from dba_objects u
where object_type = 'INDEX'
and object_name='&OBJECT_NAME'

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09

Enter value for object_name: I_MVREF$_RUN_STATS
Enter value for owner: SYS

CREATE INDEX "SYS"."I_MVREF$_RUN_STATS" ON "SYS"."MVREF$_RUN_STATS" ("RUN_OWNER_USER#", "REFRESH_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;

1 row selected.

Elapsed: 00:00:00.85

SQL> DROP INDEX "SYS"."I_MVREF$_RUN_STATS";

Index dropped.

SQL> CREATE INDEX "SYS"."I_MVREF$_RUN_STATS" ON "SYS"."MVREF$_RUN_STATS" ("RUN_OWNER_USER#", "REFRESH_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;

Index created.

Finally after recreating the index, I bounced the database and was able to open the DB successfully and without it coming crashing down.

SQL> shutdown immediate

SQL> startup 
ORACLE instance started.

Total System Global Area  118255568 bytes
Fixed Size                   282576 bytes
Variable Size              83886080 bytes
Database Buffers           33554432 bytes
Redo Buffers                 532480 bytes
Database mounted.
Database opened.

Hope this blog post helps you if you are ever in a situation like mine with this bug. But again I highly recommend that this work you do it in conjunction with Oracle Support as none of what I did is vetted by Oracle.

Comments

comments

Tags:
Rene Antunez
antunez.rene@gmail.com
No Comments

Leave a Reply