Troubleshooting Data Guard Implementation

Part I

Database Alert Logs
The database alert logs provide a great source of information in investigating issues with both the
primary and standby databases. The Data Guard Broker also writes information to the alert logs as well as its own Disaster Recovery Center (DRC) log files.

Observer Log Files
If you are using the Fast-Start Failover (FSFO) feature of Data Guard, additional diagnostic
information is available in the observer log file. The log files are called dgmgrl<nnnnn>.log. The <nnnnn> is a system-generated number that will change each time the observer is started.

Data Guard Trace Files
Data Guard trace files can also provide crucial information to diagnosing Data Guard configuration
issues and/or identify areas in the Data Guard topology that need attention. Oracle will write an
audit trail of the archived redo logs received from the primary database into a trace file when the
LOG_ARCHIVE_TRACE initialization parameter is enabled.

Data Guard Broker Log Files and Tools
The Data Guard Monitor (DMON) process also writes status information to a log file. DMON log
files can be useful in diagnosing Data Guard failures and can be found in the background_dump_
dest directory.

Dynamic Performance Views
Name Description
DBA_LOGSTDBY_EVENTS Contains the last 100 (default) events that occurred on the
logical standby
DBA_LOGSTDBY_PROGRESS Checks whether SQL Apply is progressing
DBA_LOGSTDBY_LOG Checks whether archive logs are being delivered to a logical
standby
DBA_LOGSTDBY_UNSUPPORTED Identifies SQL Apply unsupported data types
V$ARCHIVE_DEST Describes all the destinations in the Data Guard configuration,
including each destination’s current settings
V$ARCHIVE_DEST_STATUS Displays runtime and configuration information for the redo
transport destinations
V$ARCHIVE_GAP Displays information to help you identify a gap in the
archived redo log files on a physical standby
V$DATAGUARD_CONFIG Lists the DB_UNIQUE_NAME parameters defined in the Data
Guard configuration in LOG_ARCHIVE_CONFIG
V$DATAGUARD_STATUS Displays and records events that would typically be triggered
by any message to the alert log or server process trace files
limited to the last 256 messages
V$LOG Displays information from the online redo log files
V$LOGFILE Contains information about the online redo log files and
standby redo log files
V$LOG_HISTORY
V$ARCHIVED_LOG
Contains archive log history information from the control file
Contains more detailed archived log information from the
control file.
V$LOGSTDBY_PROCESS Shows whether logical standby process is running; if query
returns no rows, it is not running
V$MANAGED_STANDBY Displays current status information for Oracle Database
processes related to Data Guard
V$STANDBY_LOG Contains standby log file information
Data Guard Configuration and Management Errors

Common Management Issues

The Password File
You may encounter a couple of errors associated with the password file. For instance, you may
have problems connecting to the standby database and may see the “ORA-01034: ORACLE not
available” error message. Check your password for the SYS account on the primary and standby
databases to make sure that they are the same.
If you encounter an ORA-16191 error when the primary attempts to connect to the standby,
verify that the SYS password is the same on the primary and standby databases. Ensure that the
primary and standby databases are using a password file and that the REMOTE_LOGIN_PASSWORD
parameter is set to SHARED or EXCLUSIVE.
SQL Apply Fails with ORA-01031
This error tells you that SQL Apply has failed with insufficient privileges. The problem is that
background processes are not running with sysdba privileges. They will need to be granted
manually on both the primary and standby databases. You will see the following in the alert log
on the logical standby:
LOGSTDBY stmt: grant sysdba to maxtrix_user
LOGSTDBY status: ORA-01031: insufficient privileges

Resolving Gaps Manually
A redo gap occurs whenever redo transmission is interrupted. Redo Transport Services will
automatically detect the redo gap and resolve it by sending the missing redo to the destination
once redo transmission resumes. In some cases, though, gap resolution can’t be performed
automatically and must be done manually.
To determine whether a redo gap exists on your physical standby, you can run the following
query.
SQL> SELECT * FROM V$ARCHIVE_GAP;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
———– ————- ————–
1 6233 6233
1 4531 4531
1 4938 4939
You can use the low and high sequence numbers to identify the actual log files that are missing. Run
the following query on the primary to capture this information:
SQL> SELECT name
FROM v$archived_log
WHERE thread# = 2

AND dest_id = 2
AND sequence# = 6233;
NAME
———————————————–
/u06/oradata/MATRIX/recovery_area/arch_2_6233_656445787.arc

Use an Incremental RMAN Backup to Roll a Standby Forward
In cases where the standby has fallen behind for a long period of time, you may want to use an
incremental backup as a more efficient method for handling large gap scenarios. You should use
this method only if the standby has fallen significantly behind the primary or no NOLOGGING
operations have been performed on the primary. This solution can be used only on a physical
standby database.

Standby Database Is Not Receiving Redo Logs
If the redo data is not being transported to the standby database, query the V$ARCHIVE_DEST
view and check for error messages. Here is an example of the query:
SQL> SELECT DEST_ID, STATUS, ERROR
FROM V$ARCHIVE_DEST;
DEST_ID STATUS ERROR
——- ——— ————————————–
1 ERROR ORA-16012: Archivelog standby database identifier mismatch
2 INACTIVE
3 INACTIVE
If the output of the query does not help you, check the following list of possible issues. If any
of the following conditions exists, log transport services will fail to transmit redo data to the
standby database.

Listener Issues
■ The listener.ora file has not been configured correctly.
■ The listener has not been started on the standby.
■ The service name for the standby instance is not configured correctly in the tnsnames.ora
file on the primary database.
■ If you cannot log in remotely to the standby database using the TNSNAME identifier,
then Data Guard cannot log in either. Try the remote login yourself:
SQL> CONNECT SYS/PASSWORD@MYTNSNAME AS SYSDBA;

Archive Destination Issues
■ The service name specified by the LOG_ARCHIVE_DEST_n parameter for the primary
database is incorrect.
■ The LOG_ARCHIVE_DEST_STATE_n parameter for the standby database is not set to the
value ENABLE.
■ A disconnect occurred and the number of seconds specified by REOPEN has not yet passed.
■ The standby instance is not started.
■ The standby control file was created incorrectly.
■ The correct backup was not used to build the standby.
Standby Waiting on Log Files that Exist at the Standby
If the standby is waiting on logs that are in the standby destination area, the logs are most likely
not properly registered on the standby. Check the appropriate views (physical is V$ARCHIVED_
LOG and logical is DBA_LOGSTDBY_LOG) to see if they are registered. If they are not registered,
the file on disk is unusable and you should use the manual procedure if Data Guard cannot
resolve the gaps automatically. Do not try to use the file on disk! Also verify that the local
archival destination on the standby is correct.
Receive an ORA-16032 on Alter System Archive Log All
One of the LOG_ARCHIVE_DEST_n parameters is not configured correctly. Verify the local
destinations on the primary.

Media Recovery Failures
If media recovery fails on the standby, leaving your standby in an unrecoverable state, you will
see the following messages:
ORA-01578: ORACLE data block corrupted
ORA-26040: Data block was loaded using the NOLOGGING option.
On the physical standby database, you will need to do the following:
1. Offline the corresponding datafiles.
2. Copy the backup datafiles from the primary database.
3. Replace the corrupted datafiles.
4. Stop the Redo Apply.
5. Online the corresponding datafiles.
6. Restart the Redo Apply.
Resync Logical Standby Database When Nologging Issue Occurs Keep in mind that SQL
Apply will skip over nologging operations. Therefore, the datafiles will not be affected by the data
block corruption. You will eventually see an “ORA-01403: No Data Found on the standby” error.
To resync the table with the primary table, you will need to re-create it with the instantiate table
procedure.
If you see an “ORA-00308: cannot open archived log,” cancel SQL Apply and manually
retrieve the missing files.
Renaming Datafiles with the ALTER DATABASE Statement
You cannot rename a datafile on the standby site when the STANDBY_FILE_MANAGEMENT
initialization parameter is set to AUTO. In addition, you are not permitted to use any of the
following SQL statements:
SQL> ALTER DATABASE RENAME
SQL> ALTER DATABASE ADD/DROP LOGFILE
SQL> ALTER DATABASE ADD/DROP STANDBY LOGFILE MEMBER
SQL> ALTER DATABASE CREATE DATAFILE AS
If you attempt to use any of these statements on the standby database, the following error is
returned if STANDBY_FILE_MANAGEMENT is set to AUTO:
ORA-01511: error in renaming log/data files
ORA-01270: RENAME operation is not allowed
You can still add and delete standby redo log files:
SQL> ALTER DATABASE ADD/DROP STANDBY LOGFILE;

Part II

Physical Standby Issues
You will likely encounter several common issues when managing a physical database:
■ You cannot mount the physical standby database.
■ The standby archive destination is not defined properly.
■ The standby site does not receive logs.
■ The standby site is not processing the logs (MRP down, and so on).

Unable to Mount the Physical Standby Database
The physical standby can’t be mounted if the control file was created with an operating system–
created backup or a backup created using an ALTER DATABASE statement without the STANDBY
options. The standby control file must be created with the ALTER DATABASE CREATE STANDBY
CONTROLFILE statement or the RMAN BACKUP CURRENT CONTROLFILE with the FOR STANDBY
option.

Primary Database Shutdown
If you have configured standby redo log files on your standby database, the size of the current
standby redo log file on each standby database must be the same size as the redo log file on the
primary database. After a log switch, if no available standby redo log files match the size of the redo
log file on the primary database, the primary will shut down if it is in Maximum Protection mode.
The primary database becomes unsynchronized if it is in Maximum Availability mode. You will
see the following message in the alert log:
No standby log files of size <#> blocks available.

ORA-16066
You will encounter this error when the REMOTE_ARCHIVE_ENABLE parameter is set to FALSE.
This parameter controls whether the archival of the redo logs to remote destinations is permitted.

ORA-16204: Parameter %s Cannot Be Parsed
The value for the LOG_ARCHIVE_DEST_n parameter is incorrect.
Remote Archival to Standby Database Fails with an ORA-01031
You will see this error if an archiver process (ARCn) or LogWriter Network Service (LNS) process
at the primary database fails with the ORA-01031 and the redo is not getting transferred to the
standby. This is a result of a missing password file on the standby database. Copy the password
file from the primary to the standby database, and restart the standby database.

Standby Database Cannot Apply Redo
If you encounter issues with applying the logs to the standby database, and you see the ORA-
00326 message in your alert log, you’ll see the following:
ORA-00326: log begins at change <SCN> , need earlier change <SCN>
This means that media recovery has found an archive log which was generated after the required
archive log. It needs the correct log. When you are using Data Guard Redo Apply, this error cannot
occur since the managed recovery process (MRP) will provide media recovery only with the logs in
the correct order. If the next log is not in the correct order, the MRP will not pass anything to media
recovery. This usually occurs when Data Guard ends up performing gap resolution.
If you were using manual recovery and feeding media recovery archive logs one at a time, it
normally means that you provided an archive log generated after the required log. You can
remedy the problem by providing the correct file.

Log Shipment Errors—ORA-12570
You may encounter sporadic “ORA-12570: TNS packet reader failure” messages when shipping
redo data between the primary and standby when the connection is going through a firewall. If
you can’t disable the firewall timeout or bypass the firewall, set sqlnet.expire_time in the
sqlnet.ora file on both servers.

Logical Standby Database Failures
Several issues can occur with a logical standby environment:
■ SQL Apply stops.
■ SQL Apply hangs.
■ SQL Apply cannot keep up with the primary.
■ Data is not applied to the logical standby.

What to Do if SQL Apply Stops
Log apply services cannot apply unsupported Data Manipulation Language (DML) statements,
Data Definition Language (DDL) statements, and Oracle-supplied packages to a logical standby
database running SQL Apply. The following sections summarize some of the conditions under
which SQL Apply will stop.

Determine What Has Failed You can find the last statement that SQL Apply tried to process in
the DBA_LOGSTDBY_EVENTS view:
SQL> SELECT XIDUSN, XIDSLT, XIDSQN, STATUS, STATUS_CODE
2 FROM DBA_LOGSTDBY_EVENTS
3 WHERE EVENT_TIME =
4(SELECT MAX(EVENT_TIME)
5 FROM DBA_LOGSTDBY_EVENTS);
The output of the query will reveal the statement and error that caused SQL Apply to fail.

ORA-01403: No Data Found You will encounter the ORA-01403 error when DML is executed
on the logical standby to tables maintained by SQL Apply. This will generally occur if someone
makes these changes with SYSDBA privileges or if the logical standby guard was disabled. The
primary and the logical can get out of sync and the error won’t show up until the table on the
primary standby is updated.
To resolve the issue, you would need to skip and re-instantiate the table. Make sure that
you have a database link defined to connect to the primary database. You can use the dbms_
logstdby.skip and dbms_logstdby.instantiate_table procedures to accomplish this,
as demonstrated here:
SQL> EXEC DBMS_LOGSTDBY.SKIP(‘DML’,’ACTORS’,’%’);
SQL> EXECUTE DBMS_LOGSTDBY.INSTANTIATE_TABLE
2 (‘MATRIX_USER’, ‘ACTORS’, ‘MATRIX_DBLINK’);

ORA-16211: Unsupported Record Found in the Archived Redo Log You will encounter the
ORA-16211 error when log apply encounters a record in the archived redo log that could not be
interpreted. This error could occur under any number of circumstances; however, a few
documented potential causes for this error do exist:
■ Nologging on an object on the primary
■ Changes made to an Indexed Organized Table (IOT) on the primary
■ Direct path inserts on a partition table
To resolve this error, you can re-instantiate the object or drop the object. In either case, you
will want to determine the archive log with the error and send it to Oracle support for analysis.

Handling DDL Issues You will often face issues with DDL statements in a standby database.
For example, a DBA may add a datafile to the primary and the path is not valid on the logical
standby. Remember that the DB_FILE_NAME_CONVERT parameters are not used on a logical
standby. Keep in mind that the following statements are skipped automatically on the standby:
CREATE and ALTER DATABASE commands
CREATE and DROP DATABASE LINK commands
The following procedures can be used to help resolve issues with DDL:
DBMS_LOGSTDBY.SKIP Skips a schema or object or a type of statement
DBMS_LOGSTDBY.SKIP_ERROR Ignores a class of errors and continues the SQL Apply
DBMS_LOGSTDBY.SKIP_TRANSACTION Skips a failed transaction provided that you
issued a compensating transaction

SQL Apply Hanging
If it appears that SQL Apply is hanging, perform the following steps for investigative procedures:

1. Check DBA_LOGSTDBY_PROGRESS to see if any activity is occurring:
SQL> SELECT APPLIED_SCN, APPLIED_TIME, READ_SCN,
READ_TIME, NEWEST_SCN, NEWEST_TIME
FROM DBA_LOGSTDBY_PROGRESS;

2. Check the HIGH_SCN from the V$LOGSTDBY view. The SCN should change as the SQL
Apply progresses:
SQL> SELECT TYPE, HIGH_SCN, STATUS
FROM V$LOGSTDBY;

3. Check V$LOGSTDBY_STATS for information on the activity on the standby. You can
look at the number of transactions applied or transactions ready and tell whether the
transactions are being applied as fast as they are being read

(Src: http://www.reachdba.com)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s