Oracle Data pump utility enables very fast bulk data and metadata movement between Oracle databases. It provides parallel Export and Import utilities (expdp and impdp) and Web-based Oracle Enterprise Manager Interface.
- Data Pump Export and Import utilities are typically much faster than the original Export and Import Utilities. A single thread of Data Pump Export is about twice as fast as original Export, while Data Pump Import is 15-45 times fast than original Import.
- Data Pump jobs can be restarted without loss of data, whether or not the stoppage was voluntary or involuntary.
- Data Pump jobs support fine-grained object selection. Virtually any type of object can be included or excluded in a Data Pump job.
- Data Pump supports the ability to load one instance directly from another (network import) and unload a remote instance (network export).
DATA PUMP ARCHITECTURE
Data Pump Components
Data Pump provides four client utilities that execute the procedures provided in the DBMS_DATA_PUMP PL/SQL package. The client utilities accept input in the form of parameters which are passed to the DBMS_DATA_PUMP PL/SQL package to enable the exporting and importing of metadata and data into and out of an Oracle database.
A brief overview of the four client utilities is provided below:
- EXPDP – Client interface to the Data Pump Export utility. The interface closely resembles its predecessor, Oracle Export.
- IMPDP – Client interface to the Data Pump Import utility. The interface closely resembles its predecessor, Oracle Import.
- Grid Control Interface – This is a web based interface, which provides access to the DBMS_DATA_PUMP PL/SQL package.
- Custom Interface – Allows for the movement of complex data mining models.
Other Data pump components are described below.
Data Access Methods
Data Pump supports two access methods provided by Oracle Database 10g to load and unload data:
Direct Path and External Tables.
The Direct Path access method is the faster of the two, but does not support intrapartition parallelism. The External Tables access method does support this function, and therefore may be chosen to load or unload a very large table or partition
For example, a table being loaded with active referential constraints or global indexes must be loaded using the External Tables access method. A table with a column of data type LONG must be unloaded and loaded with the Direct Path access method. In most cases, we need not be concerned about choosing an access method; the Data Pump job will make the correct choice based on an array of job characteristics. Both methods write to the dumpfile set in a compact, binary stream format that is approximately 15 percent smaller than the original exp data representation.
Meta data API
Data Pump uses the DBMS_METADATA package to load and unload metadata. The Metadata API stores the metadata as XML documents. This differs from the original Export utility which stored the metadata as DDL. The Data Pump Import utility takes advantage of the flexibility provided by the XML format by allowing users to change object ownership, storage characteristics and tablespace destinations.
There are various processes that comprise a Data Pump job. They are described in the order of their creation.
Client Process – This is the process that makes calls to the Data Pump API. Oracle Database10g ships four client utilities of this API. This paper will discuss only the new export/import clients, expdp and impdp. Data Pump is integrated into Oracle Database 10g, a client is not required once a job is underway. Multiple clients may attach and detach from a job as necessary for monitoring and control.
Shadow Process – This is the standard Oracle shadow (or foreground) process created when a client logs in to Oracle Database. Upon receipt of a DBMS_DATAPUMP.OPEN request, the shadow process creates the job, which consists primarily of creating the master table, the Advanced Queuing (AQ) queues used for communication among the various processes, and the master Control Process. Once a job is running, the main task of the shadow process consists of servicing GET_STATUS requests from the client. If the client detaches, the shadow process also goes away.
Master Control Process (MCP) – As the name implies, the MCP controls the execution and sequencing of a Data Pump job. There is one MCP per Data Pump job, maintaining the job state, job description, restart, and dumpfile information in the master table. A job is divided into various phases of metadata and data unloading or loading, and the MCP hands out work requests to the worker processes appropriate for the current phase.
Worker Process – Upon receipt of a START_JOB request, the MCP creates a number of worker processes according to the value of the PARALLEL parameter. The worker processes perform the tasks requested by the MCP (primarily unloading and loading of metadata and data), and maintain the object rows that make up the bulk of the master table. As database objects are unloaded or loaded, these rows are written and updated with the current status of these objects: pending, completed, failed, and so on. The worker processes also maintain type completion rows, which describe the type of object currently being worked on: tables, indexes, views, and so on
Parallel Query (PQ) Process – If the External Tables data access method is chosen for loading or unloading a table or partition, some parallel query processes are created by the worker process that was given the load or unload assignment, and the worker process then acts as the query coordinator. These are standard parallel execution slaves that exploit the parallel execution architecture of Oracle Database, and enable intrapartition loading and unloading. In Real Application Clusters (RAC), parallel query processes may be created on an instance other than that on which the Data Pump job was initiated.
Oracle takes advantage of its Advanced Queuing (AQ) feature to provide communication paths between the different Data Pump processes. The Data Pump master control process writes work progress and error messages to the status queue. Shadow processes subscribe to this queue to retrieve the status information.
The master table contains information about the details of the current Data Pump operation being performed. the master table also records the parameters supplied by the calling program, worker status information and output file information. The table provides all of the information required to restart a Data Pump job that has stopped because of a planned, or unplanned, failure.
Command and Control Queue
The command and control queue provides an avenue of communications between the master control process and worker processes. All of the work requests created by the master process and the associated responses are passed through the command and control queue.
Dump Data file
Data Pump dump files are created by EXPDP and used as input by IMPDP. Since their format is incompatible with their predecessors, Export and Import, the utilities are not interchangeable. Import can only read a file created by Export and IMPDP can only read a file created by EXPDP.
The dumpfile contains data and/or metadata and the master table which is written to the dumpfile at the end of each EXPDP operation and read at the beginning of each IMPDP operation.
EXPORT/IMPORT USING DATA PUMP
Data Pump consists of two components: the Data Pump export utility called “expdp,” to Export the objects from a database, and the Data Pump Import utility called “impdp,” to load the objects into database.
The expdp and impdp tools support all the original exp and imp functionalities plus many new features. With previous release, we could only move the transportable tablespace across Oracle databases that were running on the same architecture and operating system. With Data Pump, you are able to transport data files from one platform to another.
Data Pump is a server-based utility, rather than client-based; dump files, log files, and SQL files are accessed relative to server-based directory paths.
Data Pump runs as a job inside the database, rather than as a stand-alone client application. This means that jobs are somewhat independent of the process that started the export or import. One machine (say a scheduled job) could start the export, while another machine (such as a DBA’s laptop) can check the status of the job. Since the job is inside the database, if you want to export to a file, the first thing that we must do is create a database DIRECTORY object for the output directory, and grant access to users who will be doing exports and imports.
New Concepts in Oracle Data Pump
There are two new concepts in Oracle Data Pump that are different from original Export and Import.
Data Pump differs from original Export and Import in that all jobs run primarily on the server using server processes. These server processes access files for the Data Pump jobs using directory objects that identify the location of the files.
Interactive Command-Line Mode
Besides regular operating system command-line mode, there is now a very powerful interactive command-line mode which allows the user to monitor and control Data Pump Export and Import operations.
Changing From Original Export/Import To Oracle Data Pump
Creating Directory Objects
In order to use Data Pump, the database administrator must create a directory object and grant privileges to the user on that directory object. If a directory object is not specified, a default directory object called data_pump_dir is provided. The default Data_pump_dir is available only to privileged users unless access is granted by the DBA.
In the following example, the following SQL statement creates a directory object named test_dir that is mapped to a directory located at ‘/u01/app/oracle/oradata/’;
We would login to SQL*Plus as system and enter the following SQL command to create a directory. Before creating the directory object we have to unlock the SCOTT account and create a directory object it can access:
CONN sys/password@db10g AS SYSDBA
ALTER USER scott IDENTIFIED BY tiger ACCOUNT UNLOCK;
GRANT CREATE ANY DIRECTORY TO scott;
CREATE OR REPLACE DIRECTORY test_dir AS ‘/u01/app/oracle/oradata/’;
After a directory is created, we need to grant READ and WRITE permission on the directory to other users. For example, to allow the Oracle database to read and to write to files on behalf of user scott in the directory named by test_dir, you must execute the following command:
GRANT READ, WRITE ON DIRECTORY test_dir TO scott;
Note that READ or WRITE permission to a directory object means only that the Oracle database will read or write that file on your behalf. We are not given direct access to those files outside of the Oracle database unless we have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.
Once the directory is granted, we can export a user’s object with command arguments that are very similar to exp and imp:
$expdp scott/tiger directory=test_dir dumpfile=scott.dmp
While the export job is running, we can press [Ctrl]C (or the equivalent on your client) to “detach” from the export job. The messages will stop coming to the client, but it’s still running inside the database. The client will be placed in an interactive mode (with Export> prompt). To see which jobs are running, type status. If we run expdp attach=<jobname>, we can attach to a running job.
Data Pump is much faster than the old exp and imp client commands. One new feature that really helps make it faster is the “parallel” option. With this option, the Data Pump will pump data in four different threads.
Data pump performance can be significantly improved by using the
PARALLEL parameter. This should be used in conjunction with the “%U” wildcard in the
DUMPFILE parameter to allow multiple dumpfiles to be created or read.
$expdp scott/tiger directory=dumpdir dumpfile=scott2.dmp parallel=4 job_name=scott2
$Impdp dpuser/dpuser@TDB10G schemas=dbuser directory=dumpdir dumpfile=expdp_TDB10G_dbuser.200470215_07.%U.dmp parallel=4
Other Useful Features
TABLE_EXISTS_ACTION – The original imp would allow rows to be appended to existing tables if IGNORE=Y was specified. The TABLE_EXISTS_ACTION parameter for Data Pump impdp provides four options:
1. SKIP is the default: A table is skipped if it already exists.
2. APPEND will append rows if the target table’s geometry is compatible.
3. TRUNCATE will truncate the table, then load rows from the source if the geometries are compatible and truncation is possible. For example, it is not possible to truncate a table if it is the target of referential constraints.
4. REPLACE will drop the existing table, then create and load it from the source.
CONTENT – This new parameter, applicable to both clients, allows the movement of DATA_ONLY, METADATA_ONLY, or BOTH (the default).
VERSION – expdp supports the VERSION parameter, which tells the server-based Data Pump to generate a dumpfile set compatible with the specified version. This will be used to perform downgrades in the future.
DATA PUMP VIEWS – Data Pump maintains a number of user- and DBA-accessible views to monitor the progress of jobs:
DBA_DATAPUMP_JOBS: This shows a summary of all active Data Pump jobs on the system.
USER_DATAPUMP_JOBS: This shows a summary of the current user’s active Data Pump jobs.
DBA_DATAPUMP_SESSIONS: This shows all sessions currently attached to Data Pump jobs.
V$SESSION_LONGOPS: A row is maintained in the view showing progress on each active Data Pump job. The OPNAME column displays the Data Pump job name.
Comparison Of Command-Line Parameters From Original Export and Import To Data Pump
Below are a few examples that demonstrate some of these differences.
Example, import of tables from scott’s account to jim’s account
> imp username/password FILE=scott.dmp FROMUSER=scott TOUSER=jim TABLES=(*) COMMIT=y COMPILE=y CONSTRAINTS=y IGNORE=y STATISTICS=none LOG=importscott.log
Data Pump Import:
> impdp username/password DIRECTORY=dpump_dir1 DUMPFILE=scott.dmp TABLES=scott.emp REMAP_SCHEMA=scott:jim
In Data pump utility the FROMUSER/TOUSER syntax is replaced by the REMAP_SCHEMA Option.
Example, export of an entire database to a dump files with all GRANTS, INDEXES, and data
> exp username/password FULL=y FILE=dba.dmp GRANTS=y INDEXES=y ROWS=y LOG=exportdba.log
> expdp username/password FULL=y INCLUDE=GRANT INCLUDE= INDEX DIRECTORY=dpump_dir1 DUMPFILE=dba.dmp CONTENT=ALL
Data Pump offers much greater metadata filtering than Original Export and Import. The INCLUDE parameter allows us to specify which object (and its dependent objects) you want to keep in the export job. The EXCLUDE parameter allows you to specify which object (and its dependent objects) you want to keep out of the export job. You cannot mix the two parameters in one job. Both parameters work with Data Pump Import as well, and you can use different INCLUDE and EXCLUDE options for different operations on the same dump file. We can get a quick summary of all parameters and commands by simply issuing
Following are few EXPDP and IMPDP commands with additional attributes.
$ expdp dpuser/dpuser@TDB10G schemas=dpuser include= TABLE:\”IN (\’EMP\’, \’DEPT\’)\” directory=dpump_dir1 dumpfile=dpuser.dmp log=dpuser.log
$expdp dpuser/dpuser@TDB10G schemas=dpuser exclude=TABLE:\”= \’EMP_DETAILS\’\” directory=dpump_dir1 dumpfile=dpuser2.dmp logfile=dpuser.log
$Impdp dpuser/dpuser@TDB10G schemas=dbuser directory=dumpdir dumpfile=expdp_TDB10G_dbuser.200470215_07.%U.dmp parallel=4
$impdp dpuser/dpuser@TDB10G schemas=dbuser dumpfile=DATA_PUMP_DIR:expdp_TDB10G_dbuser.200507_165936.%U.dmp logfile= DATA_PUMP_DIR:expdp_TDB10G_dbuser.200507_165936.out
$expdp dpuser/dpuser@TDB10G schemas=dpuser directory=dpump_dir1 parallel=4 dumpfile=dpuser_%U.dmp logfile=dpuser.log
CREATING ORACLE DATABASE USING IMPDP and EXPDP
Steps for creating the Oracle database:
Before creating the database we have to take Backup/ Export the entire existing Schema.
Syntax: > expdp username/password FULL=y INCLUDE=GRANT INCLUDE= INDEX DIRECTORY=dpump_dir1 DUMPFILE=dba.dmp CONTENT=ALL
Delete Database – Drops all objects and public synonyms for objects owned by the schema
Create the Database again using the Oracle Configuration Assistant and clone of deleted database.
Connect to the Oracle Database using system user and sysdba user and runs scripts for tablespace creation, creation of users – roles, granting of privileges.
create tablespace Tablespace_name
datafile ‘/&oradata_dir/oradata/&orasid/tablespace_name_datafile01.dbf’ size 24M
autoextend on next 24M maxsize 15000M
extent management local
segment space management auto
Create the dump directory using CREATE OR REPLACE DIRECTORY (dump_dir1) and grants read-write priviledges to dump directory object for the user.
The following SQL statements creates a user, a directory object named dpump_dir1 and grants the permissions to the user.
SQLPLUS system/manager@TDB10G as sysdba
SQL> create user dpuser identified by dpuser;
SQL> grant connect, resource to dpuser;
SQL> CREATE DIRECTORY dpump_dir1 AS ‘/opt/app/oracle’;
SQL> grant read, write on directory dpump_dir1 to dpuser
Connect to the newly created schema to test the connection.
SQL> dpuser/ dpuser@TDB10G;
Run the IMPDP command from the command prompt
$impdp dpuser/dpuser@TDB10G schemas=dbuser dumpfile=DATA_PUMP_DIR:expdp_TDB10G_dbuser.200507_165936.%U.dmp
Verify the log file for errors or discrepancies.
We need to copy the .DMP files in default database directory (DATA_PUMP_DIR) path.
We can also place these .DMP files in database directory path where we have created.
Table Level Exports/Imports
TABLES parameter is used to specify the tables that are to be exported. The following is an example of the table export and import syntax:
expdp scott/tiger@db10g tables=EMP,DEPT directory=TEST_DIR dumpfile=EMP_DEPT.dmp logfile=expdpEMP_DEPT.log
impdp scott/tiger@db10g tables=EMP,DEPT directory=TEST_DIR dumpfile=EMP_DEPT.dmp logfile=impdpEMP_DEPT.log
TABLE_EXISTS_ACTION=APPEND parameter allows data to be imported into existing tables.
Schema Level Exports/Imports
OWNER parameter of exp has been replaced by the
SCHEMAS parameter which is used to specify the schemas to be exported. The following is an example of the schema export and import syntax:
expdp scott/tiger@db10g schemas=SCOTT directory=TEST_DIR dumpfile=SCOTT.dmp logfile=expdpSCOTT.log
impdp scott/tiger@db10g schemas=SCOTT directory=TEST_DIR dumpfile=SCOTT.dmp logfile=impdpSCOTT.log
Database Level Exports/Imports
FULL parameter indicates that a complete database export is required. The following is an example of the full database export and import syntax:
expdp system/password@db10g full=Y directory=TEST_DIR dumpfile=DB10G.dmp logfile=expdpDB10G.log
pdp system/password@db10g full=Y directory=TEST_DIR dumpfile=DB10G.dmp logfile=impdpDB10G.log
Oracle Data Pump is a great tool for the fast movement of data between the databases and much of this performance improvement is derived from the use of parameter “parallelism.” New export and import clients, expdp and impdp, that fully exploit the Data Pump Infrastructure. They are implemented as complete supersets of the original exp and imp, and will eventually replace them.