中国OCM之家 加入小组

13个成员 3个话题 创建时间:2021-01-15

利用数据泵导出多表关联数据

发表于2021-04-09 2284次查看

根据业务需求,需要导出 test1.trade_20XXXX系列表的数据,由于该数据是按月建的,所有省份的数据都在这个表里,个别省份申请数据需要做一下联合查询才能筛选出该省份的数据.

select t.* from test1.trade_201101 t, test1.org g where t.orgid=g.orgid and g.IDENT BETWEEN 15636 and 19636;

需要导出的数据量
SYS@ora1>> select count(*) from test1.trade_201101 t, test1.org g where t.orgid=g.orgid and g.IDENT BETWEEN 15636 and 19636;
  COUNT(*)
----------
   1114518

改写SQL

select count(*) from test1.trade_201101  where exists(select 1 from test1.org g where test1.trade_201101.orgid=g.orgid and g.IDENT BETWEEN 15636 and 19636);

对应的par文件

-bash-3.00$ vi trade_201101.par
"trade_201101.par" [New file]
DIRECTORY=ora1
EXCLUDE=STATISTICS
logfile=trade_201101_20210409.log
dumpfile=trade_201101_20210409.dmp
tables=test1.trade_201101
query="test1.trade_201101:where exists(select 1 from test1.org g where test1.trade_201101.orgid=g.orgid and g.IDENT BETWEEN 15636 and 19636)"

报错信息
ORA-31693: Table data object "test1"."TRADE_201101" failed to load/unload and is being skipped due to error:
ORA-00904: "test1"."TRADE_201101"."ORGID": invalid identifier
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded

注意在trade_201101表对应的这个orgid列的位置不能带alias或是表名,否则报错

-bash-3.00$ vi trade_201101.par
"trade_201101.par" 7 lines, 287 characters
DIRECTORY=ora1
EXCLUDE=STATISTICS
logfile=trade_201101_20210409.log
dumpfile=trade_201101_20210409.dmp
tables=test1.trade_201101
query="test1.trade_201101:where exists(select 1 from test1.org g where orgid=g.orgid and g.IDENT BETWEEN 15636 and 19636)"

能正常导出,但是导出的条数和联合查询的条数不符,和这种改写查询是一样的,类似是表的全部数据.
. . exported "test1"."TRADE_201101"                2.294 GB 5575587 rows
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded

官方文档解释 Utilities >>> Data Pump Export >>> QUERY
* When the QUERY parameter is specified for a table, Data Pump uses external tables to unload the target table. External tables uses a SQL CREATE TABLE AS SELECT statement. The value of the QUERY parameter is the WHERE clause in the SELECT portion of the CREATE TABLE statement. If the QUERY parameter includes references to another table with columns whose names match the table being unloaded, and if those columns are used in the query, then you will need to use a table alias to distinguish between columns in the table being unloaded and columns in the SELECT statement with the same name. The table alias used by Data Pump for the table being unloaded is KU$.
For example, suppose you want to export a subset of the sh.sales table based on the credit limit for a customer in the sh.customers table. In the following example, KU$ is used to qualify the cust_id field in the QUERY parameter for unloading sh.sales. As a result, Data Pump exports only rows for customers whose credit limit is greater than $10,000.
QUERY='sales:"WHERE EXISTS (SELECT cust_id FROM customers c
   WHERE cust_credit_limit > 10000 AND ku$.cust_id = c.cust_id)"'
If, as in the following query, KU$ is not used for a table alias, then the result will be that all rows are unloaded:
QUERY='sales:"WHERE EXISTS (SELECT cust_id FROM customers c
   WHERE cust_credit_limit > 10000 AND cust_id = c.cust_id)"'
* The maximum length allowed for a QUERY string is 4000 bytes including quotation marks, which means that the actual maximum length allowed is 3998 bytes.

 

再次用in改写SQL

SYS@ora1>> select count(*) from test1.trade_201101 where orgid in (select orgid from test1.org g where  g.IDENT BETWEEN 15636 and 19636);
  COUNT(*)
----------
   1114518
调整后是par文件
-bash-3.00$ vi trade_201101.par
"trade_201101.par" 7 lines, 274 characters
DIRECTORY=ora1
EXCLUDE=STATISTICS
logfile=trade_201101_20210409.log
dumpfile=trade_201101_20210409.dmp
tables=test1.trade_201101
query="test1.trade_201101:where orgid in (select orgid from test1.org g where  g.IDENT BETWEEN 15636 and 19636)"
~
导出数据量,符合条件.
. . exported "test1"."TRADE_201101"                479.9 MB 1114518 rows
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded

 

1回复
发表回复
你还没有登录,请先 登录或 注册!