对符合条件的disjunctive subquery,ORACLE CBO能够进行unnest,由隐含参数_optimizer_unnest_disjunctive_subq控制。
本次讨论的就是将OR子查询改写为JOIN,构造等值条件,让原来走FILTER的执行计划走HASH JOIN,大幅度提高效率。
示例建表语句如下:
|
DROP TABLE t1; create table t1 as select to_char(trunc(dbms_random.value(10000000000, 20000000000 ))) phone_no, trunc(dbms_random.value(0, 30 )) ext, lpad(level,10) v1, rpad('x',100) padding from dual connect by level <= 1000000; DROP TABLE t2; create table t2 as select to_char(trunc(dbms_random.value(10000000000, 20000000000 ))) phone_no, trunc(dbms_random.value(0, 30 )) ext, lpad(level,10) v1, rpad('x',100) padding from dual connect by level <= 1000000; --创建索引 create index idx_t2 on t2(phone_no); --收集统计信息 exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>user,tabname=>'t1',estimate_percent=>10,method_opt=>'for all columns size auto',no_invalidate=>false,cascade=>true,degree => 10); exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>user,tabname=>'t2',estimate_percent=>10,method_opt=>'for all columns size auto',no_invalidate=>false,cascade=>true,degree => 10); 数据量: select count(*) from t1; COUNT(*) ---------- 1000000 select count(*) from t2; COUNT(*) ---------- 1000000 统计信息: TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED ------------------------------ ---------- ----------- ------------------- T1 1003550 100355 2023-08-18 17:13:11 T2 996380 99638 2023-08-18 17:13:11 |
原SQL:
主表t1,然后t2是多个or子查询连用,这种情况下是走不了subquery unnest的,执行计划只能走FILTER:
|
SELECT phone_no,ext,v1,padding FROM t1 WHERE SUBSTR(t1.phone_no,1,8) IN (SELECT t2.phone_no FROM t2 WHERE LENGTH(t2.phone_no)=8) OR SUBSTR(t1.phone_no,1,9) IN (SELECT t2.phone_no FROM t2 WHERE LENGTH(t2.phone_no)=9) OR SUBSTR(t1.phone_no,1,10) IN (SELECT t2.phone_no FROM t2 WHERE LENGTH(t2.phone_no)=10) OR SUBSTR(t1.phone_no,1,11) IN (SELECT t2.phone_no FROM t2 WHERE LENGTH(t2.phone_no)=11); |
执行计划如下,返回95行,耗时10.46s,可以可以看到执行计划走FILTER,类似NESTED LOOPS,主表T1估算返回100w行+,所以子查询被驱动百万次,性能差:
执行计划如下所示:
|
95 rows selected.
|
分析与优化:
主要条件用OR子查询导致不能UNNEST,按照条件的结构特点,子查询的条件有一定规律,
比如:
WHERE SUBSTR(t1.phone_no,1,8) IN
(SELECT t2.phone_no FROM t2 WHERE LENGTH(t2.phone_no)=8)
t1.phone截取前8个字符与t2.phone一样。
利用这个规律可以构造hash join,注意构造HASH JOIN条件注意{BANNED}最佳好重复值较少,否则碰到hash碰撞,性能低。SQL改写如下:
|
SELECT distinct t1.phone_no,t1.ext,t1.v1,t1.padding FROM t1,(SELECT DISTINCT phone_no FROM t2 WHERE LENGTH(t2.phone_no) BETWEEN 8 AND 11) t22 WHERE t1.phone_no LIKE t22.phone_no||'%' --构造HASH JOIN关键,{BANNED}最佳好重复值小,不然有HASH碰撞 AND SUBSTR(t1.phone_no,1,8)=SUBSTR(t22.phone_no,1,8) --下面的是防止其他不满足9-11的条件过滤,这个类似原来OR条件 AND SUBSTR(t1.phone_no,1,length(t22.phone_no))=t22.phone_no; |
去掉子查询后,走HASH JOIN,执行时间从10.46s降低到0.88s,提升十倍以上:
|
95 rows selected. Elapsed: 00:00:00.88 Execution Plan ---------------------------------------------------------- Plan hash value: 555290710 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 139 | 6085 (3)| 00:01:14 | | 1 | HASH UNIQUE | | 1 | 139 | 6085 (3)| 00:01:14 | |* 2 | HASH JOIN | | 1 | 139 | 6084 (3)| 00:01:14 | |* 3 | INDEX FAST FULL SCAN| IDX_T2 | 2491 | 29892 | 882 (2)| 00:00:11 | | 4 | TABLE ACCESS FULL | T1 | 1003K| 121M| 5082 (1)| 00:01:01 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(SUBSTR("T1"."PHONE_NO",1,8)=SUBSTR("PHONE_NO",1,8)) filter("T1"."PHONE_NO" LIKE "PHONE_NO"||'%' AND "PHONE_NO"=SUBSTR("T1"."PHONE_NO",1,LENGTH("PHONE_NO"))) 3 - filter(LENGTH("T2"."PHONE_NO")>=8 AND LENGTH("T2"."PHONE_NO")<=11) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 21748 consistent gets 18519 physical reads 0 redo size 4993 bytes sent via SQL*Net to client 586 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 95 rows processed |