子查询的原始语义是:
只返回主表结果,子查询只是用来进行存在性判断。
从这点看,如果子查询要转为普通join,要么join key有唯一键,要么子查询结果按照join key剔重,当然anti join不需要,
一般anti join不会自动改成left join+inner.id is null,但是可以手动实现,这种只查找不匹配的,不需要子查询join key唯一。
像ORACLE的子查询做成view然后hash unqiue就是剔重转为普通join,像MySQL的materialization,loose scan,pull out,dupsweedout都有各种
按照join key对子查询剔重的动作。
子查询原始执行顺序:(只是原始执行顺序,实际上有查询转换)
1)非相关子查询,子查询可以单独执行,然后做成view来驱动主表(做普通join,有剔重)
2)相关子查询,主表的每1行驱动执行子查询,类似循环。当然,这种一般在CBO里是可以转为semi join,join等,
如果不能unnest就是原始语义执行,也有subquery cache等优化。
从原始语义上看,非相关子查询有一定优势,如果子查询不能unnest, 则非相关子查询可以单独执行,这对join reorder就有很大操作空间,
可以子查询单独执行后做成view,结果按照join key剔重,然后将子查询转为普通join,对于子查询不能unnest的有一定优势。
另外子查询如果走nested loops semi/anti,则必须是主表驱动子查询,否则因为子查询join key可能重复,那么结果就不对了。
像hash join,oracle和postgres有专门的hash join right semi/anti,那么就可以让子查询表来做驱动表,MySQL没有这种功能,
MySQL如果要子查询驱动主表,则是内部剔重后改成普通join,比如materialization,loose scan,pull out,dupsweedout,像
firtmatch是主表驱动子查询的,anti join也是。
当然,对于semi join只要将子查询的join key剔重,转为普通join是可以的,{BANNED}最佳好是非相关子查询,相关子查询触发限制的时候则不能转。
对于anti join,则不需要子查询join key必须是唯一的,因为查找不存在的,可以直接用
out_table left join subquery_tab on ..where subquery_tab.id is null来改写。
1)对于非相关子查询,oracle,pg有多种方式优化,就算含有聚合函数等不能unnest,也可以单独执行,这样就可以做成view,然后与主表join来驱动主表,不能unnest的写成非相关子查询有一定优势。
如果返回单行或= subquery,则传值给父查询,
对于pg会有initplan,如果多行,可以单独执行结果集变成view剔重,与主表做普通join优化。
像mysql的非相关子查询,不能unnest的,如果是in,则还是走FILTER主查询循环驱动子查询
(走materiation则有subquery cache,可以减少子查询执行次数)。
如果是=,>,>=,<,<=也可以单独执行,执行计划有subquery in condition; run only once。
##非相关子查询在mysql,pg里挺重要,如果可能,尽量写成非相关子查询。特别是子查询返回1行的,写成=,不要写成in
2)对于相关子查询,不能unnest的,基本都差不多,必须主表循环驱动子查询,只不过ORACLE走FILTER,有subquery cache,
MySQL的相关子查询不能unnest,走intoexists,不能减少子查询执行次数,但是对子查询索引访问的有cache,也可以一定程度提高效率。
相关子查询如果不能unnest,效率低,subquery cache起不到减少子查询执行次数的目的,则{BANNED}最佳好改为非相关子查询,join,left join等
###
当相关子查询有聚合函数,having, set操作,rownum/limit,connect by,分析函数等复杂语法,则相关子查询不能unnest.
###
对于相关子查询包含rownum/limit,pg和mysql可以消除,因为这个本身就没有意义,本身子查询就是firtmatch,而oracle不能消除
子查询只要有聚合函数,having之类的,都不能unnest,就算聚合函数,having其实可以忽略,但是查询转换检查条件没有做这种改写。
单纯的group by,没有聚合函数,相当于distinct,这种本来子查询就是distinct结果,所以可以忽略,不影响unnest。
=========ORACLE子查询
对于ORACLE,非相关子查询,可以单独执行,如果SQL结构确定{BANNED}最佳多只返回一行,则可以作为变量传给父查询,这种不是unnest,
是非相关子查询的原始语义。
在O里,子查询有聚合函数,则相关子查询可能不能unnest,像相关子查询select部分有聚合函数,直接根据语义去掉子查询,比如having有
聚合函数可能不能unnest。
但是如果子查询是非相关的,可以单独执行,如果是确定单行,则可以作为变量传给父查询,这种不是unnest,是非相关子查询的原始语义:
如果返回多行,则可以将子查询改写为view与原表做join。
ORACLE不管in或=,CBO会判断子查询是否返回单行,然后将in改写为=,所以是一样的,这个和MySQL,postgresql不同。
点击(此处)折叠或打开
-
select * from a where a.object_id in (select max(b.object_id) from b);
-
-
@display
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
SQL_ID 522hg1mmrz0a4, child number 1
-
-------------------------------------
-
select * from a where a.object_id in (select max(b.object_id) from b)
-
-
Plan hash value: 742149035
-
-
-------------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
-------------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 5 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| A | 1 | 4 | 0 |00:00:00.01 | 5 |
-
|* 2 | INDEX RANGE SCAN | IDX_A | 1 | 4 | 0 |00:00:00.01 | 5 |
-
| 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 2 |
-
| 4 | INDEX FULL SCAN (MIN/MAX) | IDX_B | 1 | 1 | 1 |00:00:00.01 | 2 |
-
-------------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
- 2 - access("A"."OBJECT_ID"=)
##对于这种,子查询返回不能确定是一行,则可以单独改写为view,然后与a表做join:
点击(此处)折叠或打开
-
select * from a where a.object_id in (select max(b.object_id) from b group by object_type);
-
-
Elapsed: 00:00:00.02
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 1790892509
-
-
-----------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
-----------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 177 | 31683 | 672 (2)| 00:00:01 |
-
| 1 | NESTED LOOPS | | 177 | 31683 | 672 (2)| 00:00:01 |
-
| 2 | NESTED LOOPS | | 180 | 31683 | 672 (2)| 00:00:01 |
-
| 3 | VIEW | VW_NSO_1 | 45 | 585 | 401 (2)| 00:00:01 |
-
| 4 | HASH UNIQUE | | 45 | 675 | 401 (2)| 00:00:01 |
-
| 5 | HASH GROUP BY | | 45 | 675 | 401 (2)| 00:00:01 |
-
| 6 | TABLE ACCESS FULL | B | 73044 | 1069K| 396 (1)| 00:00:01 |
-
|* 7 | INDEX RANGE SCAN | IDX_A | 4 | | 2 (0)| 00:00:01 |
-
| 8 | TABLE ACCESS BY INDEX ROWID| A | 4 | 664 | 6 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
- 7 - access("A"."OBJECT_ID"="MAX(B.OBJECT_ID)")
这种非相关子查询<,>的,估算不准,因为不知道子查询返回的具体值,按照默认选择性5%估算,如果是join,貌似oracle可以计算了,可能通过直方图计算的,算法未知
点击(此处)折叠或打开
-
select num_rows from dba_tab_statistics where table_name='A' and owner=user;
-
-
NUM_ROWS
-
----------
-
292172
-
-
估算为: 292172*0.05= 14609
-
-
select count(*) from a where a.object_id <= (select max(b.object_id) from b where object_name like 'A%');
-
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
SQL_ID cw3jcfgarmpuq, child number 1
-
-------------------------------------
-
select count(*) from a where a.object_id <= (select max(b.object_id)
-
from b where object_name like 'A%')
-
-
Plan hash value: 3936981519
-
-
----------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
----------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.05 | 4152 |
-
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.05 | 4152 |
-
|* 2 | INDEX RANGE SCAN | IDX_A | 1 | 14609 | 292K|00:00:00.04 | 4152 |
-
| 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1422 |
-
|* 4 | TABLE ACCESS FULL| B | 1 | 1448 | 2716 |00:00:00.01 | 1422 |
-
----------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("A"."OBJECT_ID"<=)
-
4 - filter("OBJECT_NAME" LIKE 'A%')
-
-
-
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
SQL_ID f6nhfvnrqb479, child number 0
-
-------------------------------------
-
select count(*) from a where a.object_id <= (select b.object_id from b
-
where object_name like 'A%' and rownum=1)
-
-
Plan hash value: 914045866
-
-
----------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
----------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 8 |
-
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 8 |
-
|* 2 | INDEX RANGE SCAN | IDX_A | 1 | 14609 | 332 |00:00:00.01 | 8 |
-
|* 3 | COUNT STOPKEY | | 1 | | 1 |00:00:00.01 | 4 |
-
|* 4 | TABLE ACCESS FULL| B | 1 | 1 | 1 |00:00:00.01 | 4 |
-
----------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("A"."OBJECT_ID"<=)
-
3 - filter(ROWNUM=1)
- 4 - filter("OBJECT_NAME" LIKE 'A%')
等值的按照均匀分布计算
点击(此处)折叠或打开
-
select count(*) from a where a.object_id = (select max(b.object_id) from b where object_name like 'A%');
-
-
COUNT(*)
-
----------
-
4
-
-
1 row selected.
-
-
Elapsed: 00:00:00.01
-
dingjun123@ORCLPDB> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
SQL_ID a2v0fwb12rqt9, child number 0
-
-------------------------------------
-
select count(*) from a where a.object_id = (select max(b.object_id)
-
from b where object_name like 'A%')
-
-
Plan hash value: 3936981519
-
-
-------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
-
-------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1425 | 1 |
-
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1425 | 1 |
-
|* 2 | INDEX RANGE SCAN | IDX_A | 1 | 4 | 4 |00:00:00.01 | 1425 | 1 |
-
| 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1422 | 0 |
-
|* 4 | TABLE ACCESS FULL| B | 1 | 1448 | 2716 |00:00:00.01 | 1422 | 0 |
-
-------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("A"."OBJECT_ID"=)
-
4 - filter("OBJECT_NAME" LIKE 'A%')
-
-
-
23 rows selected.
-
-
计算方式:
-
select num_rows from dba_tab_statistics where table_name='A' and owner=user;
-
-
NUM_ROWS
-
----------
-
292172
-
-
select num_distinct,density from dba_tab_col_statistics where table_name='A' and column_name='OBJECT_ID' and owner=user;
-
-
NUM_DISTINCT DENSITY
-
------------ ----------
-
74144 .000013487
-
-
估算为4行
-
select 292172 * .000013487 from dual;
-
-
292172*.000013487
-
-----------------
- 3.94052376