3.MySQL标量子查询没有subquery cache优化,走 DEPENDENT SUBQUERY ,这个很差劲
点击(此处)折叠或打开
-
explain
-
select t1.a,(select t2.b from subq_t2 t2 where t1.a = t2.a) b
-
from subq_t1 t1;
-
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
-
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
-
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 299886 | 100.00 | NULL |
-
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ref | idx_subq_t2 | idx_subq_t2 | 5 | employees.t1.a | 1 | 100.00 | NULL |
-
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
-
2 rows in set, 2 warnings (0.00 sec)
-
-
show warnings\G
-
*************************** 1. row ***************************
-
Level: Note
-
Code: 1276
-
Message: Field or reference 'employees.t1.a' of SELECT #2 was resolved in SELECT #1
-
*************************** 2. row ***************************
-
Level: Note
-
Code: 1003
- Message: /* select#1 */ select `employees`.`t1`.`a` AS `a`,(/* select#2 */ select `employees`.`t2`.`b` from `employees`.`subq_t2` `t2` where (`employees`.`t1`.`a` = `employees`.`t2`.`a`)) AS `b` from `employees`.`subq_t1` `t1`
使用hints也走DEPENDENT SUBQUERY:
点击(此处)折叠或打开
-
explain
-
select/*+subquery(@subq materialization)*/ t1.a,(select/*+qb_name(subq)*/ t2.b from subq_t2 t2 where t1.a = t2.a) b
-
from subq_t1 t1;
-
-
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
-
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
-
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 299886 | 100.00 | NULL |
-
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ref | idx_subq_t2 | idx_subq_t2 | 5 | employees.t1.a | 1 | 100.00 | NULL |
- +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
树形计划,子查询走索引被驱动30w次,和预期的3,5次,相差甚远,没有缓存优化。
点击(此处)折叠或打开
-
*************************** 1. row ***************************
-
EXPLAIN: -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.034..232.504 rows=300000 loops=1)
-
-> Select #2 (subquery in projection; dependent)
- -> Index lookup on t2 using idx_subq_t2 (a=t1.a) (cost=0.35 rows=1) (actual time=0.003..0.004 rows=1 loops=300000)
总结:MySQL标量子查询只能走DEPENDENT SUBQUERY,没有缓存优化,所以在MySQL里,要少用标量子查询。
4.MySQL对于不能unnest的,如果外层表有过滤条件,如果条件不能走索引,则条件写在子查询前还是后对驱动子查询次数有影响,能走索引则没有影响。
多很多。
如果能走索引,那么一般按照先走索引过滤外部表,顺序没有影响。
1)先过滤,虽然这里还是对t1全表扫描,返回30w行,但是FILTER条件t1.b>300000先执行,
则返回0行,然后子查询不用执行。
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq intoexists)*/ *
-
from subq_t1 t1
-
where t1.b>300000 and t1.a in
- (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
注意Filter: ((t1.b > 300000) and
点击(此处)折叠或打开
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: ((t1.b > 300000) and <in_optimizer>(t1.a,<exists>(select #2))) (cost=30156.85 rows=99952) (actual time=218.451..218.451 rows=0 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.030..193.624 rows=300000 loops=1)
-
-> Select #2 (subquery in condition; dependent)
-
-> Limit: 1 row(s) (never executed)
- -> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)
2)先驱动子查询后过滤,t1返回30w行,子查询执行30w次,很拉跨
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq intoexists)*/ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
- and t1.b>300000\G
条件:Filter: (
这里的Table scan on t1执行时间是250ms,比前面的先过滤的193ms大。这里是真的返回30w行。
点击(此处)折叠或打开
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: (<in_optimizer>(t1.a,<exists>(select #2)) and (t1.b > 300000)) (cost=30156.85 rows=99952) (actual time=1046.440..1046.440 rows=0 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.030..250.191 rows=300000 loops=1)
-
-> Select #2 (subquery in condition; dependent)
-
-> Limit: 1 row(s) (actual time=0.002..0.002 rows=1 loops=300000)
- -> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (actual time=0.002..0.002 rows=1 loops=300000)
3)同样的不能unnest的materialization和intoexists一样,只不过materialization有缓存优化
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq materialization)*/ *
-
from subq_t1 t1
-
where t1.b>300000 and t1.a in
- (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
t1.b>300000写在子查询前,子查询执行0次:
点击(此处)折叠或打开
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: ((t1.b > 300000) and <in_optimizer>(t1.a,t1.a in (select #2))) (cost=30156.85 rows=99952) (actual time=204.768..204.768 rows=0 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.026..182.554 rows=300000 loops=1)
-
-> Select #2 (subquery in condition; run only once)
-
-> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (never executed)
-
-> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (never executed)
-
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (never executed)
-
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (never executed)
- -> Index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (never executed)
t1.b>300000写在子查询后,因为有缓存,子查询执行4次驱动物化表:
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq materialization)*/ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
- and t1.b>300000\G
子查询执行4次:
点击(此处)折叠或打开
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: (<in_optimizer>(t1.a,t1.a in (select #2)) and (t1.b > 300000)) (cost=30156.85 rows=99952) (actual time=371.866..371.866 rows=0 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..199.214 rows=300000 loops=1)
-
-> Select #2 (subquery in condition; run only once)
-
-> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (actual time=27.640..27.640 rows=1 loops=4)
-
-> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (actual time=27.639..27.639 rows=1 loops=4)
-
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (actual time=27.638..27.638 rows=1 loops=4)
-
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=110.525..110.525 rows=100000 loops=1)
- -> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..61.938 rows=100000 loops=1)
4)能够unnest的,过滤条件写在前后无影响
点击(此处)折叠或打开
-
explain analyze
-
select *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
-
and t1.b>300000\G
-
*************************** 1. row ***************************
-
EXPLAIN: -> Nested loop inner join (cost=1000689631.09 rows=10006494790) (actual time=220.631..220.631 rows=0 loops=1)
-
-> Filter: ((t1.b > 300000) and (t1.a is not null)) (cost=30156.85 rows=99952) (actual time=220.630..220.630 rows=0 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..198.257 rows=300000 loops=1)
-
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=t1.a) (never executed)
-
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (never executed)
-
-> Filter: (t2.a is not null) (cost=10067.55 rows=100113) (never executed)
- -> Index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (never executed)
5)如果外部表条件有索引,则先走索引过滤,没有影响
点击(此处)折叠或打开
- create index idx_sub1_t1 on subq_t1(b);
现在完全没有区别,外部表都是先Index range scan on t1 using idx_sub1_t1,返回0行,子查询不需要执行。
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq intoexists)*/ *
-
from subq_t1 t1
-
where t1.b>300000 and t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: <in_optimizer>(t1.a,<exists>(select #2)) (cost=0.71 rows=1) (actual time=0.057..0.057 rows=0 loops=1)
-
-> Index range scan on t1 using idx_sub1_t1 over (300000 < b), with index condition: (t1.b > 300000) (cost=0.71 rows=1) (actual time=0.029..0.029 rows=0 loops=1)
-
-> Select #2 (subquery in condition; dependent)
-
-> Limit: 1 row(s) (never executed)
-
-> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)
-
- 1 row in set (0.00 sec)
t1.b>300000在子查询后,走索引,也是先过滤,子查询也是执行0次:
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq intoexists)*/ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
-
and t1.b>300000\G
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: <in_optimizer>(t1.a,<exists>(select #2)) (cost=0.71 rows=1) (actual time=0.022..0.022 rows=0 loops=1)
-
-> Index range scan on t1 using idx_sub1_t1 over (300000 < b), with index condition: (t1.b > 300000) (cost=0.71 rows=1) (actual time=0.021..0.021 rows=0 loops=1)
-
-> Select #2 (subquery in condition; dependent)
-
-> Limit: 1 row(s) (never executed)
-
-> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)
-
- 1 row in set (0.01 sec)
5.能够unnest的是利用join/semi join/anti join,能够有5种方式优化子查询,没有缓存优化
点击(此处)折叠或打开
-
explain analyze
-
select *
-
from subq_t1 t1
-
where t1.a in
- (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
执行计划可以看到,驱动物化表30w次。因为外层重复结果多,这时候效率还不如unnest subquery+materialization
点击(此处)折叠或打开
-
*************************** 1. row ***************************
-
EXPLAIN: -> Nested loop inner join (cost=3002308857.25 rows=30022487118) (actual time=133.304..555.223 rows=300000 loops=1)
-
-> Filter: (t1.a is not null) (cost=30156.85 rows=299886) (actual time=0.028..282.177 rows=300000 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..258.794 rows=300000 loops=1)
-
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=t1.a) (actual time=0.001..0.001 rows=1 loops=300000)
-
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=133.266..133.266 rows=100000 loops=1)
-
-> Filter: (t2.a is not null) (cost=10067.55 rows=100113) (actual time=0.007..77.712 rows=100000 loops=1)
- -> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..69.800 rows=100000 loops=1)
no unnest subquery+materialization,因为子查询只需要执行4次,执行时间346ms,比前面的555ms高。
点击(此处)折叠或打开
-
explain analyze
-
select/*+subquery(@subq materialization)*/ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
-
*************************** 1. row ***************************
-
EXPLAIN: -> Filter: <in_optimizer>(t1.a,t1.a in (select #2)) (cost=30156.85 rows=299886) (actual time=111.111..346.171 rows=300000 loops=1)
-
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..189.058 rows=300000 loops=1)
-
-> Select #2 (subquery in condition; run only once)
-
-> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (actual time=27.775..27.775 rows=1 loops=4)
-
-> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (actual time=27.773..27.773 rows=1 loops=4)
-
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (actual time=27.773..27.773 rows=1 loops=4)
-
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=111.064..111.064 rows=100000 loops=1)
- -> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..62.216 rows=100000 loops=1)
在ORACLE里semi join,anti join有缓存优化。