MySQL Subquery Cache vs Oracle Subquery Cache子查询缓存_MySQL部分

接：MySQL Subquery Cache vs Oracle Subquery Cache子查询缓存_MySQL部分_PART1

3.MySQL标量子查询没有subquery cache优化，走 DEPENDENT SUBQUERY ，这个很差劲

从下列执行计划看，标量子查询只能走DEPENDENT SUBQUERY ，没有缓存优化。

点击(此处)折叠或打开

explain
select t1.a,(select t2.b from subq_t2 t2 where t1.a = t2.a) b
from subq_t1 t1;
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 299886 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ref | idx_subq_t2 | idx_subq_t2 | 5 | employees.t1.a | 1 | 100.00 | NULL |
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
2 rows in set, 2 warnings (0.00 sec)
show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference 'employees.t1.a' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `employees`.`t1`.`a` AS `a`,(/* select#2 */ select `employees`.`t2`.`b` from `employees`.`subq_t2` `t2` where (`employees`.`t1`.`a` = `employees`.`t2`.`a`)) AS `b` from `employees`.`subq_t1` `t1`

使用hints也走DEPENDENT SUBQUERY：

点击(此处)折叠或打开

explain
select/*+subquery(@subq materialization)*/ t1.a,(select/*+qb_name(subq)*/ t2.b from subq_t2 t2 where t1.a = t2.a) b
from subq_t1 t1;
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 299886 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ref | idx_subq_t2 | idx_subq_t2 | 5 | employees.t1.a | 1 | 100.00 | NULL |
+----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+

树形计划，子查询走索引被驱动30w次，和预期的3,5次，相差甚远，没有缓存优化。

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.034..232.504 rows=300000 loops=1)
-> Select #2 (subquery in projection; dependent)
-> Index lookup on t2 using idx_subq_t2 (a=t1.a) (cost=0.35 rows=1) (actual time=0.003..0.004 rows=1 loops=300000)

总结：MySQL标量子查询只能走DEPENDENT SUBQUERY，没有缓存优化，所以在MySQL里，要少用标量子查询。

4.MySQL对于不能unnest的，如果外层表有过滤条件，如果条件不能走索引，则条件写在子查询前还是后对驱动子查询次数有影响，能走索引则没有影响。

写在前先过滤后然后驱动子查询，写在后则是先驱动子查询后过滤外层条件，可能执行次数会
多很多。

如果能走索引，那么一般按照先走索引过滤外部表，顺序没有影响。

1）先过滤，虽然这里还是对t1全表扫描，返回30w行，但是FILTER条件t1.b>300000先执行，
则返回0行，然后子查询不用执行。

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq intoexists)*/ *
from subq_t1 t1
where t1.b>300000 and t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G

注意Filter: ((t1.b > 300000) and (t1.a,(select #2)))条件，在前的先执行

实际上这个是树形执行计划显示的问题，应该Table scan on t1直接按照(t1.b > 300000)扫描，返回0行，通过执行时间可以看出是193ms,比全表扫描返回所有行小很多

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Filter: ((t1.b > 300000) and <in_optimizer>(t1.a,<exists>(select #2))) (cost=30156.85 rows=99952) (actual time=218.451..218.451 rows=0 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.030..193.624 rows=300000 loops=1)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (never executed)
-> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)

2）先驱动子查询后过滤，t1返回30w行，子查询执行30w次，很拉跨

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq intoexists)*/ *
from subq_t1 t1
where t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
and t1.b>300000\G

条件：Filter: ((t1.a,(select #2)) and (t1.b > 300000))
这里的Table scan on t1执行时间是250ms,比前面的先过滤的193ms大。这里是真的返回30w行。

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Filter: (<in_optimizer>(t1.a,<exists>(select #2)) and (t1.b > 300000)) (cost=30156.85 rows=99952) (actual time=1046.440..1046.440 rows=0 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.030..250.191 rows=300000 loops=1)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (actual time=0.002..0.002 rows=1 loops=300000)
-> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (actual time=0.002..0.002 rows=1 loops=300000)

3）同样的不能unnest的materialization和intoexists一样，只不过materialization有缓存优化

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq materialization)*/ *
from subq_t1 t1
where t1.b>300000 and t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G

t1.b>300000写在子查询前，子查询执行0次：

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Filter: ((t1.b > 300000) and <in_optimizer>(t1.a,t1.a in (select #2))) (cost=30156.85 rows=99952) (actual time=204.768..204.768 rows=0 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.026..182.554 rows=300000 loops=1)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (never executed)
-> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (never executed)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (never executed)
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (never executed)
-> Index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (never executed)

t1.b>300000写在子查询后，因为有缓存，子查询执行4次驱动物化表：

SQL：

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq materialization)*/ *
from subq_t1 t1
where t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
and t1.b>300000\G

子查询执行4次：

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Filter: (<in_optimizer>(t1.a,t1.a in (select #2)) and (t1.b > 300000)) (cost=30156.85 rows=99952) (actual time=371.866..371.866 rows=0 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..199.214 rows=300000 loops=1)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (actual time=27.640..27.640 rows=1 loops=4)
-> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (actual time=27.639..27.639 rows=1 loops=4)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (actual time=27.638..27.638 rows=1 loops=4)
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=110.525..110.525 rows=100000 loops=1)
-> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..61.938 rows=100000 loops=1)

4）能够unnest的，过滤条件写在前后无影响

t1.b>300000写在子查询后，也是先过滤，子查询执行0次：

点击(此处)折叠或打开

explain analyze
select *
from subq_t1 t1
where t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
and t1.b>300000\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=1000689631.09 rows=10006494790) (actual time=220.631..220.631 rows=0 loops=1)
-> Filter: ((t1.b > 300000) and (t1.a is not null)) (cost=30156.85 rows=99952) (actual time=220.630..220.630 rows=0 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..198.257 rows=300000 loops=1)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=t1.a) (never executed)
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (never executed)
-> Filter: (t2.a is not null) (cost=10067.55 rows=100113) (never executed)
-> Index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (never executed)

5)如果外部表条件有索引，则先走索引过滤，没有影响

建立索引：

点击(此处)折叠或打开

create index idx_sub1_t1 on subq_t1(b);

现在完全没有区别，外部表都是先Index range scan on t1 using idx_sub1_t1，返回0行，子查询不需要执行。

t1.b>300000在子查询前，走索引，也是先过滤，子查询执行0次：

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq intoexists)*/ *
from subq_t1 t1
where t1.b>300000 and t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: <in_optimizer>(t1.a,<exists>(select #2)) (cost=0.71 rows=1) (actual time=0.057..0.057 rows=0 loops=1)
-> Index range scan on t1 using idx_sub1_t1 over (300000 < b), with index condition: (t1.b > 300000) (cost=0.71 rows=1) (actual time=0.029..0.029 rows=0 loops=1)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (never executed)
-> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)
1 row in set (0.00 sec)

t1.b>300000在子查询后，走索引，也是先过滤，子查询也是执行0次：

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq intoexists)*/ *
from subq_t1 t1
where t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
and t1.b>300000\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: <in_optimizer>(t1.a,<exists>(select #2)) (cost=0.71 rows=1) (actual time=0.022..0.022 rows=0 loops=1)
-> Index range scan on t1 using idx_sub1_t1 over (300000 < b), with index condition: (t1.b > 300000) (cost=0.71 rows=1) (actual time=0.021..0.021 rows=0 loops=1)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (never executed)
-> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)
1 row in set (0.01 sec)

5.能够unnest的是利用join/semi join/anti join，能够有5种方式优化子查询，没有缓存优化

5种子查询策略：pullout,firstmatch,loosescan,materialization,dupsweedout

点击(此处)折叠或打开

explain analyze
select *
from subq_t1 t1
where t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G

执行计划可以看到，驱动物化表30w次。因为外层重复结果多，这时候效率还不如unnest subquery+materialization

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=3002308857.25 rows=30022487118) (actual time=133.304..555.223 rows=300000 loops=1)
-> Filter: (t1.a is not null) (cost=30156.85 rows=299886) (actual time=0.028..282.177 rows=300000 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..258.794 rows=300000 loops=1)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=t1.a) (actual time=0.001..0.001 rows=1 loops=300000)
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=133.266..133.266 rows=100000 loops=1)
-> Filter: (t2.a is not null) (cost=10067.55 rows=100113) (actual time=0.007..77.712 rows=100000 loops=1)
-> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..69.800 rows=100000 loops=1)

no unnest subquery+materialization，因为子查询只需要执行4次，执行时间346ms,比前面的555ms高。

点击(此处)折叠或打开

explain analyze
select/*+subquery(@subq materialization)*/ *
from subq_t1 t1
where t1.a in
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: <in_optimizer>(t1.a,t1.a in (select #2)) (cost=30156.85 rows=299886) (actual time=111.111..346.171 rows=300000 loops=1)
-> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..189.058 rows=300000 loops=1)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (actual time=27.775..27.775 rows=1 loops=4)
-> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (actual time=27.773..27.773 rows=1 loops=4)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (actual time=27.773..27.773 rows=1 loops=4)
-> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=111.064..111.064 rows=100000 loops=1)
-> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..62.216 rows=100000 loops=1)

在ORACLE里semi join,anti join有缓存优化。

MySQL Subquery Cache vs Oracle Subquery Cache子查询缓存_MySQL部分_PART2

3.MySQL标量子查询没有subquery cache优化，走 DEPENDENT SUBQUERY ，这个很差劲

4.MySQL对于不能unnest的，如果外层表有过滤条件，如果条件不能走索引，则条件写在子查询前还是后对驱动子查询次数有影响，能走索引则没有影响。

5.能够unnest的是利用join/semi join/anti join，能够有5种方式优化子查询，没有缓存优化