oracle,mysql组合索引前导列范围条件且包含等值条件,可能效率只有范围的更高

30阅读 0评论2025-04-26 dingjun123
分类:Oracle

将a>0 and b>3这种改成a>=1 and b>3提高效率必须满足特定条件:
1)a,b列组合索引
2)a是整数,不能a>0改成a>=0,那么就不等价了,就算结果一样,索引访问的数据量也没有变化
3)a>0的数据量多,结果很多都在a=1 and b>3上,而a>1的数据量少


组合索引要符合leftmost prefix规则,前导列是范围的,后续列条件不能index access,只能index filter...
但是貌似有例外:
MySQL的like后通配,between,>=,<=貌似会用到后续列,因为他们包含了一个等值条件,
比如index(a,b),对于where a>=1 and b>3,“如果a是整数”
然后索引访问实际上是拆成两份:a=1 and b>3 和a>1 and b>3,前者能真正用到两个列索引访问,后者只能用到a>1
那么,如果存在a>1的数据很少,a=1的数据多,但是a=1 and b>3的数据少,
那么where a>=1 and b>3就比a>0 and b>3的效率高,因为写成
a>0 and b>3则只能用到a>0,而a>0是等价于a>=1的。。。但是实际上访问效率不一样,


注意:a>0和a>=0是不等价的或效率没有啥区别,必须a是整数,a>0写成a>=1,且要满足上述数据分布才能提升效率。
可以通过下面在三大库上测试看出效率区别。

构造测试数据:

点击(此处)折叠或打开

  1. drop table t;
  2. create table t
  3. (
  4. a int,
  5. b int,
  6. ext int
  7. );

  8. -- a=1的数据多,a>1的数据少
  9. insert into t values(0,2,1);
  10. insert into t values(1,2,1);
  11. insert into t values(1,2,1);
  12. insert into t values(1,2,1);
  13. insert into t values(1,2,1);
  14. insert into t values(2,2,1);

  15. -- 插入300w行+
  16. insert into t select * from t;
  17. insert into t select * from t;
  18. insert into t select * from t;
  19. insert into t select * from t;
  20. insert into t select * from t;

  21. -- 再插入2条满足条件的行
  22. insert into t values(1,5,1);
  23. insert into t values(2,10,1);
  24. commit;

  25. -- 创建组合索引
  26. create index idx_t on t(a,b);

--- oracle测试
=== 如果查询的数据集中在某个前导列值上,而且返回行很少,
=== 前导列其他满足的条件行数少,比如a=1 and b>3直接定位行数少


== a>0 and b>3只能用到a>0索引访问,那么基本包含所有数据,默认不走索引,这里强制走索引,逻辑读12430


点击(此处)折叠或打开

  1. set autotrace traceonly
  2. select/*+index(t)*/ * from t where a>0 and b>3;
  3. Elapsed: 00:00:00.37

  4. Execution Plan
  5. ----------------------------------------------------------
  6. Plan hash value: 120220905

  7. ---------------------------------------------------------------------------------------------
  8. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  9. ---------------------------------------------------------------------------------------------
  10. | 0 | SELECT STATEMENT | | 134 | 5226 | 12821 (1)| 00:00:01 |
  11. | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 12821 (1)| 00:00:01 |
  12. |* 2 | INDEX RANGE SCAN | IDX_T | 57633 | | 12550 (1)| 00:00:01 |
  13. ---------------------------------------------------------------------------------------------

  14. Predicate Information (identified by operation id):
  15. ---------------------------------------------------

  16.    2 - access("A">0 AND "B">3 AND "A" IS NOT NULL)
  17.        filter("B">3)

  18. Note
  19. -----
  20.    - dynamic statistics used: dynamic sampling (level=2)


  21. Statistics
  22. ----------------------------------------------------------
  23.           0 recursive calls
  24.           0 db block gets
  25.       12430 consistent gets
  26.           0 physical reads
  27.           0 redo size
  28.         761 bytes sent via SQL*Net to client
  29.         412 bytes received via SQL*Net from client
  30.           2 SQL*Net roundtrips to/from client
  31.           0 sorts (memory)
  32.           0 sorts (disk)
  33.           2 rows processed

-- 将 a>0改成a>=1,那么a=1 and b>3部分用到2个列索引访问,定位2行数据,效率高,而a>1的数据少,那么a>1 and b>3部分效率也可以
-- 逻辑读从12430到2491
-- 虽然O的执行计划还是b>3是index filter...但是access里隐含了a=1 and b>3直接访问

点击(此处)折叠或打开

  1. select/*+index(t)*/ * from t where a>=1 and b>3;

  2. 2 rows selected.

  3. Elapsed: 00:00:00.08

  4. Execution Plan
  5. ----------------------------------------------------------
  6. Plan hash value: 120220905

  7. ---------------------------------------------------------------------------------------------
  8. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  9. ---------------------------------------------------------------------------------------------
  10. | 0 | SELECT STATEMENT | | 134 | 5226 | 12821 (1)| 00:00:01 |
  11. | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 12821 (1)| 00:00:01 |
  12. |* 2 | INDEX RANGE SCAN | IDX_T | 57633 | | 12550 (1)| 00:00:01 |
  13. ---------------------------------------------------------------------------------------------

  14. Predicate Information (identified by operation id):
  15. ---------------------------------------------------

  16.    2 - access("A">=1 AND "B">3 AND "A" IS NOT NULL)
  17.        filter("B">3)

  18. Note
  19. -----
  20.    - dynamic statistics used: dynamic sampling (level=2)


  21. Statistics
  22. ----------------------------------------------------------
  23.           0 recursive calls
  24.           0 db block gets
  25.        2491 consistent gets
  26.           0 physical reads
  27.           0 redo size
  28.         761 bytes sent via SQL*Net to client
  29.         409 bytes received via SQL*Net from client
  30.           2 SQL*Net roundtrips to/from client
  31.           0 sorts (memory)
  32.           0 sorts (disk)
  33.           2 rows processed

--类似将a>=1拆成a=1和a>1两条语句执行: 

点击(此处)折叠或打开

  1. select/*+index(t)*/ * from t where a=1 and b>3
  2. union all
  3. select/*+index(t)*/ * from t where a>1 and b>3;

  4. 2 rows selected.

  5. Elapsed: 00:00:00.10

  6. Execution Plan
  7. ----------------------------------------------------------
  8. Plan hash value: 230998067

  9. ----------------------------------------------------------------------------------------------
  10. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  11. ----------------------------------------------------------------------------------------------
  12. | 0 | SELECT STATEMENT | | 135 | 5265 | 2572 (1)| 00:00:01 |
  13. | 1 | UNION-ALL | | | | | |
  14. | 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 39 | 4 (0)| 00:00:01 |
  15. |* 3 | INDEX RANGE SCAN | IDX_T | 1 | | 3 (0)| 00:00:01 |
  16. | 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 2568 (1)| 00:00:01 |
  17. |* 5 | INDEX RANGE SCAN | IDX_T | 11534 | | 2514 (1)| 00:00:01 |
  18. ----------------------------------------------------------------------------------------------

  19. Predicate Information (identified by operation id):
  20. ---------------------------------------------------

  21.    3 - access("A"=1 AND "B">3 AND "B" IS NOT NULL)
  22.    5 - access("A">1 AND "B">3 AND "A" IS NOT NULL)
  23.        filter("B">3)

  24. Note
  25. -----
  26.    - dynamic statistics used: dynamic sampling (level=2)


  27. Statistics
  28. ----------------------------------------------------------
  29.           0 recursive calls
  30.           0 db block gets
  31.        2494 consistent gets
  32.           0 physical reads
  33.           0 redo size
  34.         753 bytes sent via SQL*Net to client
  35.         465 bytes received via SQL*Net from client
  36.           2 SQL*Net roundtrips to/from client
  37.           0 sorts (memory)
  38.           0 sorts (disk)
  39.           2 rows processed

-- mysql     
-- mysql类似oracle的访问模式,而且前导列a>=1 and b>3,普通计划的key_len会变化,树形计划的access部分会用到2个条件
-- 实际上只能a=1 and b>3部分才能用到2个列索引访问,而a>1 and b>3部分只能用到a>1部分索引访问,而b>3是索引过滤


-- a>0 and b>3    可以看到首次执行是760.866ms,总耗时0.77s,因为a>0数据多,效率低
-- 这里索引访问using idx_t over (0 < a),key_len=5,只用到a>0做索引访问,b>3是icp


点击(此处)折叠或打开

  1. explain analyze
  2. select/*+index(t)*/ * from t where a>0 and b>3\G
  3. *************************** 1. row ***************************
  4. EXPLAIN: -> Index range scan on t using idx_t over (0 < a), with index condition: ((t.a > 0) and (t.b > 3)) (cost=2908883.32 rows=3138143) (actual time=760.866..760.871 rows=2 loops=1)

  5. 1 row in set (0.77 sec)

  6. explain select/*+index(t)*/ * from t where a>0 and b>3;
  7. +----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
  8. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  9. +----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
  10. | 1 | SIMPLE | t | NULL | range | idx_t | idx_t | 5 | NULL | 3138143 | 33.33 | Using index condition; Using MRR |
  11. +----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
  12. 1 row in set, 1 warning (0.00 sec)

-- 写成a>=1 and b>3,则是0.16s,因为2行数据基本通过a=1 and b>3定位,看到索引访问using idx_t over (1 <= a AND 3 < b),
-- key_len从5变成10,用到a,b两个列做索引访问,实际上只用到a=1 and b>3条件
-- 而a>1 and b>3用到a>1,a>1数据少,所以效率高


点击(此处)折叠或打开

  1. explain analyze
  2. select/*+index(t)*/ * from t where a>=1 and b>3\G
  3. *************************** 1. row ***************************
  4. EXPLAIN: -> Index range scan on t using idx_t over (1 <= a AND 3 < b), with index condition: ((t.a >= 1) and (t.b > 3)) (cost=1935631.65 rows=2087428) (actual time=154.424..154.428 rows=2 loops=1)

  5. 1 row in set (0.16 sec)

  6. explain select/*+index(t)*/ * from t where a>=1 and b>3;
  7. +----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
  8. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  9. +----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
  10. | 1 | SIMPLE | t | NULL | range | idx_t | idx_t | 10 | NULL | 2087428 | 33.33 | Using index condition; Using MRR |
  11. +----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
  12. 1 row in set, 1 warning (0.00 sec)

对于pg也是类似的... 所以前导列>=,<=,between,like等值部分能够使用leftmost prefix规则,索引次列可能用到index access来提高效率,比如a>=1 and b>3拆成(a=1 and b>3) or (a>1 and b>3)如果a=1 and b>3能够利用2个列索引访问效率高,而a>1 and b>3只能用到1个列a>1访问的行数(block)少,而a>0 and b>1只能用到a>0索引访问,b>1只能index filter,而a>0的扫描行数远大于a>1的,且a=1 and b>3的选择性好,则a>=1 and b>3的效率比a>0 and b>3的高,实际上还是索引访问的数据块少提高效率。
上一篇:oracle 12c开始分页的三种写法及其下推裁剪优化
下一篇:oracle完全存在性判断逻辑