oracle,mysql组合索引前导列范围条件且包含等值条件，可能效率只有范围的更高-dingjun123-ChinaUnix博客

将a>0 and b>3这种改成a>=1 and b>3提高效率必须满足特定条件：
1）a,b列组合索引
2）a是整数，不能a>0改成a>=0,那么就不等价了，就算结果一样，索引访问的数据量也没有变化
3）a>0的数据量多，结果很多都在a=1 and b>3上，而a>1的数据量少

组合索引要符合leftmost prefix规则，前导列是范围的，后续列条件不能index access,只能index filter...
但是貌似有例外：
MySQL的like后通配,between,>=,<=貌似会用到后续列，因为他们包含了一个等值条件，
比如index(a,b),对于where a>=1 and b>3，“如果a是整数”
然后索引访问实际上是拆成两份：a=1 and b>3 和a>1 and b>3，前者能真正用到两个列索引访问，后者只能用到a>1
那么，如果存在a>1的数据很少，a=1的数据多，但是a=1 and b>3的数据少，
那么where a>=1 and b>3就比a>0 and b>3的效率高，因为写成
a>0 and b>3则只能用到a>0，而a>0是等价于a>=1的。。。但是实际上访问效率不一样，

注意：a>0和a>=0是不等价的或效率没有啥区别，必须a是整数，a>0写成a>=1,且要满足上述数据分布才能提升效率。
可以通过下面在三大库上测试看出效率区别。

构造测试数据：

点击(此处)折叠或打开

drop table t;
create table t
(
a int,
b int,
ext int
);
-- a=1的数据多，a>1的数据少
insert into t values(0,2,1);
insert into t values(1,2,1);
insert into t values(1,2,1);
insert into t values(1,2,1);
insert into t values(1,2,1);
insert into t values(2,2,1);
-- 插入300w行+
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
-- 再插入2条满足条件的行
insert into t values(1,5,1);
insert into t values(2,10,1);
commit;
-- 创建组合索引
create index idx_t on t(a,b);

--- oracle测试
=== 如果查询的数据集中在某个前导列值上，而且返回行很少，
=== 前导列其他满足的条件行数少，比如a=1 and b>3直接定位行数少

== a>0 and b>3只能用到a>0索引访问，那么基本包含所有数据，默认不走索引，这里强制走索引，逻辑读12430

点击(此处)折叠或打开

set autotrace traceonly
select/*+index(t)*/ * from t where a>0 and b>3;
Elapsed: 00:00:00.37
Execution Plan
----------------------------------------------------------
Plan hash value: 120220905
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 134 | 5226 | 12821 (1)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 12821 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T | 57633 | | 12550 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A">0 AND "B">3 AND "A" IS NOT NULL)
filter("B">3)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12430 consistent gets
0 physical reads
0 redo size
761 bytes sent via SQL*Net to client
412 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed

-- 将 a>0改成a>=1,那么a=1 and b>3部分用到2个列索引访问，定位2行数据，效率高，而a>1的数据少，那么a>1 and b>3部分效率也可以
-- 逻辑读从12430到2491
-- 虽然O的执行计划还是b>3是index filter...但是access里隐含了a=1 and b>3直接访问

点击(此处)折叠或打开

select/*+index(t)*/ * from t where a>=1 and b>3;
2 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 120220905
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 134 | 5226 | 12821 (1)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 12821 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T | 57633 | | 12550 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A">=1 AND "B">3 AND "A" IS NOT NULL)
filter("B">3)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2491 consistent gets
0 physical reads
0 redo size
761 bytes sent via SQL*Net to client
409 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed

--类似将a>=1拆成a=1和a>1两条语句执行：

点击(此处)折叠或打开

select/*+index(t)*/ * from t where a=1 and b>3
union all
select/*+index(t)*/ * from t where a>1 and b>3;
2 rows selected.
Elapsed: 00:00:00.10
Execution Plan
----------------------------------------------------------
Plan hash value: 230998067
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 135 | 5265 | 2572 (1)| 00:00:01 |
| 1 | UNION-ALL | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 39 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_T | 1 | | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 2568 (1)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IDX_T | 11534 | | 2514 (1)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A"=1 AND "B">3 AND "B" IS NOT NULL)
5 - access("A">1 AND "B">3 AND "A" IS NOT NULL)
filter("B">3)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2494 consistent gets
0 physical reads
0 redo size
753 bytes sent via SQL*Net to client
465 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed

-- mysql
-- mysql类似oracle的访问模式，而且前导列a>=1 and b>3,普通计划的key_len会变化，树形计划的access部分会用到2个条件
-- 实际上只能a=1 and b>3部分才能用到2个列索引访问，而a>1 and b>3部分只能用到a>1部分索引访问，而b>3是索引过滤

-- a>0 and b>3 可以看到首次执行是760.866ms,总耗时0.77s，因为a>0数据多，效率低
-- 这里索引访问using idx_t over (0 < a)，key_len=5,只用到a>0做索引访问，b>3是icp

点击(此处)折叠或打开

explain analyze
select/*+index(t)*/ * from t where a>0 and b>3\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on t using idx_t over (0 < a), with index condition: ((t.a > 0) and (t.b > 3)) (cost=2908883.32 rows=3138143) (actual time=760.866..760.871 rows=2 loops=1)
1 row in set (0.77 sec)
explain select/*+index(t)*/ * from t where a>0 and b>3;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
| 1 | SIMPLE | t | NULL | range | idx_t | idx_t | 5 | NULL | 3138143 | 33.33 | Using index condition; Using MRR |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

-- 写成a>=1 and b>3,则是0.16s,因为2行数据基本通过a=1 and b>3定位，看到索引访问using idx_t over (1 <= a AND 3 < b),
-- key_len从5变成10，用到a,b两个列做索引访问，实际上只用到a=1 and b>3条件
-- 而a>1 and b>3用到a>1,a>1数据少，所以效率高

点击(此处)折叠或打开

explain analyze
select/*+index(t)*/ * from t where a>=1 and b>3\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on t using idx_t over (1 <= a AND 3 < b), with index condition: ((t.a >= 1) and (t.b > 3)) (cost=1935631.65 rows=2087428) (actual time=154.424..154.428 rows=2 loops=1)
1 row in set (0.16 sec)
explain select/*+index(t)*/ * from t where a>=1 and b>3;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
| 1 | SIMPLE | t | NULL | range | idx_t | idx_t | 10 | NULL | 2087428 | 33.33 | Using index condition; Using MRR |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

对于pg也是类似的... 所以前导列>=,<=,between,like等值部分能够使用leftmost prefix规则，索引次列可能用到index access来提高效率，比如a>=1 and b>3拆成(a=1 and b>3) or (a>1 and b>3)如果a=1 and b>3能够利用2个列索引访问效率高，而a>1 and b>3只能用到1个列a>1访问的行数(block)少，而a>0 and b>1只能用到a>0索引访问，b>1只能index filter,而a>0的扫描行数远大于a>1的，且a=1 and b>3的选择性好，则a>=1 and b>3的效率比a>0 and b>3的高，实际上还是索引访问的数据块少提高效率。