1)a,b列组合索引
2)a是整数,不能a>0改成a>=0,那么就不等价了,就算结果一样,索引访问的数据量也没有变化
3)a>0的数据量多,结果很多都在a=1 and b>3上,而a>1的数据量少
组合索引要符合leftmost prefix规则,前导列是范围的,后续列条件不能index access,只能index filter...
但是貌似有例外:
MySQL的like后通配,between,>=,<=貌似会用到后续列,因为他们包含了一个等值条件,
比如index(a,b),对于where a>=1 and b>3,“如果a是整数”
然后索引访问实际上是拆成两份:a=1 and b>3 和a>1 and b>3,前者能真正用到两个列索引访问,后者只能用到a>1
那么,如果存在a>1的数据很少,a=1的数据多,但是a=1 and b>3的数据少,
那么where a>=1 and b>3就比a>0 and b>3的效率高,因为写成
a>0 and b>3则只能用到a>0,而a>0是等价于a>=1的。。。但是实际上访问效率不一样,
注意:a>0和a>=0是不等价的或效率没有啥区别,必须a是整数,a>0写成a>=1,且要满足上述数据分布才能提升效率。
可以通过下面在三大库上测试看出效率区别。
构造测试数据:
点击(此处)折叠或打开
-
drop table t;
-
create table t
-
(
-
a int,
-
b int,
-
ext int
-
);
-
-
-- a=1的数据多,a>1的数据少
-
insert into t values(0,2,1);
-
insert into t values(1,2,1);
-
insert into t values(1,2,1);
-
insert into t values(1,2,1);
-
insert into t values(1,2,1);
-
insert into t values(2,2,1);
-
-
-- 插入300w行+
-
insert into t select * from t;
-
insert into t select * from t;
-
insert into t select * from t;
-
insert into t select * from t;
-
insert into t select * from t;
-
-
-- 再插入2条满足条件的行
-
insert into t values(1,5,1);
-
insert into t values(2,10,1);
-
commit;
-
-
-- 创建组合索引
- create index idx_t on t(a,b);
--- oracle测试
=== 如果查询的数据集中在某个前导列值上,而且返回行很少,
=== 前导列其他满足的条件行数少,比如a=1 and b>3直接定位行数少
== a>0 and b>3只能用到a>0索引访问,那么基本包含所有数据,默认不走索引,这里强制走索引,逻辑读12430
点击(此处)折叠或打开
-
set autotrace traceonly
-
select/*+index(t)*/ * from t where a>0 and b>3;
-
Elapsed: 00:00:00.37
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 120220905
-
-
---------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
---------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 134 | 5226 | 12821 (1)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 12821 (1)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_T | 57633 | | 12550 (1)| 00:00:01 |
-
---------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("A">0 AND "B">3 AND "A" IS NOT NULL)
-
filter("B">3)
-
-
Note
-
-----
-
- dynamic statistics used: dynamic sampling (level=2)
-
-
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls
-
0 db block gets
-
12430 consistent gets
-
0 physical reads
-
0 redo size
-
761 bytes sent via SQL*Net to client
-
412 bytes received via SQL*Net from client
-
2 SQL*Net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
- 2 rows processed
-- 将 a>0改成a>=1,那么a=1 and b>3部分用到2个列索引访问,定位2行数据,效率高,而a>1的数据少,那么a>1 and b>3部分效率也可以
-- 逻辑读从12430到2491
-- 虽然O的执行计划还是b>3是index filter...但是access里隐含了a=1 and b>3直接访问
点击(此处)折叠或打开
-
select/*+index(t)*/ * from t where a>=1 and b>3;
-
-
2 rows selected.
-
-
Elapsed: 00:00:00.08
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 120220905
-
-
---------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
---------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 134 | 5226 | 12821 (1)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 12821 (1)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_T | 57633 | | 12550 (1)| 00:00:01 |
-
---------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("A">=1 AND "B">3 AND "A" IS NOT NULL)
-
filter("B">3)
-
-
Note
-
-----
-
- dynamic statistics used: dynamic sampling (level=2)
-
-
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls
-
0 db block gets
-
2491 consistent gets
-
0 physical reads
-
0 redo size
-
761 bytes sent via SQL*Net to client
-
409 bytes received via SQL*Net from client
-
2 SQL*Net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
- 2 rows processed
--类似将a>=1拆成a=1和a>1两条语句执行:
点击(此处)折叠或打开
-
select/*+index(t)*/ * from t where a=1 and b>3
-
union all
-
select/*+index(t)*/ * from t where a>1 and b>3;
-
-
2 rows selected.
-
-
Elapsed: 00:00:00.10
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 230998067
-
-
----------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 135 | 5265 | 2572 (1)| 00:00:01 |
-
| 1 | UNION-ALL | | | | | |
-
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 39 | 4 (0)| 00:00:01 |
-
|* 3 | INDEX RANGE SCAN | IDX_T | 1 | | 3 (0)| 00:00:01 |
-
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 134 | 5226 | 2568 (1)| 00:00:01 |
-
|* 5 | INDEX RANGE SCAN | IDX_T | 11534 | | 2514 (1)| 00:00:01 |
-
----------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
3 - access("A"=1 AND "B">3 AND "B" IS NOT NULL)
-
5 - access("A">1 AND "B">3 AND "A" IS NOT NULL)
-
filter("B">3)
-
-
Note
-
-----
-
- dynamic statistics used: dynamic sampling (level=2)
-
-
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls
-
0 db block gets
-
2494 consistent gets
-
0 physical reads
-
0 redo size
-
753 bytes sent via SQL*Net to client
-
465 bytes received via SQL*Net from client
-
2 SQL*Net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
- 2 rows processed
-- mysql
-- mysql类似oracle的访问模式,而且前导列a>=1 and b>3,普通计划的key_len会变化,树形计划的access部分会用到2个条件
-- 实际上只能a=1 and b>3部分才能用到2个列索引访问,而a>1 and b>3部分只能用到a>1部分索引访问,而b>3是索引过滤
-- a>0 and b>3 可以看到首次执行是760.866ms,总耗时0.77s,因为a>0数据多,效率低
-- 这里索引访问using idx_t over (0 < a),key_len=5,只用到a>0做索引访问,b>3是icp
点击(此处)折叠或打开
-
explain analyze
-
select/*+index(t)*/ * from t where a>0 and b>3\G
-
*************************** 1. row ***************************
-
EXPLAIN: -> Index range scan on t using idx_t over (0 < a), with index condition: ((t.a > 0) and (t.b > 3)) (cost=2908883.32 rows=3138143) (actual time=760.866..760.871 rows=2 loops=1)
-
-
1 row in set (0.77 sec)
-
-
explain select/*+index(t)*/ * from t where a>0 and b>3;
-
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
-
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
-
| 1 | SIMPLE | t | NULL | range | idx_t | idx_t | 5 | NULL | 3138143 | 33.33 | Using index condition; Using MRR |
-
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
- 1 row in set, 1 warning (0.00 sec)
-- 写成a>=1 and b>3,则是0.16s,因为2行数据基本通过a=1 and b>3定位,看到索引访问using idx_t over (1 <= a AND 3 < b),
-- key_len从5变成10,用到a,b两个列做索引访问,实际上只用到a=1 and b>3条件
-- 而a>1 and b>3用到a>1,a>1数据少,所以效率高
点击(此处)折叠或打开
-
explain analyze
-
select/*+index(t)*/ * from t where a>=1 and b>3\G
-
*************************** 1. row ***************************
-
EXPLAIN: -> Index range scan on t using idx_t over (1 <= a AND 3 < b), with index condition: ((t.a >= 1) and (t.b > 3)) (cost=1935631.65 rows=2087428) (actual time=154.424..154.428 rows=2 loops=1)
-
-
1 row in set (0.16 sec)
-
-
explain select/*+index(t)*/ * from t where a>=1 and b>3;
-
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
-
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
-
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
-
| 1 | SIMPLE | t | NULL | range | idx_t | idx_t | 10 | NULL | 2087428 | 33.33 | Using index condition; Using MRR |
-
+----+-------------+-------+------------+-------+---------------+-------+---------+------+---------+----------+----------------------------------+
- 1 row in set, 1 warning (0.00 sec)
对于pg也是类似的... 所以前导列>=,<=,between,like等值部分能够使用leftmost prefix规则,索引次列可能用到index access来提高效率,比如a>=1 and b>3拆成(a=1 and b>3) or (a>1 and b>3)如果a=1 and b>3能够利用2个列索引访问效率高,而a>1 and b>3只能用到1个列a>1访问的行数(block)少,而a>0 and b>1只能用到a>0索引访问,b>1只能index filter,而a>0的扫描行数远大于a>1的,且a=1 and b>3的选择性好,则a>=1 and b>3的效率比a>0 and b>3的高,实际上还是索引访问的数据块少提高效率。