2.前导列不是等值的选择率计算
下面改下条件,语句如下:点击(此处)折叠或打开
-
select *
-
from tab1 where id between 1000 and 40000 and code =159;
-
-
2 rows selected.
-
- Elapsed: 00:00:00.00
点击(此处)折叠或打开
-
Elapsed: 00:00:00.00
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 2211052296
-
-
--------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | 20 | 103 (1)| 00:00:02 |
-
|* 1 | TABLE ACCESS FULL| TAB1 | 1 | 20 | 103 (1)| 00:00:02 |
-
--------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("CODE"=159 AND "ID"<=40000 AND "ID">=1000)
-
-
-
Statistics
-
----------------------------------------------------------
-
1 recursive calls
-
1 db block gets
-
384 consistent gets
-
0 physical reads
-
0 redo size
-
737 bytes sent via SQL*Net to client
-
520 bytes received via SQL*Net from client
-
2 SQL*Net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
- 2 rows processed
和leftmost prefix规则有关,前导列是范围,后面的列不能用于index access,只能是index filter,所以索引扫描是按照
id between 1000 and 40000,因为索引是按照前导列排序的,所以这个扫描了40%的数据量,走全表也正常。
通过cost=103也看出来全表的比索引的116小。
下面强制走索引分析,可以看到走索引的cost=116比全表扫描103大,所以未走索引,
从执行计划可以看到filter("CODE"=159),虽然access("ID">=1000 AND "CODE"=159 AND "ID"<=40000),但是这里的
"CODE"=159实际不参与索引access,而主要是索引filter:
点击(此处)折叠或打开
-
select/*+index(tab1 idx_tab1)*/ *
-
from tab1 where id between 1000 and 40000 and code =159;
-
2 rows selected.
-
-
Elapsed: 00:00:00.00
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 2722636538
-
-
----------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | 20 | 116 (0)| 00:00:02 |
-
| 1 | TABLE ACCESS BY INDEX ROWID| TAB1 | 1 | 20 | 116 (0)| 00:00:02 |
-
|* 2 | INDEX RANGE SCAN | IDX_TAB1 | 1 | | 115 (0)| 00:00:02 |
-
----------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("ID">=1000 AND "CODE"=159 AND "ID"<=40000)
-
filter("CODE"=159)
-
-
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls
-
0 db block gets
-
119 consistent gets
-
0 physical reads
-
0 redo size
-
745 bytes sent via SQL*Net to client
-
520 bytes received via SQL*Net from client
-
2 SQL*Net roundtrips to/from client
-
0 sorts (memory)
-
0 sorts (disk)
- 2 rows processed
点击(此处)折叠或打开
-
COLUMN_NAME NUM_DISTINCT LOW_VALUE REAL_LOW_VALUE HIGH_VALUE REAL_HIGH_VALUE
-
------------------------------ ------------ -------------------- -------------- -------------------- ---------------
- ID 100000 C102 1 C30B 100000
ix_sel
=effective index selectivity
=sel(id between 1000 and 40000)
=实际范围/总范围+1/num_distinct+1/num_distinct
=(40000-1000)/(100000-1)+1/100000+1/100000
= .3900239
=0.390024
sel(code=159)=1/63904=0.000015648
ix_sel_with_filter用于计算返回行数以及cost,所以要和将id和code条件选择率组合起来,因为是and条件,所以:
ix_sel_with_filter
= .3900239*0.000015648
=.0000061030939872
=.000006
对应总cost=
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)
=1+
ceiling(290*0.390024)+
ceiling(341*.000006)
=1+114+1
=116
注意这里的effective index selectivity按照索引访问条件只有id between 1000 and 40000,虽然执行计划里是:
access("ID">=1000 AND "CODE"=159 AND "ID"<=40000),这是显示的问题,要看后面的filter("CODE"=159),说明
"CODE"=159是index filtert条件,符合索引访问leftmost prefix规则,前导列code是非等值,后面的列id不能参与索引访问,
只能是index filter。
所以effective index selectivity按照id between 1000 and 40000计算,而不是前面的组合条件选择率,
所以索引访问cost=
blevel +
ceiling(leaf_blocks * effective index selectivity)
=1+
ceiling(290*0.390024)
=1+114
=115
这里的0.390024只是id选择率。
而执行计划里的rows则是对应条件组合的结果,包括了code过滤条件,所以返回行数=
rows=round(100000*.390125601*0.000015648)=1
对应10053:
点击(此处)折叠或打开
-
Access Path: index (RangeScan)
-
Index: IDX_TAB1
-
resc_io: 116.00 resc_cpu: 8626877
-
ix_sel: 0.390024 ix_sel_with_filters: 0.000006
-
Cost: 116.41 Resp: 116.41 Degree: 1
-
Best:: AccessPath: TableScan
- Cost: 103.17 Degree: 1 Resp: 103.17 Card: 0.61 Bytes: 0
而rows是按照整个条件计算的结果。
也就是说,执行计划里省略了真正扫描的行数(rows examined),这里扫描的行数是按照 id between 1000 and 40000 查找的,
因为按照id between 1000 and 40000行数接近总行数40%,如下:
点击(此处)折叠或打开
-
select count(*) from tab1 where id between 1000 and 40000;
-
COUNT(*)
-
----------
- 39001
返回3.9w行,扫描过程中按照code =159过滤,所以cost比前面id等值的大很多,所以索引效率要看真正参与index access的索引效率,
等值且选择性好的放前面,这样减少真正access访问的行数(也即减少IO),减少index filter数目提高效率。
可以再看个例子,上面的2个例子回表之后没有额外条件,下面这个例子加上ext用于回表后的过滤
(不是ix_sel_with_filters,而是回表后的condition filter selectivity)。
SQL如下:
点击(此处)折叠或打开
-
select/*+test*/ *
- from tab1 where id between 1000 and 40000 and code >=159 and ext like 'test5%';
点击(此处)折叠或打开
-
SINGLE TABLE ACCESS PATH
-
Single Table Cardinality Estimation for TAB1[TAB1]
-
-
Column (#3): EXT(
-
AvgLen: 10 NDV: 100000 Nulls: 0 Density: 0.000010
-
Column (#1): ID(
-
AvgLen: 5 NDV: 100000 Nulls: 0 Density: 0.000010 Min: 1 Max: 100000
-
Column (#2): CODE(
-
AvgLen: 5 NDV: 63904 Nulls: 0 Density: 0.000016 Min: 2 Max: 99998
-
Table: TAB1 Alias: TAB1
-
Card: Original: 100000.000000 Rounded: 4736 Computed: 4736.41 Non Adjusted: 4736.41
-
Access Path: TableScan
-
Cost: 103.48 Resp: 103.48 Degree: 0
-
Cost_io: 102.00 Cost_cpu: 31119062
-
Resp_io: 102.00 Resp_cpu: 31119062
-
kkofmx: index filter:"TAB1"."CODE">=159
-
-
Access Path: index (skip-scan)
-
SS sel: 0.389418 ANDV (#skips): 39003.000000
-
SS io: 290.000000 vs. index scan io: 114.000000
-
Skip Scan rejected
-
Access Path: index (RangeScan)
-
Index: IDX_TAB1
-
resc_io: 248.00 resc_cpu: 20859874
-
ix_sel: 0.390024 ix_sel_with_filters: 0.389418
-
Cost: 248.99 Resp: 248.99 Degree: 1
-
Best:: AccessPath: TableScan
- Cost: 103.48 Degree: 1 Resp: 103.48 Card: 4736.41 Bytes: 0
点击(此处)折叠或打开
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 2211052296
-
-
--------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 4736 | 94720 | 103 (1)| 00:00:02 |
-
|* 1 | TABLE ACCESS FULL| TAB1 | 4736 | 94720 | 103 (1)| 00:00:02 |
-
--------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("ID"<=40000 AND "EXT" LIKE 'test5%' AND "ID">=1000 AND
- "CODE">=159)
点击(此处)折叠或打开
-
select/*+index(tab1)*/ *
-
from tab1 where id between 1000 and 40000 and code >=159 and ext like 'test5%';
-
-
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 2722636538
-
-
----------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 4736 | 94720 | 249 (1)| 00:00:03 |
-
|* 1 | TABLE ACCESS BY INDEX ROWID| TAB1 | 4736 | 94720 | 249 (1)| 00:00:03 |
-
|* 2 | INDEX RANGE SCAN | IDX_TAB1 | 38942 | | 115 (0)| 00:00:02 |
-
----------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("EXT" LIKE 'test5%')
-
2 - access("ID">=1000 AND "CODE">=159 AND "ID"<=40000 AND "CODE" IS NOT NULL)
- filter("CODE">=159)
点击(此处)折叠或打开
-
CODE列统计信息如下:
-
COLUMN_NAME NUM_DISTINCT LOW_VALUE REAL_LOW_VALUE HIGH_VALUE REAL_HIGH_VALUE
-
------------------------------ ------------ -------------------- -------------- -------------------- ---------------
- CODE 63904 C103 2 C30A6463 99998
ix_sel
=effective index selectivity
=sel(id between 1000 and 40000)
=实际范围/总范围+1/num_distinct+1/num_distinct
=(40000-1000)/(100000-1)+1/100000+1/100000
= .3900239
=0.390024
sel(code >=159)=实际范围/总范围+1/num_distinct
=(99998-159)/(99998-2)+1/63904
= .998445586
则ix_sel_with_filter
= .3900239*.998445586
= 0.389418
对应总cost=
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)
=1+
ceiling(290*0.390024)+
ceiling(341*0.389418 )
=1+114+133
=248
这里的248比执行计划里的249少了1,可能是没有考虑cpu cost的原因。
1+ceiling(290*0.390024)=115
回表的按照ix_sel_filters计算,貌似不包括回表的过滤条件"EXT" LIKE 'test5%',
也就是回表cost=ceiling(clustering_factor * effective table selectivity)
=ceiling(341*0.389418 )
=133
ID=2返回行是包括index filter条件的,因此返回的行数=
ix_sel_with_filter*num_rows
=round(0.389418*100000)
=38942
回表过滤条件,不是ix_sel_filter,是单纯的索引回表后非索引访问条件的过滤,貌似cost没有包含这个过滤条件,是按照ix_sel_filter计算的。
但是对于返回行数,要考虑回表过滤条件选择率,也即"EXT" LIKE 'test5%'的选择率sel_ext,sel_ext*index_rows=返回rows
可以看下EXT列统计信息:
点击(此处)折叠或打开
-
COLUMN_NAME NUM_DISTINCT LOW_VALUE REAL_LOW_VALUE HIGH_VALUE REAL_HIGH_VALUE
-
-------------------- ------------ -------------------- -------------------- -------------------- --------------------
- EXT 100000 7465737431 test1 746573743939393939 test99999
对于like后通配,实际上内部是将like转为>= and <来计算:
where "EXT" LIKE 'test5%'
按照下面条件计算:
"EXT" >= 'test5' and "EXT" < 'test6'
实际上不是转为test6,像mysql是转为>='test5\0\0\0\0\0\...' and <= 'test5 ...'
然后选择率计算要将字符串转为数字,这里使用get_internal_value函数:
点击(此处)折叠或打开
-
select get_internal_value('test5'),get_internal_value('test6') from dual;
-
-
GET_INTERNAL_VALUE('TEST5')
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
GET_INTERNAL_VALUE('TEST6')
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
604364106163032000000000000000000000
- 604364106164241000000000000000000000
具体的字符串如何转为数字的:
1)右侧补0,保留到15个字节
2)转为16进制
3)round左起21位
如果不用get_internal_value,也可以按照上面三点自己编写语句:
点击(此处)折叠或打开
-
select to_char(round(to_number(utl_raw.cast_to_raw(rpad('test5',15,chr(0))),'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),-21)) dec_val from dual;
-
DEC_VAL
-
----------------------------------------
- 604364106163032000000000000000000000
点击(此处)折叠或打开
-
dingjun123@ORADB> select get_internal_value('test1'),get_internal_value('test9999') from dual;
-
-
GET_INTERNAL_VALUE('TEST1')
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
GET_INTERNAL_VALUE('TEST9999')
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
604364106158196000000000000000000000
- 604364106168138000000000000000000000
sel=
(604364106164241-604364106163032)/(604364106168138-604364106158196)+1/100000
= 0.121615311
所以返回行数是:round(38942*0.121615)=4736。
注意:如果是like前通配则不一样,可以看到估算返回5000行,对应5%的选择率,这个是默认值,无法转为> and <。
点击(此处)折叠或打开
-
select/*+index(tab1)*/ *
-
from tab1 where ext like '%test5';
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash value: 2211052296
-
-
--------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 5000 | 97K| 104 (2)| 00:00:02 |
-
|* 1 | TABLE ACCESS FULL| TAB1 | 5000 | 97K| 104 (2)| 00:00:02 |
-
--------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
- 1 - filter("EXT" LIKE '%test5' AND "EXT" IS NOT NULL)
总结:
综上所述,要想建的索引效率高,也就是要提高index access的访问效率,减少访问io,其次要减少index filter以及回表的io。具体来说,组合索引要考虑:1)组合索引要将常用且等值访问的列作为前导列,如果均匀分布较好,如果选择性好,则更佳。
因为等值条件后续列还可能参与索引访问(like前通配,<>不行,<,>可以),这样索引访问效率是多列组合结果
2)将选择性好的列放前面,这个指的是条件的选择率,如果列是均匀的且等值访问,可以直接利用列计算条件选择率,
如果分布不均或非等值,则要考虑具体条件选择率,也即条件返回的行数占总行数比例。
3)范围查询一般放后面,比如>,<,between,like后通配,date/timestamp列放后面,因为date/timestamp列一般都是用于范围查询,而非等值查询。
4)创建的索引尽可能地能够消除排序。
5)组合索引的每个列前导列都要按照上面3点考虑,特别是1,2,然后要考虑leftmost prefix规则,让前面的列条件尽可能是等值的,
才能让后续列也能参与索引访问。