先创建表test,并且建立索引,收集统计信息:
|
--create table
create table test as select object_id,object_name,'file1000000000000000000000000xyz_20230215_'||rownum from dba_objects; --create index create index idx_test on test(file_id); --gather statistics exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'dingjun123',tabname=>'test',estimate_percent=>null,method_opt=>'for all columns size auto',no_invalidate=>false,cascade=>true,degree => 10); exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'dingjun123',tabname=>'test',estimate_percent=>null,method_opt=>'for columns file_id size skewonly',no_invalidate=>false,cascade=>true,degree => 10); |
下面执行对应的问题语句:
|
select * from test where file_id='file1000000000000000000000000xyz_20230215_999';
1 row selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 1357081020 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 76815 | 5776K| 247 (1)| 00:00:03 | |* 1 | TABLE ACCESS FULL| TEST | 76815 | 5776K| 247 (1)| 00:00:03 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("FILE_ID"='file1000000000000000000000000xyz_20230215_999') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 881 consistent gets 0 physical reads 0 redo size 742 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed |
正常情况下,此语句按照条件file_id查询,应该走索引,因为选择性非常好,选择性如下:
dingjun123@ORADB> select count(*),count(distinct file_id) card from test;
COUNT(*) CARD
---------- ----------
76815 76815
FILE_ID基本唯一,为什么没有走索引呢?可以看到,FILE_ID存储的内容是类似于:'file1000000000000000000000000xyz_20230215_'||rownum,也就是前缀很长且重复,可以想到CBO内部计算选择性是有一定规则的,这么长的重复估计有问题。
另外FILE_ID有直方图:
dingjun123@ORADB> dingjun123@ORADB> select column_name,histogram from dba_tab_col_statistics where table_name='TEST';
COLUMN_NAME HISTOGRAM
------------------------------ ---------------
FILE_ID FREQUENCY
OBJECT_NAME NONE
OBJECT_ID NONE
NAME NONE
去MOS搜索下:
Statistics and histograms of character columns with length longer than 32 or 64 characters (Doc ID 800089.1)
dingjun123@ORADB> select count(*),count(distinct substrb(file_id,1,32)) card from test;
COUNT(*) CARD
---------- ----------
76815 1
这样看,不走索引是正常的,为了解决这个问题,对选择性好的,要删除直方图:
|
exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'dingjun123',tabname=>'test',estimate_percent=>null,method_opt=>'for columns file_id size 1',no_invalidate=>false,cascade=>true,degree => 10);
删除直方图后正确: Execution Plan ---------------------------------------------------------- Plan hash value: 2473784974 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 77 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 77 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_TEST | 1 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("FILE_ID"='file1000000000000000000000000xyz_20230215_999') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 746 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed |