直方图只存储字符串前32字节的限制-dingjun123-ChinaUnix博客

某天，遇到一个紧急SQL性能问题，这条SQL的执行频次非常高，下面模拟下此问题：
先创建表test,并且建立索引，收集统计信息：

--create table
create table test as select object_id,object_name,'file1000000000000000000000000xyz_20230215_'||rownum from dba_objects;

--create index
create index idx_test on test(file_id);

--gather statistics
exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'dingjun123',tabname=>'test',estimate_percent=>null,method_opt=>'for all columns size auto',no_invalidate=>false,cascade=>true,degree => 10);
exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'dingjun123',tabname=>'test',estimate_percent=>null,method_opt=>'for columns file_id size skewonly',no_invalidate=>false,cascade=>true,degree => 10);

下面执行对应的问题语句：

select * from test where file_id='file1000000000000000000000000xyz_20230215_999';

1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 76815 | 5776K| 247 (1)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| TEST | 76815 | 5776K| 247 (1)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FILE_ID"='file1000000000000000000000000xyz_20230215_999')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
881 consistent gets
0 physical reads
0 redo size
742 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

正常情况下，此语句按照条件file_id查询，应该走索引，因为选择性非常好，选择性如下：
dingjun123@ORADB> select count(*),count(distinct file_id) card from test;
COUNT(*) CARD
---------- ----------
76815 76815

FILE_ID基本唯一，为什么没有走索引呢？可以看到，FILE_ID存储的内容是类似于：'file1000000000000000000000000xyz_20230215_'||rownum,也就是前缀很长且重复，可以想到CBO内部计算选择性是有一定规则的，这么长的重复估计有问题。

另外FILE_ID有直方图：
dingjun123@ORADB> dingjun123@ORADB> select column_name,histogram from dba_tab_col_statistics where table_name='TEST';

COLUMN_NAME HISTOGRAM
------------------------------ ---------------
FILE_ID FREQUENCY
OBJECT_NAME NONE
OBJECT_ID NONE
NAME NONE

去MOS搜索下：
Statistics and histograms of character columns with length longer than 32 or 64 characters (Doc ID 800089.1)

在12c之前，直方图只存储前32字节，12c及之后是存储前64字节：
dingjun123@ORADB> select count(*),count(distinct substrb(file_id,1,32)) card from test;

COUNT(*) CARD
---------- ----------
76815 1

这样看，不走索引是正常的，为了解决这个问题，对选择性好的，要删除直方图：

exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'dingjun123',tabname=>'test',estimate_percent=>null,method_opt=>'for columns file_id size 1',no_invalidate=>false,cascade=>true,degree => 10);

删除直方图后正确：
Execution Plan
----------------------------------------------------------
Plan hash value: 2473784974

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 77 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 77 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TEST | 1 | | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("FILE_ID"='file1000000000000000000000000xyz_20230215_999')

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
746 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed