构造数据如下:
DROP TABLE t;
CREATE TABLE t
AS
SELECT LEVEL ID,
trunc(100 * dbms_random.normal) val,
rpad('1',10,'1') padding
FROM dual
CONNECT BY LEVEL<1000000;
SQL> select padding,count(*) from t group by padding;
PADDING COUNT(*)
---------- ----------
1111111111 999999
2345678 1
123456 1
56789 1
--构造skew data
INSERT INTO t VALUES(1000000,12345,'2345678');
INSERT INTO t VALUES(1000001,12345,'123456');
INSERT INTO t VALUES(1000001,12345,'56789');
--创建索引
CREATE INDEX idx_t ON t(padding);
1.采用dbms_stats收集,参数采用默认值
SQL> explain plan for
2 SELECT * FROM t WHERE padding LIKE '56%';
已解释。
SQL> @display
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250K| 4882K| 971 (2)| 00:00:12 |
|* 1 | TABLE ACCESS FULL| T | 250K| 4882K| 971 (2)| 00:00:12 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("PADDING" LIKE '56%')
已选择13行。
很显然,ORACLE CBO估算"PADDING" LIKE '56%'有25w行数据,实际值有1行。显然应该走索引,但是CBO没有走索引。
查询得知,有4个num_distinct,条件的选择性是25%,无直方图,所以CBO估算错误。
SQL> SELECT column_name,num_distinct,low_value,high_value,density,sample_size,histogram FROM user_tab_columns WHERE table_name='T';
COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ---------- ----------- ---------------
ID 1000002 C102 C402010102 9.99998000 1000002 NONE
VAL 843 3D601D66 C302182E 0.00118623 1000002 NONE
PADDING 4 31313131313131313131 3536373839 0.25 1000002 NONE
2.收集直方图信息
SQL> EXEC dbms_stats.gather_table_stats(ownname => USER,tabname => 'T',estimate_percent => 100,method_opt => 'for columns padding size 10',cascade => TRUE);
PL/SQL 过程已成功完成。
SQL> explain plan for
2 SELECT * FROM t WHERE padding LIKE '56%';
已解释。
SQL> @display
Plan hash value: 1594971208
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 20 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("PADDING" LIKE '56%')
filter("PADDING" LIKE '56%')
已选择15行。
SQL> SELECT column_name,num_distinct,low_value,high_value,density,sample_size,histogram FROM user_tab_columns WHERE table_name='T';
COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ---------- ----------- ---------------
ID 1000002 C102 C402010102 9.99998000 1000002 NONE
VAL 843 3D601D66 C302182E 0.00118623 1000002 NONE
PADDING 4 31313131313131313131 3536373839 4.99999000 1000002 FREQUENCY
通过执行计划分析CBO走的计划是不是正确,要结合CBO的访问路径、JOIN方式、对应的谓词信息来进行分析,然后查询相关数据字典,看是否给CBO需要的统计信息,如果统计信息不正确,
CBO很可能会走错计划。
直方图的问题是很复杂的问题,涉及到绑定变量,cursor_sharing参数,11g cardinality feedback,ACS等特性,特别ORACLE直方图对于字符类型还有32字节的限制,如果前32字节都是一样的,那么也可能会导致问题,另外收集统计信息的算法是很复杂的,使用默认参数一般都是在比较均匀的数据分布中使用,涉及到直方图的问题,默认参数很可能会导致问题。