oracle使用扩展列统计信息解决复杂谓词cardinality估算不准的问题

80阅读 0评论2025-04-26 dingjun123
分类:Oracle

对于使用了函数或表达式运算会导致统计信息用不上、and条件列组相关性紧密,这些情况下的cardinality可能计算不准,导致多表JOIN的join order、join method选择不对,从而走错执行计划,这时候可以使用扩展列统计信息来优化cardinality计算。


建表如下:

点击(此处)折叠或打开

  1. drop table extend_stats;
  2. create table extend_stats
  3. as
  4. select * from dba_objects;

  5. select num_rows from dba_tab_statistics where table_name='EXTEND_STATS';

  6.   NUM_ROWS
  7. ----------
  8.      73769

对object_id和created使用函数运算,并使用or条件:    


点击(此处)折叠或打开

  1. select *
  2. from extend_stats a
  3. where to_char(object_id) < '1000'
  4. or to_char(created,'yyyy-mm-dd')='2025-3-1';


  5. Plan hash value: 1446351961

  6. -----------------------------------------------------------------------------------------------------
  7. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
  8. -----------------------------------------------------------------------------------------------------
  9. | 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.07 | 1436 | 1431 |
  10. |* 1 | TABLE ACCESS FULL| EXTEND_STATS | 1 | 4389 | 2 |00:00:00.07 | 1436 | 1431 |
  11. -----------------------------------------------------------------------------------------------------

  12. Predicate Information (identified by operation id):
  13. ---------------------------------------------------

  14.    1 - filter((TO_CHAR("OBJECT_ID")<'1000' OR TO_CHAR(INTERNAL_FUNCTION("CREATED"),'yyyy-mm-d
  15.               d')='2025-3-1'))

通过执行计划看出,估算行数E-Rows vs真实行数A-Rows是4389 vs 2,CBO估算的cardinality非常不准确?


以上or的选择率如何计算{BANNED}最佳后得到4389行呢?先单独计算,然后做OR计算
因为没有函数索引,且函数没有统计信息,范围的是5%,等值的是1%


所以{BANNED}最佳终or选择率= ((0.05+0.01)-0.05*0.01)=0.0595
{BANNED}最佳终估算行数=0.0595*73769=4389


而实际返回行数=2,所以估算不准,如果是多表join的话,估算不准可能走错执行计划,本来要走nl的可能
走成hash join,
那么如何让这种复杂的统计信息变的准呢?


1)收集扩展列统计信息,对于函数的不能用列组?列组指的是and,or只能单独收集
对于列组只能是原始列用逗号隔开,可以指定>=2个


点击(此处)折叠或打开

  1. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'extend_stats',method_opt=>q'[for columns (to_char(object_id))]',no_invalidate=>false);

  2. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'extend_stats',method_opt=>q'[for columns (to_char(created,'yyyy-mm-dd'))]',no_invalidate=>false);

收集扩展列统计信息后就比较准确了估算的是16行,与真实的2差距很小,对于or只能单独收集,
and的可以是列组,但是列组不能包含函数或表达式


点击(此处)折叠或打开

  1. select *
  2. from extend_stats a
  3. where to_char(object_id) < '1000'
  4. or to_char(created,'yyyy-mm-dd')='2025-3-1';

  5. SQL_ID 55b317t4ax0sf, child number 0
  6. -------------------------------------
  7. select * from extend_stats a where to_char(object_id) < '1000' or
  8. to_char(created,'yyyy-mm-dd')='2025-3-1'

  9. Plan hash value: 1446351961

  10. --------------------------------------------------------------------------------------------
  11. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  12. --------------------------------------------------------------------------------------------
  13. | 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.04 | 1436 |
  14. |* 1 | TABLE ACCESS FULL| EXTEND_STATS | 1 | 16 | 2 |00:00:00.04 | 1436 |
  15. --------------------------------------------------------------------------------------------

  16. Predicate Information (identified by operation id):
  17. ---------------------------------------------------

  18.    1 - filter((TO_CHAR("OBJECT_ID")<'1000' OR
  19.               TO_CHAR(INTERNAL_FUNCTION("CREATED"),'yyyy-mm-dd')='2025-3-1'))

扩展列统计信息收集的一些注意点:
--不可以,列组包含表达式

点击(此处)折叠或打开

  1. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'extend_stats',method_opt=>q'[for columns (object_id+1,created)]',no_invalidate=>false);

--单独的函数,表达式可以使用扩展列统计信息,包括case when,decode表达式

点击(此处)折叠或打开

  1. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'extend_stats',method_opt=>q'[for columns (object_id+1)]',no_invalidate=>false);
  2. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'extend_stats',method_opt=>q'[for columns (case when object_id=1 then 1 else 0 end)]',no_invalidate=>false);

-- 原始列条件的组合,可以指定多个

点击(此处)折叠或打开

  1. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'extend_stats',method_opt=>q'[for columns (object_id,created,status)]',no_invalidate=>false);

扩展列统计信息在哪看?

点击(此处)折叠或打开

  1. select extension_name,extension from DBA_STAT_EXTENSIONS where table_name='EXTEND_STATS';

  2. EXTENSION_NAME
  3. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  4. EXTENSION
  5. --------------------------------------------------------------------------------
  6. SYS_STUJ6XT0DF4L9#$F#3EB6C4KKZ
  7. ("OBJECT_ID","CREATED","STATUS")

  8. SYS_STUYGZS474M7JCSQFWO9DK9Z4Z
  9. ("OBJECT_ID"+1)

  10. SYS_STUF6$OC2$_RA6FBNFXO8ZA1FG
  11. (CASE "OBJECT_ID" WHEN 1 THEN 1 ELSE 0 END )

扩展列统计信息操作:

点击(此处)折叠或打开

  1. --删除列组
  2. exec dbms_stats.drop_extended_stats(user,'extend_stats','(OBJECT_ID,CREATED,STATUS)');
  3. --删除列组统计信息
  4. exec dbms_stats.delete_column_stats(user,'extend_stats','SYS_STUYGZS474M7JCSQFWO9DK9Z4Z');
  5. exec dbms_stats.delete_column_stats(user,'extend_stats','SYS_STUF6$OC2$_RA6FBNFXO8ZA1FG');

  6. --
  7. exec dbms_stats.drop_extended_stats(user,'extend_stats','(OBJECT_ID,CREATED,STATUS)');
  8. exec dbms_stats.drop_extended_stats(user,'extend_stats','("OBJECT_ID"+1)');
  9. exec dbms_stats.drop_extended_stats(user,'extend_stats','(CASE "OBJECT_ID" WHEN 1 THEN 1 ELSE 0 END )');


快速检测谓词和列组:

点击(此处)折叠或打开

  1. create table group_t(country number,city number);
  2. insert into group_t values(1,1);
  3. insert into group_t values(1,2);
  4. insert into group_t values(1,3);
  5. insert into group_t values(2,1);


  6. EXEC DBMS_STATS.SEED_COL_USAGE(null,null,300);
  7. explain plan for
  8. select *
  9. from group_t a
  10. where country=1
  11. and city=2;

自动检测结果如下:

点击(此处)折叠或打开

  1. SET LONG 100000
  2. SET LINES 120
  3. SET PAGES 0
  4. SELECT DBMS_STATS.REPORT_COL_USAGE(user, 'group_t')
  5. FROM DUAL;

  6. LEGEND:
  7. .......

  8. EQ : Used in single table EQuality predicate
  9. RANGE : Used in single table RANGE predicate
  10. LIKE : Used in single table LIKE predicate
  11. NULL : Used in single table is (not) NULL predicate
  12. EQ_JOIN : Used in EQuality JOIN predicate
  13. NONEQ_JOIN : Used in NON EQuality JOIN predicate
  14. FILTER : Used in single table FILTER predicate
  15. JOIN : Used in JOIN predicate
  16. GROUP_BY : Used in GROUP BY expression
  17. ...............................................................................

  18. ###############################################################################

  19. COLUMN USAGE REPORT FOR DINGJUN123.GROUP_T
  20. ..........................................

  21. 1. CITY : EQ
  22. 2. COUNTRY : EQ
  23. 3. (COUNTRY, CITY) : FILTER
  24. ###############################################################################





上一篇:oracle完全存在性判断逻辑
下一篇:Oracle 12c index by rowid batched特性