CBO连接的成本计算
本文讨论Nested Loop(简称NL)和Hash Join(HJ)算法的成本,以及_optimizer_cost_model参数的影响,
该参数默认为chose表示考虑cpu因素,为了实验,作者设置为IO表示只考虑io成本
SQL> alter system set "_optimizer_cost_model"='IO';
System altered
从NL的算法来看,先扫描第一表,然后根据连接条件,把一个表的连接栏位的条件,放到第二个表中去做比较,
因此可以看出NL的成本为:第一个数据的成本*(第二个数据成本的*第一个数据集的行数)
准备了如下测试数据
create table T1
(
object_id NUMBER,
object_name VARCHAR2(128)
);
create index IDX_T1 on T1 (OBJECT_ID);
create table T2
(
object_id NUMBER,
object_name VARCHAR2(128)
);
create index IDX_T2 on T2 (OBJECT_ID);
insert into t1 select * from dba_objects;
insert into t2 select * from dba_objects;
收集统计信息
首先考察了NL的连接方式成本计算.
SQL> select /*+ USE_NL(t1 t2) */ count(t1.object_name)
from t1,t2
where t1.object_id=t2.object_id 2 3 ;
COUNT(T1.OBJECT_NAME)
---------------------
90924
Execution Plan
----------------------------------------------------------
Plan hash value: 821905481
--------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 90992 |
| 1 | SORT AGGREGATE | | 1 | 35 | |
| 2 | NESTED LOOPS | | 90924 | 3107K| 90992 |
| 3 | TABLE ACCESS FULL| T1 | 90924 | 2663K| 68 |
|* 4 | INDEX RANGE SCAN | IDX_T2 | 1 | 5 | 1 |
可以看到总cost=(第一个数据集的扫描成本)+((第一个数据集的行数)*1(第二个数据集成本))=69+(90924*1)=90992
而hash连接的成本等于:(第一个数据集成本+第二个数据集成本)*1.75
select count(*)
from t1,t2
where t1.object_id=t2.object_id;
COUNT(*)
----------
90924
Execution Plan
----------------------------------------------------------
Plan hash value: 3668909712
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | | 112 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 90924 | 887K| 1512K| 112 |
| 3 | INDEX FAST FULL SCAN| IDX_T1 | 90924 | 443K| | 32 |
| 4 | INDEX FAST FULL SCAN| IDX_T2 | 90925 | 443K| | 32 |
-------------------------------------------------------------------------
可以看到cost=(第一个数据集成本+第二个数据集成)*1.75=(32+32)*1.75=112
继续填t1,
SQL> select count(*) from t1;
COUNT(*)
----------
201751
SQL>select count(*)
from t1,t2
where t1.object_id=t2.object_id
COUNT(*)
----------
201751
Execution Plan
----------------------------------------------------------
Plan hash value: 625114054
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | | 178 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 199K| 1945K| 1512K| 178 |
| 3 | INDEX FAST FULL SCAN| IDX_T2 | 90925 | 443K| | 32 |
| 4 | INDEX FAST FULL SCAN| IDX_T1 | 201K| 985K| | 70 |
-------------------------------------------------------------------------
根据公式算成本为178.5,基本接近178
在做HJ连接的成本计算的时候,笔者没有考虑明白为什么要乘以1.75.