因为in对null值敏感,而oracle 11g之前没有null-aware anti join的算法,此时not in对应的执行计划往往是filter,而filter如果没有合适的驱动表,执行成本往往非常高。 下面我们添加一个not null的约束给t1和t2表分别添加not null的约束
SQL> alter table t1 modify object_id not null;
SQL> alter table t2 modify object_id not null; 如果在对应的列添加not nul约束,再来查看not in执行计划
此时oracle采用了hash join right anti的连接方式,
SQL> select * from t1 where t1.object_id not in (select object_id from t2); no rows selected Execution Plan
----------------------------------------------------------
Plan hash value: 629543484 -------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 2910 | | 876 (1)| 00:00:11 |
|* 1 | HASH JOIN RIGHT ANTI| | 30 | 2910 | 1152K| 876 (1)| 00:00:11 |
| 2 | TABLE ACCESS FULL | T2 | 73616 | 287K| | 225 (1)| 00:00:03 |
| 3 | TABLE ACCESS FULL | T1 | 73615 | 6685K| | 226 (2)| 00:00:03 |
------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
--------------------------------------------------- 1 - access("T1"."OBJECT_ID"="OBJECT_ID") Statistics
----------------------------------------------------------
285 recursive calls
0 db block gets
2082 consistent gets
0 physical reads
0 redo size
1139 bytes sent via SQL*Net to client
481 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
0 rows processed oracle将关联操作改为了hash join right anti,同样我们上面基本所有的测试用例中都出现了hash join right anti和hash join right semi,这个是oracle对于not in和not exists、in和exists的一种子查询的优化,当然我们必然能够unnest展开子查询为表连接,这个默认就是开启的。 而在oracle 11g中,由于有null-aware anti join的算法,即使没有not nul约束,此时hash join right也能够计算比较null值,所以此时执行计划转变为HASH JOIN RIGHT ANTI NA
[oracle@ora11g ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.3.0 Production on 星期四 5月 8 19:26:42 2014 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> create table t1 as select * from dba_objects;
Table created.
SQL> create table t2 as select * from dba_objects;
Table created.
SQL> select * from t1 where t1.object_id not in (select object_id from t2); no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2739594415 ----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 95457 | 20M| | 1765 (1)| 00:00:22 |
|* 1 | HASH JOIN RIGHT ANTI NA| | 95457 | 20M| 1952K| 1765 (1)| 00:00:22 |
| 2 | TABLE ACCESS FULL | T2 | 79725 | 1012K| | 339 (1)| 00:00:05 |
| 3 | TABLE ACCESS FULL | T1 | 95457 | 18M| | 340 (1)| 00:00:05 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------- 1 - access("T1"."OBJECT_ID"="OBJECT_ID") Note
-----
- dynamic sampling used for this statement (level=2) Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
1382 consistent gets
1242 physical reads
0 redo size
1351 bytes sent via SQL*Net to client
509 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
0 rows processed 那么对于not in和not exists而言,在oracle 11g下由于新的算法的引入,执行计划都是hash join right anti na,执行计划都是一样的。 而在oracle 10g而言,如果没有not null约束,not in还是走的filter循环,这个一般对于没有合适的驱动表或者结果集而言执行成本相比是比较大的。而not exists由于对于null值不敏感,还是可以采用高效的hash join right anti算法去联合,所以这种情况下,也就是我们经常所说的not exists可能比not in更加高效一点。 而如果你使用的数据库是oracle 10g,而且又不能指定not null约束,那么可能你需要改写为not exists了,或者改写成下列的sql来进行优化调整:
SQL> select * from t1 minus
2 select * from t1 where t1.object_id in (select object_id from t2); no rows selected Execution Plan
----------------------------------------------------------
Plan hash value: 3848269147 -------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 73615 | 13M| | 4317 (59)| 00:00:52 |
| 1 | MINUS | | | | | | |
| 2 | SORT UNIQUE | | 73615 | 6685K| 17M| 1807 (1)| 00:00:22 |
| 3 | TABLE ACCESS FULL | T1 | 73615 | 6685K| | 226 (2)| 00:00:03 |
| 4 | SORT UNIQUE | | 73600 | 6971K| 14M| 2510 (1)| 00:00:31 |
|* 5 | HASH JOIN | | 73600 | 6971K| 1152K| 876 (1)| 00:00:11 |
| 6 | TABLE ACCESS FULL| T2 | 73616 | 287K| | 225 (1)| 00:00:03 |
| 7 | TABLE ACCESS FULL| T1 | 73615 | 6685K| | 226 (2)| 00:00:03 |
------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
--------------------------------------------------- 5 - access("T1"."OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3048 consistent gets
0 physical reads
0 redo size
1139 bytes sent via SQL*Net to client
481 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
0 rows processed
|