22 March, 2011

OuterJoin with Filter Predicate

I recently came across a query where an outer join was being used. Such an OuterJoin allows us to create a "dummy" row for the columns from a table where the join fails so that we can still present the values from the other table in the join.
However, the query was subsequently modified by the developer to add a filter predicate on the table that the OuterJoin was defined on.
Remember that an OuterJoin returns columns with NULL values for the table that is Outer Joined.
If you add a simple filter for this table, the OuterJoin is negated -- because the rows from the OuterJoin, returning NULLs, would fail the filter on the table !


In this example, based on the well-known EMP and DEPT tables, I first show an OuterJoin where "dummy" emp rows are created (and presented with E.DEPTNO as 0). EMP has no rows for DEPTNOs 40 and 50, so these are returned as 0s.
Then, a filter on a column that appears neither in the WHERE clause nor in the SELECT clause negates the Outer Join !


SQL> -- create the tables
SQL> drop table emp;

Table dropped.

SQL> drop table dept;

Table dropped.

SQL> create table dept
2 as select * from scott.dept;

Table created.

SQL> create table emp
2 as select * from scott.emp;

Table created.

SQL>
SQL> -- list the DEPT values
SQL> select deptno from dept;

DEPTNO
----------
10
20
30
40

4 rows selected.

SQL> select distinct(deptno) from emp;

DEPTNO
----------
30
20
10

3 rows selected.

SQL>
SQL> -- create a dept
SQL> insert into dept values (50,'HQ','SINGAPORE');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> -- OuterJoin query
SQL> -- creates "dummy" rows from EMP
SQL> -- for DEPTIDs that exist in DEPT but do not exist in EMP
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 order by 1
5 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
10 NEW YORK ACCOUNTING 10 7839 KING
10 NEW YORK ACCOUNTING 10 7782 CLARK
20 DALLAS RESEARCH 20 7902 FORD
20 DALLAS RESEARCH 20 7876 ADAMS
20 DALLAS RESEARCH 20 7566 JONES
20 DALLAS RESEARCH 20 7369 SMITH
20 DALLAS RESEARCH 20 7788 SCOTT
30 CHICAGO SALES 30 7900 JAMES
30 CHICAGO SALES 30 7844 TURNER
30 CHICAGO SALES 30 7698 BLAKE
30 CHICAGO SALES 30 7654 MARTIN
30 CHICAGO SALES 30 7521 WARD
30 CHICAGO SALES 30 7499 ALLEN
40 BOSTON OPERATIONS 0
50 SINGAPORE HQ 0

16 rows selected.

SQL>
SQL> -- OuterJoin query with filter
SQL> -- what happens if the query is modified to add a filter
SQL> -- even if the filter is not on the join column ?
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 -- an additional filter for job='CLERK' has been added
5 -- this column is not in the Join and is not in the SELECT
6 and e.job = 'CLERK'
7 order by 1
8 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
20 DALLAS RESEARCH 20 7876 ADAMS
20 DALLAS RESEARCH 20 7369 SMITH
30 CHICAGO SALES 30 7900 JAMES

4 rows selected.

SQL>
SQL> -- Which are the CLERKs ?
SQL> select e.deptno, e.empno, e.ename
2 from emp e
3 where e.job = 'CLERK'
4 order by 1
5 /

DEPTNO EMPNO ENAME
---------- ---------- ----------
10 7934 MILLER
20 7369 SMITH
20 7876 ADAMS
30 7900 JAMES

4 rows selected.

SQL>


Note how the addition of a filter "e.job = 'CLERK'" changed the output of the EMP - DEPT join query from 16 rows to 4 rows. Even the two "dummy" rows for DEPTNOs 40 and 50 are now excluded.

Thus, if you need to use an OuterJoin, check what filter predicates you are specifying on the table that is OuterJoined !


UPDATE : As Rob and "orcl" explain, applying an "OuterJoin" on the additional filter column will allow retrieval of DEPTNOs 40 and 50 from DEPT, although this filter will exclude all other employees from EMP !


SQL> -- OuterJoin query with filter
SQL> -- specify an OuterJoin on the filter column as well !
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 -- an additional filter for job='CLERK' has been added
5 -- this column is not in the Join and is not in the SELECT
6 -- and e.job = 'CLERK'
7 and e.job(+) = 'CLERK'
8 order by 1
9 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
20 DALLAS RESEARCH 20 7369 SMITH
20 DALLAS RESEARCH 20 7876 ADAMS
30 CHICAGO SALES 30 7900 JAMES
40 BOSTON OPERATIONS 0
50 SINGAPORE HQ 0

6 rows selected.

SQL>


So, while "e.deptno(+) = d.deptno" is really an OuterJoin, we are simulating an OuterJoin in "e.job(+) = 'CLERK'" to ensure that we create dummy rows in EMP. Strange ?! We don't get dummy rows for jobs 'SALESMAN', 'MANAGER' etc which do not "join" to 'CLERK' although we do get dummy rows where the join to "d.deptno" would not be successful.


.
.
.

4 comments:

Rob van Wijk said...

Hi Hemant,

With the risk of stating the obvious, it looks like you want to do "and e.job (+) = 'CLERK'"

Regards,
Rob.

Hemant K Chitale said...

Rob,
This is a case where the initial query is correct, with an Outer Join and no filter.
Subsequently the query is modified, say per refined requirements, and the filter on CLERK is added. If the new requirement is to specifically list only CLERKs, the OuterJoin is defeated.

Hemant

orcl said...

What Rob means is that by changing to e.job(+)='CLERK' will make the two deptno (40,50) show up which makes outerjoin works again.
Using ANSI standard query like this will work too.
select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
from emp e right outer join dept d
on e.deptno=d.deptno
and e.job='CLERK'
order by 1;

Hemant K Chitale said...

Rob, orcl,

Yes, thank you for the updated query. I was too focused on the filter that I had seen in the original case that I didn't test it with "e.job(+) = 'CLERK'.

You are correct, changing the "e.job = 'CLERK' to "e.job(+) = 'CLERK'" will also work.

Hemant