Thursday, January 10, 2008

MINUS ALL and INTERSECT ALL in Oracle Revisited

Yesterday I wrote an article explaining that MINUS ALL and INTERSECT ALL is not possible in Oracle.
However today after a bit thinking I got insight - multiset operations! Yes these should be the right thing!
Since 10g Release 1 Oracle supports MULTISET UNION/EXCEPT/INTERSECT for both ALL/DISTINCT. I have only to make some types, a bit type casting and everything should be ok!

So I've started with the simple tables I've shown yesterday.

CREATE TABLE t1 AS SELECT mod(rownum, 1000) rn
FROM dba_source WHERE rownum <=50000;
CREATE TABLE t2 AS SELECT * FROM t1;


Now I can query both tables and try to compare then. To use multiset operations I need to cast subquery as nested table. To be able to use such type casting I have to create two additional types. The first one is simple Object type containing records I'd like to compare. Second one is object table type, to which I'll cast my subqueries. So they are as follows:


CREATE TYPE t1_type
AS OBJECT ( rn NUMBER);
/
CREATE TYPE t1_tab_type
AS TABLE OF t1_type;
/


So far so good. Let's start building queries step by step. The first one just builds nested table from subquery.


SQL> select cast(multiset(
2 select rn from t1 where rownum <=3
3 ) as t1_tab_type)
4 from dual
5 /

CAST(MULTISET(SELECTRNFROMT1WHEREROWNUM<=3)AST1_TAB_TYPE)(RN)
---------------------------------------------------------------
T1_TAB_TYPE(T1_TYPE(1), T1_TYPE(2), T1_TYPE(3))

The second one already uses multiset intersect to get desired result.

SQL> select col1 multiset intersect distinct col2
2 from (
3 select cast(multiset(
4 select rn from t1 where rownum <=3
5 ) as t1_tab_type) col1,
6 cast(multiset(
7 select rn from t2 where rownum <=3
8 ) as t1_tab_type) col2
9 from dual
10 )
11 /

COL1MULTISETINTERSECTDISTINCTCOL2(RN)
---------------------------------------------------
T1_TAB_TYPE(T1_TYPE(1), T1_TYPE(2), T1_TYPE(3))

Now we just need to add one final step - cast nested table rows back to normal rows. Here us helps select from table construct:

SQL> select * from table (
2 select col1 multiset intersect distinct col2
3 from (
4 select cast(multiset(
5 select rn from t1 where rownum <=3
6 ) as t1_tab_type) col1,
7 cast(multiset(
8 select rn from t2 where rownum <=3
9 ) as t1_tab_type) col2
10 from dual
11 )
12 )
13 /

RN
----------
1
2
3

Voila! Everything works!
However then I was torn with doubt - how it will work when not used as a toy but on real volumes? I created million rows worth table with wide column and got - yeahh I got following:

ORA-22813: operand value exceeds system limits

Not pleasing.

Then I started to play with the tables defined above and got quite awful results. What is the performance of normal MINUS set operator?

SQL> select * from t1
2 minus
3 select * from t2 where rn <=998;
RN
----------
999
Elapsed: 00:00:00.10


What is the equivalent for multiset except distinct? I was a bit unpleasantly surprised again:

SQL> select * from table (
2 select col1 multiset except distinct col2
3 from (
4 select cast(multiset(
5 select rn from t1
6 ) as t1_tab_type) col1,
7 cast(multiset(
8 select rn from t2 where rn <= 998
9 ) as t1_tab_type) col2
10 from dual
11 )
12 )
13 /
RN
----------
999
Elapsed: 00:01:17.98

However for multiset except all I was simply shocked!

SQL> select * from table (
2 select col1 multiset except all col2
3 from (
4 select cast(multiset(
5 select rn from t1
6 ) as t1_tab_type) col1,
7 cast(multiset(
8 select rn from t2 where rn <= 998
9 ) as t1_tab_type) col2
10 from dual
11 )
12 )
13 /
RN
----------
999
...
999
50 rows selected.
Elapsed: 00:06:05.39

6 minutes for 2 * 50K rows!? Hmm is it implemented using bubble sort or what? :O

So normal SQL minus needed 0.10 seconds, multiset minus distinct 1 minute 18 seconds, but multiset except all more than 6 minutes.

Yeahh I got another strong argument NOT to use nested tables and their operations ;)

No comments: