Join elimination not working in Oracle with sub queries
Asked Answered
T

5

9

I am able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modelling, but first I need to find a way around this problem. I'm using Oracle 12c Enterprise Edition Release 12.1.0.2.0.

DDL for my test case:

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some example data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;

The 5NF view

This first view does not compile - it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined when I'm looking at the online examples of anchor modelling...

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt at fixing it. When using this view with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------

The only solution I have found is to use scalar sub queries instead, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------

Now Oracle sucessfully eliminates the product_price table. However, scalar sub queries are implemented differently than joins and they way they are executed simply doesn't allow me to get any acceptable performance in a real world scenario.

TL;DR How can I rewrite the view product_5nf so that Oracle sucessfully eliminates both of the dependent tables?

Tanya answered 8/11, 2016 at 15:4 Comment(3)
Are you sure the performance on the scalar subquery approach is really bad? In Oracle 12.1, the optimizer can and often will unnest scalar subqueries. So, you might get what you're looking for: the CBO could omit them for queries that don't ask for them and unnest them for reasonable performance when it needs to. You might get some mileage trying variations of scalar subqueries to see if you can't coax Oracle to unnest it for you.Eyecatching
@MatthewMcPeak, I did not think of that so I only tried one way to write the subquery. Definately worth investigating! I did compare execution plans and timings between scalar qry vs joins and the difference is 1/3 of the execution time in a sligthly larger test case I built involving a few hundred k records.Tanya
I just confirmed it's possible. I'll post another answer.Eyecatching
E
4

I think you have two problems going on here.

First, join elimination only works for certain, specific situations (PK-PK, PK-FK, etc). It is not a general thing where you can LEFT JOIN to any row set that will return a single row for each join key value and have Oracle eliminate the join.

Second, even if Oracle were advanced enough to do join elimination on ANY LEFT JOIN where it knew it would get only one row per join key value, Oracle does not yet support join eliminations on LEFT JOINS that are based on a composite key (Oracle support document 887553.1 says this is coming in R12.2).

One workaround you could consider is materializing a view with the last row for each product_id. Then LEFT JOIN to the materialized view. Like this:

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );

-- Create a MV snapshot so we can materialize a view having only the latest
-- row for each product_id and can refresh that MV fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;

-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM   product_price
GROUP BY product_id;

-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );

-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);

insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);

insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);

commit;

-- Create the 5NF view using the materialized view
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
         ,to_number(substr(ppl.sortable_row,25)) price 
     from product p
     left join product_color pc on pc.product_id = p.product_id
     left join product_price_latest ppl on ppl.product_id = p.product_id 
;

-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;

-- Check the plan
SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

------------------------------------------------
| Id  | Operation        | Name       | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT |            |        |
|   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
------------------------------------------------
Eyecatching answered 8/11, 2016 at 19:43 Comment(2)
Ok perfect! This explains "why" it is not working for me and also gives me an idea of what I have to do to solve it (until 12.2).Tanya
Thanks for the information about join elimination on composite keys, possibly coming in 12.2. We have definitely been waiting for this feature!Ultramicrometer
K
2

I can't make the price join be eliminated, but if you do the following, it can at least reduce the access to a single index for the price check:

CREATE OR REPLACE view product_5nf as
select p.product_id
      ,pc.color
      ,pp.price 
 from product p
 left join product_color pc ON p.product_id = pc.product_id
 left join (select pp1.product_id, pp1.price 
              from (SELECT product_id,
                           price,
                           from_date,
                           max(from_date) OVER (PARTITION BY product_id) max_from_date
                    FROM   product_price) pp1
             where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;
Konyn answered 8/11, 2016 at 15:38 Comment(8)
Thanks for effort! I've been down that road already, but the crucial part is the join elimination. Your answer does not answer my question, but I think it adds value to the question anyhow. Thanks!Tanya
Yeah; it's an interesting problem. I thought maybe that because the product_price table could have more than one row per product_id (as per the primary key), that was causing Oracle not to do the elimination (even though we know the subquery would only return one row per product_id). However, switching the primary key to just product_key didn't help.Konyn
A bit more of a play around suggests that if I create the product_price table with a pk of product id, and just join directly to the table, it works fine. So, I think it's the fact that Oracle doesn't know for sure that your subquery is only going to return one row per product_id that's causing the problem. I tried creating the subquery as a separate view and adding a primary key on just the product_id column, but that didn't work either.Konyn
I had a brief discussion over on Twitter with Jonathan Lewis, and he referred me to this optimiser blog entry. It's probably the first restriction mentioned at the end of that post which is causing the optimiser not to be able to eliminate the join. Where is the view called from? If it's from a procedure, perhaps you should use an IF statement to run a different query in the event you're not querying the price columns?Konyn
I wonder if redundantly rewriting the subquery to "select pp1.product_id, max(pp1.price) price" etc would encourage the optimiser to know that the result was unique on product_id.Ind
@Boneist, the case above is just a made-up example that demonstrates my problem with join elimination. Ultimately I want to try Anchor Modelling for our data warehouse. And the 5NF layer is implemented as views over 6NF tables. Parts of the secret sauce to performance here seems to be the join elimination. But I'm not done with the reading yet :)Tanya
@Ronnis, slightly off topic, but I just want to mention to anyone else reading up on table elimination that the 5NF layer is just for convenience in Anchor Modeling. Hand written queries against the 6NF tables will, naturally, equal the performance of the 5NF layer with fully supported table elimination, while being more cumbersome to write. So it boils down to the value of having that convenience in databases that do not yet fully support table elimination, like Oracle, when deciding if you should use the Anchor Modeling technique.Ultramicrometer
@LarsRönnbäck, of course! The table elimination (or lack thereof) has nothing to do with AM, and everything to do with Oracle. The only reason I'm mentioning anchor modelling is because I wanted to attract the attention of people who might have used Oracle AND anchor modelling. Had no idea that the inventor would show up :) Do you know of any oracle implementations that have shared their experiences on line?Tanya
E
1

Now Oracle sucessfully eliminates the product_price table. However, scalar sub queries are implemented differently than joins and they way they are executed simply doesn't allow me to get any acceptable performance in a real world scenario.

The cost-based optimizer in Oracle 12.1 can perform a query transformation to unnest scalar subqueries. So, the performance could be just as good as the LEFT JOIN you are after in your question.

The trick is you have to jigger it a bit.

First, make sure the scalar subquery returns max() with no group by, so CBO knows there is no chance of getting more than one row. (It won't unnest otherwise).

Second, you need to combine all the fields from product_price into a single scalar subquery, or else the CBO will unnest and join in product_price multiple times.

Here is a test case for Oracle 12.1 that illustrates this working.

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;


create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

insert into product ( product_id ) SELECT rownum FROM dual connect by rownum <= 100000;

insert into product_color ( product_id, color ) SELECT rownum, dbms_random.string('a',8) color FROM DUAL connect by rownum <= 100000;

--delete from product_price;
insert into product_price ( product_id, from_date, price ) SELECT product_id, trunc(sysdate) + dbms_random.value(-3,3) from_date, floor(dbms_random.value(50,120)/10)*10 price from product cross join lateral ( SELECT rownum x FROM dual connect by rownum <= mod(product_id,5));

commit;

begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT' ); end; 
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_COLOR' ); end; 
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_PRICE' ); end; 

commit;

alter table product_price add ( composite_column varchar2(80) generated always as ( to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,0)) virtual );

create or replace view product_5nf as
   select d.product_id, d.color, to_date(substr(d.product_date_price,1,14),'YYYYMMDDHH24MISS') from_date, to_number(substr(d.product_date_price,-10)) price 
from 
(    select p.product_id
         ,pc.color
         ,( SELECT max(composite_column)  FROM product_price pp WHERE pp.product_id = p.product_id AND pp.from_date = ( SELECT max(pp2.from_date) FROM product_price pp2 WHERE pp2.product_id = pp.product_id ) ) product_date_price
     from product p
     left join product_color pc on pc.product_id = p.product_id )  d
;

select product_id from product_5nf;

----------------------------------------------
| Id  | Operation         | Name    | E-Rows |
----------------------------------------------
|   0 | SELECT STATEMENT  |         |        |
|   1 |  TABLE ACCESS FULL| PRODUCT |    100K|
----------------------------------------------

select * from product_5nf;

SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

--------------------------------------------------------------------------------------
| Id  | Operation                | Name          | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |               |        |       |       |          |
|*  1 |  HASH JOIN RIGHT OUTER   |               |    100K|  8387K|  3159K| 8835K (0)|
|   2 |   VIEW                   | VW_SSQ_2      |      2 |       |       |          |
|   3 |    HASH GROUP BY         |               |      2 |    13M|  2332K|   12M (0)|
|   4 |     VIEW                 | VM_NWVW_3     |      2 |       |       |          |
|*  5 |      FILTER              |               |        |       |       |          |
|   6 |       HASH GROUP BY      |               |      2 |    23M|  5055K|   20M (0)|
|*  7 |        HASH JOIN         |               |    480K|    12M|  4262K|   17M (0)|
|   8 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|   9 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|* 10 |   HASH JOIN OUTER        |               |    100K|  5918K|  3056K| 5847K (0)|
|  11 |    TABLE ACCESS FULL     | PRODUCT       |    100K|       |       |          |
|  12 |    TABLE ACCESS FULL     | PRODUCT_COLOR |    100K|       |       |          |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_2"="P"."PRODUCT_ID")
   5 - filter("PP"."FROM_DATE"=MAX("PP2"."FROM_DATE"))
   7 - access("PP2"."PRODUCT_ID"="PP"."PRODUCT_ID")
  10 - access("PC"."PRODUCT_ID"="P"."PRODUCT_ID")
Eyecatching answered 9/11, 2016 at 13:40 Comment(8)
FYI... I'm not suggesting that unnesting is definitely faster. Keeping it nested ( essentially a nested loop left join ) may be faster. But the point is that with technique, you'll have the control to do it either way.Eyecatching
I'm trying it out now! Looks promising! I haven't yet understood the substr() magic, but I guess it is not really needed if I only need price? The from_date will never be included in the final view.Tanya
Yes, you don't need the substr() stuff if you only need one column. The substr() was to split out the multiple columns that we stuffed together so that we could get them all in a single scalar subquery.Eyecatching
Can you update with results? Which one was faster and by how much? I can try it later today myself, but maybe not everyone is as interested.Eyecatching
The performance difference is factor 2 between the scalar technique vs join. I couldn't post code in the comments so I edited your answer and added it. Apologies if this is not how it should work :)Tanya
Which is faster? What are the plans? I'll try it later myself if I have time.Eyecatching
The scalar one takes twice as long. I just now also saw that it uses NL+index whereas the "normal" join one uses hash joins. This could also explain the difference. I saw yours was using hash join but I can't seem to get that plan here.Tanya
Let us continue this discussion in chat.Eyecatching
T
0

Ok, I'm answering my own question. The information in this answer is valid for Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production, but likely not for later versions. Don't vote for this answer, as it does not answer the question.

Due to a specific limitation in the current version (as noted by Mathew McPeak) it is simply not possible to get Oracle to completely eliminate the unnecessary joins in the underlaying 5NF view. The limitation is that join elimination is not possible on left joins that are based on a composite key.

Any attempt at working around this limitation seems to introduce either duplication or update anomalies. The accepted answer demonstrates how to overcome this limitation in the optimiser by using materialized views and thereby duplicating the data. This answer shows how to solve the problem with less duplication but with update anomalies.

This workaround is based on the fact that you can use nullable columns in unique indexes. We will put null for all historical versions and the actual product_id for the most recent version referencing the product table with a foreign key.

alter table product_price add(
   latest_id number
  ,constraint product_price_uk  unique(latest_id)
  ,constraint product_price_fk2 foreign key(latest_id) references product(product_id)
  ,constraint product_price_chk check(latest_id = product_id)
);

-- One-time update of existing data
update product_price a
   set a.latest_id = a.product_id
 where from_date = (select max(from_date) 
                      from product_price b 
                     where a.product_id = b.product_id);   

PRODUCT_ID FROM_DATE       PRICE  LATEST_ID
---------- ---------- ---------- ----------
         1 2016-01-01         10       null
         1 2016-02-01          8       null
         1 2016-05-01          5          1
         2 2016-02-01          5          2
         4 2016-01-01         10          4

-- New view definition             
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price
     from product p
     left join product_color pc on(pc.product_id = p.product_id)
     left join product_price pp on(pp.latest_id  = p.product_id);

Of course, now latest_id has to be manually maintained... whenever a new record is inserted, the old record must first be updated with null.

There are two benefits with this approach. First, Oracle is able to completely remove unnecessary joins. And second, the joins are not executed as scalar subqueries.

SQL> select count(*) from product_5nf;

---------------------------------------
| Id  | Operation        | Name       |
---------------------------------------
|   0 | SELECT STATEMENT |            |
|   1 |  SORT AGGREGATE  |            |
|   2 |   INDEX FULL SCAN| PRODUCT_PK |
---------------------------------------

Oracle recognizes that the count can be resolved without even touching the base table. And no unnecessary joins to be seen...

SQL> select product_id, price from product_5nf;

---------------------------------------------------------
| Id  | Operation                    | Name             |
---------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |
|*  1 |  HASH JOIN OUTER             |                  |
|   2 |   INDEX FULL SCAN            | PRODUCT_PK       |
|   3 |   TABLE ACCESS BY INDEX ROWID| PRODUCT_PRICE    |
|*  4 |    INDEX FULL SCAN           | PRODUCT_PRICE_UK |
---------------------------------------------------------

Oracle recognizes that we have to join to product_price in order to get the price column. And product_color is nowhere to be seen...

SQL> select * from product_5nf;

----------------------------------------------------------
| Id  | Operation                     | Name             |
----------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |
|*  1 |  HASH JOIN OUTER              |                  |
|   2 |   NESTED LOOPS OUTER          |                  |
|   3 |    INDEX FULL SCAN            | PRODUCT_PK       |
|   4 |    TABLE ACCESS BY INDEX ROWID| PRODUCT_COLOR    |
|*  5 |     INDEX UNIQUE SCAN         | PRODUCT_COLOR_PK |
|   6 |   TABLE ACCESS BY INDEX ROWID | PRODUCT_PRICE    |
|*  7 |    INDEX FULL SCAN            | PRODUCT_PRICE_UK |
----------------------------------------------------------

Here Oracle has to materialize all the joins, since all columns are referenced.

Tanya answered 15/11, 2016 at 10:49 Comment(1)
A good illustration of how many query issues can be solved by better data modeling. If you want the database to give you the most recent information quickly and often, tag the most recent row. The materialized view option was a mostly transparent way of doing that. If you can modify the processes that maintain the data, then your approach is even better: put the data into the database that you want to get out of the database.Eyecatching
T
0

[I don't know if an ANTI-JOIN counts as a subquery in Oracle], but the not exists trick is often a way to avoid an aggregating subquery:

CREATE VIEW product_5nfa as
   SELECT p.product_id
         ,pc.color
         ,pp.price
     FROM product p
     LEFT JOIN product_color pc
        ON pc.product_id = p.product_id
     LEFT join product_price pp
        ON pp.product_id = p.product_id
        AND NOT EXISTS ( SELECT * FROM product_price pp2
            WHERE pp2.product_id = pp.product_id
            AND pp2.from_date  > pp.from_date
            )   
     ;

Comment from OP: The view is created, but Oracle is still unable to remove the join. Here is the execution plan.

select count(*) from product_5nfa;

-------------------------------------------------
| Id  | Operation            | Name             |
-------------------------------------------------
|   0 | SELECT STATEMENT     |                  |
|   1 |  SORT AGGREGATE      |                  |
|   2 |   NESTED LOOPS OUTER |                  |
|   3 |    INDEX FULL SCAN   | PRODUCT_PK       |
|   4 |    VIEW              |                  |
|   5 |     NESTED LOOPS ANTI|                  |
|*  6 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
|*  7 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
-------------------------------------------------
Terwilliger answered 15/11, 2016 at 11:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.