Creating a custom, multi-argument Oracle analytic function
Asked Answered
J

4

6

Background

I know that in Oracle it's possible to create custom aggregate functions that process a collection of values and return a single result. Edit: I've even read the friendly manual at docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm !

I also know that Oracle provides built-in analytic functions like DENSE_RANK and RATIO_TO_REPORT, which provide values for each input, relative to a collection/window of values that input lies within.

Problem

What I want to know is if there's a way to create my own analytic function, presumably in a similar manner as I can create my own aggregation function, and in particular create one with additional arguments in my custom analytic function.

Subtle terminological caveat

When I refer to an "analytic function", please read it as a function which, in addition to accepting windowing parameters via the PARTITION keyword, also can return different values within a given window. (If anyone has a better term for this, please let me know! Pure analytic function? DENSE_RANK-class analytic function? Non-aggregate analytic function?)

The Oracle documentation notes that an aggregate function can be used as an analytic (windowed) function. Unfortunately, this only means that the PARTITION keyword for specifying windowing in analytic functions can be applied to aggregate functions as well. It doesn't promote the aggregate function to my coveted status of being able to return different values within a fixed window.

Aggregate used as analytic:

SELECT SUM(income) OVER (PARTITION BY first_initial) AS total FROM data;

will have as many records as data, but it will only have as many distinct totals as there are first initials.

Analytic used as analytic:

SELECT RATIO_TO_REPORT(income) OVER (PARTITION BY first_initial) AS ratio FROM data;

will have as many records as data, AND, even within a given first_initial partition, those ratios may all be distinct.

Context

I have been provided with call-only access to a PL/SQL procedure which accepts a numeric collection as an IN OUT parameter, and which has a few other IN configuration parameters. The procedure modifies the collection's values (think of it like "The University's Proprietary Sanctioned and Required Grade Curving Procedure") in a manner influenced by the configuration parameters.

Currently, the process to use the procedure is to hardcode a cursor loop that detects a change from one data partition to another, and then within each partition fetches data into a collection which is then passed to the procedure, altered, and eventually dumped back out into a separate table. I planned to improve this by making a PIPELINED PARALLEL_ENABLE table function that encapsulates some of the logic, but I'd much prefer to enable queries like the following:

SELECT G.Course_ID
     , G.Student_ID
     , G.Raw_Grade
     , analytic_wrapper(G.raw_grade, P.course_config_data)
                  OVER (PARTITION BY G.Course_ID) AS Adjusted_Grade
     , P.course_config_data
  FROM      grades   G
  LEFT JOIN policies P
  ON G.Course_ID = P.Course_ID;

That requires being able to create a custom analytic function, though, and due to the way the procedure requires different inputs on different partitions (e.g. the Course_ID-specific P.course_config_data above) it also has to accept not only the data-to-be-aggregated argument, but also additional inputs.

Is this possible, and if so where could I find documentation? My Google-fu has failed me.

An Additional Wrinkle

The PL/SQL procedure I've been provided with is (effectively) non-deterministic, and its output has statistical properties which must be preserved. For example, if A={A[0], A[1], A[3]} are the raw grades for one particular class, and B=f(A) is the result of invoking the procedure on A at 1:00 while C=f(A) is the result of invoking the procedure on A at 1:15, then B={B[0],B[1],B[2]} and C={C[0],C[1],C[2]} are both acceptable outputs to use, but a mixture of the elements like {C[0],B[1],C[2]} is not acceptable.

The upshot of this is that the procedure must be called exactly once on each partition. (Well, technically, it can be wastefully called as many times as one wants, but all results for a partition must come from the same invocation).

Supposed, for example, that the procedure I've been supplied with operates as follows: It accepts a collection of grades as an IN OUT parameter, and then sets one of those grades, chosen at random, to 100. All the other grades are set to zero. Running this at 1pm might result in Alice having the only passing grade, while running it at 1:01pm might result in Bob having the only passing grade. Regardless, it should be the case that exactly one student per class passes, no more and no less.

Joshi answered 25/8, 2015 at 18:11 Comment(0)
Z
2

This version does not suffer from the caveats of my previous answer, although it is going to be slower and a little harder to use. Most of the slowness from the loop in ODCIAggregateDelete -- you may be able to find an improvement there that doesn't require looping through the whole collection.

Anyway, this version makes a custom analytic function that mimics Oracle's native COLLECT function. So, instead of trying to create a custom analytic function that computes the actual value we want, it merely computes the set of rows in the window.

Then, for each row, we pass in the row data and the results of our custom "COLLECT" analytic to a regular function that computes the value we want.

Here's the code. (NOTE: your original question also asked about multiple parameters. Easy -- just put all the fields you want into matt_ratio_to_report_rec.) (Also, sorry about the object names -- I prefix my name on everything is other developers know who to ask if the object is causing them problems.)

-- This is the input data to the analytic function
--DROP TYPE matt_ratio_to_report_rec;
CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT
  ( value   NUMBER );

-- This is a collection of input data  
--DROP TYPE matt_ratio_to_report_tab;
CREATE OR REPLACE TYPE matt_ratio_to_report_tab AS TABLE OF matt_ratio_to_report_rec;


-- This object type implements a custom analytic that acts as an analytic version of Oracle's COLLECT function
--DROP TYPE matt_ratio_to_report_col_impl;
CREATE OR REPLACE TYPE matt_ratio_to_report_col_impl AS OBJECT (
  analytics_window    matt_ratio_to_report_tab,
  CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl ) RETURN SELF AS RESULT,  
-- Called to initialize a new aggregation context
-- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead 
-- of creating the new aggregation context from scratch
  STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER,
-- Called when a new data point is added to an aggregation context  
  MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER,
-- Called to return the computed aggragate from an aggregation context
  MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER,
-- Called to merge to two aggregation contexts into one (e.g., merging results of parallel slaves) 
  MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER,
  -- ODCIAggregateDelete
  MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER  
);

CREATE OR REPLACE TYPE BODY matt_ratio_to_report_col_impl IS

CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl ) RETURN SELF AS RESULT IS
BEGIN
  SELF.analytics_window := new matt_ratio_to_report_tab();
  RETURN;
END;


STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER IS
BEGIN
  DBMS_OUTPUT.PUT_LINE('ODCIAggregateInitialize()');
  sctx := matt_ratio_to_report_col_impl ();
  RETURN ODCIConst.Success;
END;


MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateIterate(' || self.analytics_window.COUNT || ')');

  -- Add record to collection
  self.analytics_window.extend();
  self.analytics_window(self.analytics_window.COUNT) :=  value;
  RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateTerminate(' || self.analytics_window.COUNT || ' - flags: ' || flags || ')');
  IF flags = 1 THEN
    returnValue := self.analytics_window;
  END IF;
  RETURN ODCIConst.Success;
EXCEPTION
  WHEN others THEN 
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK || ' ' || DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER IS
BEGIN
--   DBMS_OUTPUT.PUT_LINE('ODCIAggregateMerge(' || self.window_sum || ' - ' || ctx2.window_sum || ')');
  -- TODO: Add all elements from ctx2 window to self window
  RETURN ODCIConst.Success;
END;

-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER IS
  l_ctr NUMBER;
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateDelete(' || self.analytics_window.COUNT || ' - ' || value.value || ')');
   l_ctr := self.analytics_window.FIRST;
   <<window_loop>>
   WHILE l_ctr IS NOT NULL LOOP
     IF ( self.analytics_window(l_ctr).value = value.value ) THEN
       self.analytics_window.DELETE(l_ctr);
       DBMS_OUTPUT.PUT_LINE('... deleted slot ' || l_ctr);
       EXIT window_loop;
     END IF;
     l_ctr := self.analytics_window.NEXT(l_ctr);
   END LOOP;
  RETURN ODCIConst.Success;

END;  

END;
/

-- This function is the analytic version of Oracle's COLLECT function
--DROP FUNCTION matt_ratio_to_report;
CREATE OR REPLACE FUNCTION matt_ratio_to_report_col ( input matt_ratio_to_report_rec) RETURN matt_ratio_to_report_tab
PARALLEL_ENABLE AGGREGATE USING matt_ratio_to_report_col_impl;
/


-- This the actual function we want
CREATE OR REPLACE FUNCTION matt_ratio_to_report ( p_row_value NUMBER, p_report_window matt_ratio_to_report_tab ) RETURN NUMBER IS
  l_report_window_sum NUMBER := 0;
  l_counter NUMBER := NULL;
BEGIN
  IF p_row_value IS NULL or p_report_window IS NULL THEN
    RETURN NULL;
  END IF;

  -- Compute window sum
  l_counter := p_report_window.FIRST;
  WHILE l_counter IS NOT NULL LOOP
    l_report_window_sum := l_report_window_sum + NVL(p_report_window(l_counter).value,0);
    l_counter := p_report_window.NEXT(l_counter);
  END LOOP;

  RETURN p_row_value / NULLIF(l_report_window_sum,0);
END matt_ratio_to_report;  



-- Create some test data
--DROP TABLE matt_test_data;
CREATE TABLE matt_test_data ( x, group# ) PARALLEL 4
AS SELECT rownum, ceil(rownum / 10) group# FROM DUAL CONNECT BY ROWNUM <= 50000;


-- TESTER 9/30
with test as (
SELECT d.x,
       CEIL (d.x / 10) group#,
       ratio_to_report (d.x) OVER (PARTITION BY d.group#) oracle_rr,
       matt_ratio_to_report (
         d.x,
         matt_ratio_to_report_col (matt_ratio_to_report_rec (d.x)) OVER (PARTITION BY d.group#)) custom_rr
FROM   matt_test_data d )
SELECT /*+ PARALLEL */ test.*, case when test.oracle_rr != test.custom_rr then 'Mismatch!' Else null END test_results from test 
--where oracle_rr != custom_rr
ORDER BY test_results nulls last, x; 
Zachary answered 30/9, 2015 at 17:37 Comment(4)
If I understand your description correctly, this would involve the COLLECTed results being processed multiple times (once for each final value), correct? My concern there isn't so much speed (though that's a thing), as a nuance of the procedure I have to pass data to, which I hadn't originally thought relevant to share. See the original post, where I've added an "An Additional Wrinkle" section. Also, I rather like the matt_ names: they make it clear what's built-in vs what you're defining!Joshi
In your example, you'll just be replacing analytic_wrapper(G.raw_grade, P.course_config_data) OVER (PARTITION BY G.Course_ID) AS Adjusted_Grade with custom_non_analytic_wrapper(G.raw_grade, P.course_config_data, custom_analytic_collect(custom_rec_type(g.raw_grade, P.course_config_data)) over ( partition by g.course_id)) as adjusted_grade. I am not following how your additional wrinkle causes a problem for that.Zachary
I'm not sure it causes a problem, but I think it does. First, see updated example at the end of my post. Now, assuming that the procedure would be invoked in your -- Compute window sum section, I think it would be called once for each student, right? And if the call for Alice happened at 1:00 and the call for Bob happened at 1:01, then both students would end up passing even though there should never be more than one passing student. I may be misidentifying where the windowed data would be transformed, though.Joshi
None of our custom code is issuing any SELECT statements. Oracle is gathering all the data for us in a single SELECT. Therefore, the results will be read consistent. In other words, all of the data would be consistent as of the point in time when the query was started. This is true regardless of data that may be committed while the query is being executed.Zachary
H
1

The only way I found to create a custom aggregate with multiple parameters is to create a new TYPE with the desired number of elements and then pass an instance of that type to the aggregate:

First define the structure to hold all the "parameters" you need:

create or replace type wrapper_type as object
(
   raw_grade integer,
   config_data varchar
);
/

Then create your aggregate:

CREATE OR REPLACE TYPE analytic_wrapper AS OBJECT
(
  .. variables you might need

  STATIC FUNCTION ODCIAggregateInitialize(actx IN OUT wrapper_type) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateIterate(self  IN OUT wrapper_type, val IN wrapper_type) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateTerminate(self IN wrapper_type, returnValue OUT number, flags IN NUMBER) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateMerge(self  IN OUT wrapper_type, ctx2 IN  wrapper_type) RETURN NUMBER
);
/

Then you need to implement the actual aggregate logic in the type body. Once that is done, you can use something like this:

select analytic_wrapper(wrapper_type(G.raw_grade, P.course_config_data))
from ... 

The above was written more or less from memory, so I'm pretty sure it's full of syntax errors, but it should get you started.

More details and examples are in the manual: http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/aggr_functions.htm#ADDCI026

The manual states that such an aggregate can be used as an analytical function:

When a user-defined aggregate is used as an analytic function, the aggregate is calculated for each row's corresponding window

Harragan answered 25/8, 2015 at 18:35 Comment(9)
Here's a full example of custom aggregate with multiple parametersCapone
Is there a way to repost my question into two parts? Your solution to the multiple-argument issue takes care of my sub-problem, and I'd like to recognize that, but the main problem (if it's even possible to write a window function) remains.Joshi
@Bryant: the manual seems to suggest that it's possible to use such an aggregate as a window (aka analytical) function: docs.oracle.com/cd/E11882_01/appdev.112/e10765/…Harragan
@a_horse_with_no_name: Yes, I got excited by that section when I saw it initially, but there's a subtle distinction to be made. There needs to be some special name for "true analytic" functions like DENSE_RANK and RATIO_TO_REPORT, to distinguish them from "aggregate functions (that can be used as an analytic function)" like SUM or MAX. Even though SUM can be used as an analytic function, for a given window it always returns the same aggregate result. RATIO_TO_REPORT, by contrast, returns different results even within a particular windowJoshi
@Joshi can you give an example of why you think SUM isn't a true analytic function, compared with RATIO_TO_REPORT. I'm unclear what you mean in terms of the windowing. As I'm sure you know, there are windowing clauses that can be used with SUM or RATIO_TO_REPORT, but perhaps there's a distinction that I'm missing. (sorry to hijack your thread a_horse_with_no_name :-)Capone
@Capone Here's a longer-than-comment-length SQLFiddle that demonstrates the distinction I care about. SUM is providing one distinct value per window-of-N-records, while RATIO_TO_REPORT is providing (up to) N distinct values for the same window.Joshi
@Capone It might make more sense to think about it backward: aggregate-functions-insufficient-for-my-needs MAY be used without a window specification, and DO have meaning as a pure aggregate. You can talk about "What is THE sum of these twenty values taken together". On the other hand, analytic-functions-I-hope-to-emulate (which for now I call "true analytic") MUST be used with a window specification, and DO NOT have meaning as pure aggregates. You cannot meaningfully say "What is THE rank of these values taken together", as each value bears a different rank.Joshi
@Joshi I see, other than using an aggregate function in an analytic way (meaning using partition by order by and windowing clauses like rows preceding and following), I don't believe you can do what you want. I will say that I've been using Oracle for over 15 years, and haven't had the need to create my own "truly analytic" function, so perhaps a different approach would be easier to use and maintain (although I enjoy the challenge, and wish I had what you wanted)Capone
@Capone I do have a different approach in use now that works fine: a parallel pipelined table function (that runs in a minute, a fair bit better than the previous approach's 20 hour runtime!) The annoyance is that different variations of the function are needed for different settings, yielding a lot of copy-paste-modify coding and not enough modular reuse (decreasing maintainability.) I haven't yet found a sane way to parameterize the variations using NDS, but can see how to do it if there were custom-analytic hooks like the aggregate ones, and if I could supply multiple arguments. Oh well.Joshi
Z
1

I have the same need. I am posting an approach that seems to work (it agrees with the native-Oracle ratio_to_report function for all the cases I've tried so far).

My concern is that it relies on the "fact" that the ODCIIterate and ODCITerminate methods are always called in the same order. I don't have any reason to believe that this is always the case. I may log an SR, because I don't think I can use this version without clarification from Oracle.

Still, I am posting the code since it does represent an answer to the question.

Caveat #1 -- this code stores state in a PL/SQL package. I hate this, but I saw no alternative since ODCITerminate passes SELF as IN only, not IN OUT. Aside from being ugly, this means you cannot have multiple usages of the custom analytic function in the same query (since their states will get co-mingled). I'm sure one could code around this limitation (e.g., give each ODCI context a unique value and keep separate states for each unique context).

Caveat #2 -- my test case uses PARALLEL query. I can see from the explain plan that it is running in parallel. However, it does not seem to be instantiating and merging multiple contexts, which I really wanted to test because, if anything breaks this approach, it would be that.

Here's the code.

CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT
  ( key     VARCHAR2(80),
    value   NUMBER );

CREATE OR REPLACE PACKAGE matt_ratio_to_report_state AS
  TYPE values_tab_t IS TABLE OF matt_ratio_to_report_rec INDEX BY BINARY_INTEGER;
  TYPE index_tab_t IS TABLE OF NUMBER INDEX BY VARCHAR2(80);
  G_VALUES_TAB values_tab_t;
  G_INDEX_TAB index_tab_t;
  G_ITERATOR_POSITION NUMBER;
  G_TERMINATOR_POSITION NUMBER;
END matt_ratio_to_report_state;
/


CREATE OR REPLACE TYPE matt_ratio_to_report_impl AS OBJECT
(
  window_sum      NUMBER,
  CONSTRUCTOR FUNCTION matt_ratio_to_report_impl(SELF IN OUT NOCOPY matt_ratio_to_report_impl ) RETURN SELF AS RESULT,  
-- Called to initialize a new aggregation context
-- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead 
-- of creating the new aggregation context from scratch
  STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_impl) RETURN NUMBER,
-- Called when a new data point is added to an aggregation context  
  MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER,
-- Called to return the computed aggragate from an aggregation context
  MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER,
-- Called to merge to two aggregation contexts into one (e.g., merging results of parallel slaves) 
  MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_impl, ctx2 IN matt_ratio_to_report_impl) RETURN NUMBER,
  -- ODCIAggregateDelete
  MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_impl, value matt_ratio_to_report_rec) RETURN NUMBER  
);

/

CREATE OR REPLACE TYPE BODY matt_ratio_to_report_impl IS

CONSTRUCTOR FUNCTION matt_ratio_to_report_impl(SELF IN OUT NOCOPY matt_ratio_to_report_impl ) RETURN SELF AS RESULT IS
BEGIN
  SELF.window_sum := 0;
  matt_ratio_to_report_state.G_VALUES_TAB.DELETE;
  matt_ratio_to_report_state.G_INDEX_TAB.DELETE;
  matt_ratio_to_report_state.G_ITERATOR_POSITION := 0;
  matt_ratio_to_report_state.G_TERMINATOR_POSITION := 0;  
  RETURN;
END;


STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_impl) RETURN NUMBER IS
BEGIN
  DBMS_OUTPUT.PUT_LINE('ODCIAggregateInitialize(' || sctx.window_sum);
  sctx := matt_ratio_to_report_impl ();
  RETURN ODCIConst.Success;
END;


MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateIterate(' || self.window_sum || ' - ' || value.key || ', ' || value.value || ')');

  -- Increment sum
  self.window_sum := self.window_sum + value.value;

  matt_ratio_to_report_state.G_ITERATOR_POSITION := matt_ratio_to_report_state.G_ITERATOR_POSITION + 1;
  matt_ratio_to_report_state.G_VALUES_TAB(matt_ratio_to_report_state.G_ITERATOR_POSITION) := value;
  matt_ratio_to_report_state.G_INDEX_TAB(value.key) := matt_ratio_to_report_state.G_ITERATOR_POSITION;

  RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateTerminate(' || self.window_sum || ' - flags: ' || flags || ')');
  IF flags = 1 THEN
    matt_ratio_to_report_state.G_TERMINATOR_POSITION := matt_ratio_to_report_state.G_TERMINATOR_POSITION + 1;
    returnValue := matt_ratio_to_report_state.G_VALUES_TAB( matt_ratio_to_report_state.G_TERMINATOR_POSITION).value / self.window_sum; --self.x_list;
  END IF;
  RETURN ODCIConst.Success;
EXCEPTION
  WHEN others THEN 
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK || ' ' || DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_impl, ctx2 IN matt_ratio_to_report_impl) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateMerge(' || self.window_sum || ' - ' || ctx2.window_sum || ')');
  -- Increment sums
  self.window_sum := self.window_sum + ctx2.window_sum;

  RETURN ODCIConst.Success;
END;

-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_impl, value matt_ratio_to_report_rec) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateDelete(' || self.window_sum || ' - ' || value.key || ', ' || value.value || ')');
  -- Decrement sums
  matt_ratio_to_report_state.G_VALUES_TAB.DELETE(matt_ratio_to_report_state.G_INDEX_TAB(value.key));
  matt_ratio_to_report_state.G_INDEX_TAB.DELETE(value.key);

  self.window_sum := self.window_sum - value.value;
END;  

END;
/

CREATE OR REPLACE FUNCTION matt_ratio_to_report ( input matt_ratio_to_report_rec) RETURN NUMBER
PARALLEL_ENABLE AGGREGATE USING matt_ratio_to_report_impl;
/


CREATE TABLE matt_test_data ( x ) PARALLEL 4
AS SELECT rownum FROM DUAL CONNECT BY ROWNUM <= 50000;

with test as (
select d.x, sum(d.x) over ( partition by mod(d.x,5) order by d.x desc ) running_sum,
ratio_to_report(d.x) over ( partition by mod(d.x,500) ) oracle_rr,
matt_ratio_to_report( matt_ratio_to_report_rec(to_char(d.x), d.x) ) over ( partition by mod(d.x,500) ) custom_rr
--matt_ratio_to_report( matt_ratio_to_report_rec(to_char(d.x), d.x) ) over ( partition by mod(d.x,500) ORDER BY d.x ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) custom_rr_windowed 
from matt_test_data d )
SELECT /*+ PARALLEL */ test.*, case when test.oracle_rr != test.custom_rr then 'Mismatch!' Else null END test_results from test 
--where oracle_rr != custom_rr
ORDER BY test_results nulls last, x
;
Zachary answered 29/9, 2015 at 18:47 Comment(3)
This is exciting. I don't have time to check it out just yet, but if you're replicating native ratio_to_report, this might be the ticket! The caveats are definitely relevant, since one of the motivations is to enable parallelism.Joshi
Thanks for the upvote. I'm actually working on an alternate, more reliable way. (As I said, I have the same need that you do). If it pans out, I'll share it.Zachary
The other approach does seem to work. I've posted it as a second answer.Zachary
M
0

The Data Cartridge Developer's Guide covers these topics. This section discusses user-defined analytic functions.

Miyamoto answered 25/8, 2015 at 18:22 Comment(1)
The section you mentioned provides information on how to create aggregate functions like SUM (which generate one single value based on a set of values), and provides information on how to optimize such a custom aggregate function for use in an analytic context, such as SELECT SUM(v) OVER (PARTITION BY x)), but it doesn't address how to create a custom analytic functions such as DENSE_RANK or RATIO_TO_REPORT (which generate as many potentially distinct values as they consume).Joshi

© 2022 - 2024 — McMap. All rights reserved.