Background
I know that in Oracle it's possible to create custom aggregate functions that process a collection of values and return a single result. Edit: I've even read the friendly manual at docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm !
I also know that Oracle provides built-in analytic functions like DENSE_RANK
and RATIO_TO_REPORT
, which provide values for each input, relative to a collection/window of values that input lies within.
Problem
What I want to know is if there's a way to create my own analytic function, presumably in a similar manner as I can create my own aggregation function, and in particular create one with additional arguments in my custom analytic function.
Subtle terminological caveat
When I refer to an "analytic function", please read it as a function which, in addition to accepting windowing parameters via the PARTITION
keyword, also can return different values within a given window. (If anyone has a better term for this, please let me know! Pure analytic function? DENSE_RANK
-class analytic function? Non-aggregate analytic function?)
The Oracle documentation notes that an aggregate function can be used as an analytic (windowed) function. Unfortunately, this only means that the PARTITION
keyword for specifying windowing in analytic functions can be applied to aggregate functions as well. It doesn't promote the aggregate function to my coveted status of being able to return different values within a fixed window.
Aggregate used as analytic:
SELECT SUM(income) OVER (PARTITION BY first_initial) AS total FROM data;
will have as many records as data
, but it will only have as many distinct total
s as there are first initials.
Analytic used as analytic:
SELECT RATIO_TO_REPORT(income) OVER (PARTITION BY first_initial) AS ratio FROM data;
will have as many records as data
, AND, even within a given first_initial
partition, those ratio
s may all be distinct.
Context
I have been provided with call-only access to a PL/SQL procedure which accepts a numeric collection as an IN OUT parameter, and which has a few other IN configuration parameters. The procedure modifies the collection's values (think of it like "The University's Proprietary Sanctioned and Required Grade Curving Procedure") in a manner influenced by the configuration parameters.
Currently, the process to use the procedure is to hardcode a cursor loop that detects a change from one data partition to another, and then within each partition fetches data into a collection which is then passed to the procedure, altered, and eventually dumped back out into a separate table. I planned to improve this by making a PIPELINED
PARALLEL_ENABLE
table function that encapsulates some of the logic, but I'd much prefer to enable queries like the following:
SELECT G.Course_ID
, G.Student_ID
, G.Raw_Grade
, analytic_wrapper(G.raw_grade, P.course_config_data)
OVER (PARTITION BY G.Course_ID) AS Adjusted_Grade
, P.course_config_data
FROM grades G
LEFT JOIN policies P
ON G.Course_ID = P.Course_ID;
That requires being able to create a custom analytic function, though, and due to the way the procedure requires different inputs on different partitions (e.g. the Course_ID
-specific P.course_config_data
above) it also has to accept not only the data-to-be-aggregated argument, but also additional inputs.
Is this possible, and if so where could I find documentation? My Google-fu has failed me.
An Additional Wrinkle
The PL/SQL procedure I've been provided with is (effectively) non-deterministic, and its output has statistical properties which must be preserved. For example, if A={A[0], A[1], A[3]}
are the raw grades for one particular class, and B=f(A)
is the result of invoking the procedure on A
at 1:00 while C=f(A)
is the result of invoking the procedure on A
at 1:15, then B={B[0],B[1],B[2]}
and C={C[0],C[1],C[2]}
are both acceptable outputs to use, but a mixture of the elements like {C[0],B[1],C[2]}
is not acceptable.
The upshot of this is that the procedure must be called exactly once on each partition. (Well, technically, it can be wastefully called as many times as one wants, but all results for a partition must come from the same invocation).
Supposed, for example, that the procedure I've been supplied with operates as follows: It accepts a collection of grades as an IN OUT
parameter, and then sets one of those grades, chosen at random, to 100. All the other grades are set to zero. Running this at 1pm might result in Alice having the only passing grade, while running it at 1:01pm might result in Bob having the only passing grade. Regardless, it should be the case that exactly one student per class passes, no more and no less.
matt_
names: they make it clear what's built-in vs what you're defining! – Joshi