Using GROUP BY with FIRST_VALUE and LAST_VALUE
Asked Answered
F

3

48

I'm working with some data that is currently stored in 1 minute intervals that looks like this:

CREATE TABLE #MinuteData
    (
      [Id] INT ,
      [MinuteBar] DATETIME ,
      [Open] NUMERIC(12, 6) ,
      [High] NUMERIC(12, 6) ,
      [Low] NUMERIC(12, 6) ,
      [Close] NUMERIC(12, 6)
    );

INSERT  INTO #MinuteData
        ( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
VALUES  ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880 ),
        ( 2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880 ),
        ( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040 ),
        ( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
        ( 5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030 ),
        ( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950 ),
        ( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990 ),
        ( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170 ),
        ( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120 ),
        ( 10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970 );

SELECT  *
FROM    #MinuteData;

DROP TABLE #MinuteData;

The values track currency exchange rates, so for each minute interval (bar), there is the Open price as the minute started and a Close price for the minute end. The High and Low values represent the highest and lowest rate during each individual minute.

Desired Output

I'm looking to reformat this data in to 5 minute intervals to produce the following output:

MinuteBar                Open       Close       Low         High
2015-01-01 17:00:00.000  1.557870   1.558030    1.557870    1.558100
2015-01-01 17:05:00.000  1.558580   1.557970    1.557870    1.558710

This takes the Open value from the first minute of the 5, the Close value from the last minute of the 5. The High and Low values represent the highest high and lowest low rates across the 5 minute period.

Current Solution

I have a solution that does this (below), but it feels inelegant as it relies on id values and self joins. Also, I intend to run it on much larger datasets so I was looking to do it in a more efficient manner if possible:

-- Create a column to allow grouping in 5 minute Intervals
SELECT  Id, MinuteBar, [Open], High, Low, [Close], 
DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval
INTO    #5MinuteData
FROM    #MinuteData
ORDER BY minutebar

-- Group by inteval and aggregate prior to self join
SELECT  Interval ,
        MIN(MinuteBar) AS MinuteBar ,
        MIN(Id) AS OpenId ,
        MAX(Id) AS CloseId ,
        MIN(Low) AS Low ,
        MAX(High) AS High
INTO    #DataMinMax
FROM    #5MinuteData
GROUP BY Interval;

-- Self join to get the Open and Close values
SELECT  t1.Interval ,
        t1.MinuteBar ,
        tOpen.[Open] ,
        tClose.[Close] ,
        t1.Low ,
        t1.High
FROM    #DataMinMax t1
        INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
        INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

DROP TABLE #DataMinMax
DROP TABLE #5MinuteData

Rework Attempt

Instead of the above queries, I've been looking at using FIRST_VALUE and LAST_VALUE, as it seems to be what I'm after, but I can't quite get it working with the grouping that I'm doing. There might be a better solution than what I'm trying to do, so I'm open to suggestions. Currently I'm trying to do this:

SELECT  MIN(MinuteBar) MinuteBar5 ,
        FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening,
        MAX(High) AS High ,
        MIN(Low) AS Low ,
        LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing ,
        DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
FROM    #MinuteData
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5

This gives me the below error, which is related to the FIRST_VALUE and LAST_VALUE as the query runs if I remove those lines:

Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Freitas answered 24/1, 2017 at 23:28 Comment(4)
The first_value and last_value aren't actually aggregate functions like you might think. They're more like row_number, where they're windowed over a full data set. Problem is you're trying to use them like aggregates, and that's why it's yelling at you. I have to head out right now, but my first thought was to cast the date to a string, substring out the minute component and tape it back together in its rounded form.Nadaha
thanks for the response, I haven't used first_value in anger before. the dates aren't really my problem here, I have a solution for that which seems to be working, although there could be better ways to do that. the main issue is getting the open and close values for the 5 minute periods.Freitas
so High and Low is simply the highest high and the lowest low respectively, but open and close are what you're having trouble with because those should just be the first and last in the interval, regardless of their value? Do I have that right?Nadaha
@Nadaha yeah that's correct. My current solution works, but I was hoping to do it in a better way. There may be a better way than what I'm trying and I'm open to suggestions.Freitas
S
44
SELECT 
    MIN(MinuteBar) AS MinuteBar5,
    Opening,
    MAX(High) AS High,
    MIN(Low) AS Low,
    Closing,
    Interval
FROM 
(
    SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
           FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
           DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
           *
    FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing

A solution close to your current one. There are two places you did wrong.

  1. FIRST_VALUE AND LAST_VALUE are Analytic Functions, which work on a window or partition, instead of a group. You can run the nested query alone and see its result.

  2. LAST_VALUE is the last value of current window, which is not specified in your query, and a default window is rows from the first row of current partition to current row. You can either use FIRST_VALUE with descending order or specify a window

    LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 
                ORDER BY MinuteBar 
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
    
Stilliform answered 25/1, 2017 at 9:47 Comment(6)
Nice. I didn't think about adding Opening and Closing to the group by clause.Zetes
This seems like the tidiest solution that doesn't require as many steps as the others and is closest to what I was trying to achieve although I'm still looking at why the LAST_VALUE part doesn't work how I expected. Nonetheless, this works so thanks.Freitas
I would recommend replacing DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 with DATEPART(MINUTE, MinuteBar)/5. Other then that, that's a solid solution. +1.Zetes
@ZoharPeled in fairness that's my code that he's copied. Although your solution will work for this dataset, it wouldn't work across larger datasets. When I use this in reality I will be running it across maybe 10 years worth of data. BTW Thanks for answering.Freitas
+1 For the following (important) clarification "(...) a default window is rows from the first row of current partition to current row"Mydriatic
nice explanation on why last_value doesn't work as expected with default windowAddress
Z
4

Here is one way to do it without temporary tables:

;WITH CTEInterval AS 
(  -- This replaces your first temporary table (#5MinuteData)
    SELECT  [Id], 
            [MinuteBar], 
            [Open], 
            [High], 
            [Low], 
            [Close],
            DATEPART(MINUTE, MinuteBar)/5 AS Interval
    FROM #MinuteData
), CTEOpenClose as 
( -- this is instead of your second temporary table (#DataMinMax)
    SELECT  [Id], 
            [MinuteBar], 
            FIRST_VALUE([Open]) OVER (PARTITION BY Interval ORDER BY MinuteBar) As [Open],
            [High],
            [Low], 
            FIRST_VALUE([Close]) OVER (PARTITION BY Interval ORDER BY MinuteBar DESC) As [Close],
            Interval
    FROM CTEInterval
)

-- This is the final select
SELECT  MIN([MinuteBar]) as [MinuteBar], 
        AVG([Open]) as [Open], -- All values of [Open] in the same interval are the same...
        AVG([Close]) as [Close],  -- All values of [Close] in the same interval are the same...
        MIN([Low]) as [Low], 
        MAX([High]) as [High]
FROM CTEOpenClose
GROUP BY Interval

Results:

MinuteBar                Open       Close       Low         High
2015-01-01 17:00:00.000  1.557870   1.558030    1.557870    1.558100
2015-01-01 17:05:00.000  1.558580   1.557970    1.557870    1.558710
Zetes answered 25/1, 2017 at 9:27 Comment(1)
Thanks I'll test this shortly, currently in a meeting. I was hoping to go reduce the number of steps and I want something that will perform well over approximately 600,000+ records. Will respond later once I've testedFreitas
M
2

Demo here

;with cte
as
(--this can be your permanent table with intervals ,rather than generating on fly
select cast('2015-01-01 17:00:00.000' as datetime) as interval,dateadd(mi,5,'2015-01-01 17:00:00.000') as nxtinterval
union all
select dateadd(mi,5,interval),dateadd(mi,5,nxtinterval) from cte
where interval<='2015-01-01 17:45:00.000'

)
,finalcte
as
(select minutebar,
low,high,
dense_rank() over (order by  interval,nxtinterval) as grpd,
last_value([close]) over ( partition by interval,nxtinterval order by interval,nxtinterval) as [close],
first_value([open]) over (partition by interval,nxtinterval order by interval,nxtinterval) as [open]
 from cte c
join
#minutedata m
on m.minutebar between interval and nxtinterval
)
select 
min(minutebar) as minutebar,
min(low) as 'low',
max(high) as 'High',
max([open]) as 'open',
max([close]) as 'close'
 from finalcte
 group by grpd
Merkle answered 25/1, 2017 at 10:39 Comment(2)
how did you get the "Demo here" button?Nadaha
@Xedni: use <kbd> keep some text </kbd>Merkle

© 2022 - 2024 — McMap. All rights reserved.