Finding rows with consecutive increase in the values of a column
Asked Answered
L

4

9

I have a sql table that stores the daily prices of stocks. New records are inserted every day after the market closes. I want to find the stocks that have consecutive increases in price.

The table has lots of columns, but this is the relevant subset:

quoteid     stockid      closeprice     createdate
--------------------------------------------------
    1           1               1       01/01/2012
    2           2              10       01/01/2012
    3           3              15       01/01/2012

    4           1               2       01/02/2012
    5           2              11       01/02/2012
    6           3              13       01/02/2012

    7           1               5       01/03/2012
    8           2              13       01/03/2012
    9           3              17       01/03/2012

   10           1               7       01/04/2012
   11           2              14       01/04/2012
   12           3              18       01/04/2012

   13           1               9       01/05/2012
   14           2              11       01/05/2012
   15           3              10       01/05/2012

The quoteid column is a primary key.

In the table, the closing price of stock id 1 increases every day. Stock id 3 fluctuates a lot, and the price for stock id 2 fell on the last day.

I am looking for a result like this:

stockid     Consecutive Count (CC)
----------------------------------
    1                5
    2                4

If you can get output with dates for the consecutive streak, that would even better:

stockid     Consecutive Count (CC)      StartDate      EndDate
---------------------------------------------------------------
    1                5                 01/01/2012    01/05/2012
    2                4                 01/01/2012    01/04/2012

StartDate is when the price started increasing and EndDate is when the bull run actually finished.

I have figured this is not an easy problem. I have looked at other posts here which also deal with this consecutive scenario but they don't fit my needs. If you know any post that is similar to mine, please do let me know.

Letta answered 27/4, 2012 at 16:32 Comment(3)
What do you want your minimum-length consecutive increases to be - just greater than a single day? Or somehow offset by their decreases? And I assume you want to see multiple runs, if there's data for that.Pantile
Are there any gaps in the data - like on weekends - and what needs to be done there?Pantile
I have no rules for consecutive increases it just have to be greater than previous day. Yes, I am looking for multiple runs. I will be running this query over data from last 3 months, 6 months or may be more than that. There will be gaps in the data, we can use the primary key column to fetch previous day recordLetta
P
11

In any case, it helps to put it in terms of increasing rows-per-stock (the actual quoteid value isn't really helpful here). Count of days captured (in this table) is easiest - if you want something else (like only business days, ignoring weekends/holidays, or whatever) it gets more involved; you'd probably need a calendar file. You're going to want an index over [stockid, createdate], if you don't have one already.

WITH StockRow AS (SELECT stockId, closePrice, createdDate,
                         ROW_NUMBER() OVER(PARTITION BY stockId 
                                           ORDER BY createdDate) rn
                  FROM Quote),

     RunGroup AS (SELECT Base.stockId, Base.createdDate,
                         MAX(Restart.rn) OVER(PARTITION BY Base.stockId
                                              ORDER BY Base.createdDate) groupingId
                  FROM StockRow Base
                  LEFT JOIN StockRow Restart
                         ON Restart.stockId = Base.stockId
                            AND Restart.rn = Base.rn - 1
                            AND Restart.closePrice > Base.closePrice)

SELECT stockId, 
       COUNT(*) AS consecutiveCount, 
       MIN(createdDate) AS startDate, MAX(createdDate) AS endDate
FROM RunGroup
GROUP BY stockId, groupingId
HAVING COUNT(*) >= 3
ORDER BY stockId, startDate

Which yields the following results from the provided data:

Increasing_Run
stockId   consecutiveCount  startDate    endDate
===================================================
1         5                 2012-01-01   2012-01-05
2         4                 2012-01-01   2012-01-04
3         3                 2012-01-02   2012-01-04

SQL Fiddle Example
(Fiddle also has an example for multiple runs)

This analysis will ignore all gaps, correctly matches all runs (the next time a positive run starts).


So what's going on here?

StockRow AS (SELECT stockId, closePrice, createdDate,
                    ROW_NUMBER() OVER(PARTITION BY stockId 
                                      ORDER BY createdDate) rn
             FROM Quote)

This CTE is being used for one purpose: we need a way to find the next/previous row, so first we number each row in order (of the date)...

RunGroup AS (SELECT Base.stockId, Base.createdDate,
                    MAX(Restart.rn) OVER(PARTITION BY Base.stockId
                                         ORDER BY Base.createdDate) groupingId
             FROM StockRow Base
             LEFT JOIN StockRow Restart
                    ON Restart.stockId = Base.stockId
                       AND Restart.rn = Base.rn - 1
                           AND Restart.closePrice > Base.closePrice)

... and then join them based on the index. If you end up on something that has LAG()/LEAD(), using those instead will almost certainly be a better option. There's one critical thing here though - matches are only if the row is out-of-sequence (less than the previous row). Otherwise, the value end up being null (with LAG(), you'd need to use something like CASE afterwards to pull this off). You get a temporary set that looks something like this:

B.rn   B.closePrice   B.createdDate  R.rn   R.closePrice   R.createdDate  groupingId
1      15             2012-01-01     -      -              -              -
2      13             2012-01-02     1      15             2012-01-01     1
3      17             2012-01-03     -      -              -              1
4      18             2012-01-04     -      -              -              1
5      10             2012-01-05     4      18             2012-01-04     4

... So there's values for Restart only when the previous was greater than the "current" row. The use of MAX() in the window function is being used to the greatest value seen so far... which because null is lowest, causes the row-index to be retained for all other rows until another mismatch occurs (which gives a new value). At this point, we essentially have the intermediate results of a query, ready for the final aggregation.

SELECT stockId, 
       COUNT(*) AS consecutiveCount, 
       MIN(createdDate) AS startDate, MAX(createdDate) AS endDate
FROM RunGroup
GROUP BY stockId, groupingId
HAVING COUNT(*) >= 3
ORDER BY stockId, startDate

The final part of the query is getting the start and end dates of the run, and counting the number of entries between those dates. If there was something more complicated for the date calculation, it probably needs to happen at this point. The GROUP BY is showing one of the few legitimate instances of not including a column in the SELECT clause. The HAVING clause is used to eliminate runs that are "too short".

Pantile answered 27/4, 2012 at 18:57 Comment(2)
This counts consecutive increase but also consecutive stagnation (where price is not increasing not decreasing.) Can the code be modified to count only consecutive increases?Riggle
I figured it out. We can achieve that by replacing the > in AND Restart.closePrice > Base.closePrice by >=Riggle
L
2

This is the final working SQL as per my needs. Tests reveal it is working correctly. I am using method for CC from @Oran

WITH StockRow (stockId, [close], createdDate, rowNum)
 as
 (
     SELECT stockId,         [close],                   createdDate,
            ROW_NUMBER() OVER(PARTITION BY stockId ORDER BY createdDate)
     FROM dbo.Quote
     where createddate >= '01/01/2012' --Beginning of this year
     ),

     RunStart (stockId, [close], createdDate, runId) as (
     SELECT      a.stockId,       a.[close], a.createdDate,
            ROW_NUMBER() OVER(PARTITION BY a.stockId ORDER BY a.createdDate)
     FROM StockRow as a
     LEFT JOIN StockRow as b
     ON b.stockId = a.stockId
     AND b.rowNum = a.rowNum - 1
     AND b.[close] < a.[close]
     WHERE b.stockId IS NULL)
     ,
 RunEnd (stockId, [close], createdDate, runId) as (
     SELECT a.stockId, a.[close], a.createdDate,
            ROW_NUMBER() OVER(PARTITION BY a.stockId ORDER BY a.createdDate)
     FROM StockRow as a
     LEFT JOIN StockRow as b
     ON b.stockId = a.stockId
     AND b.rowNum = a.rowNum + 1
     AND b.[close] > a.[close]
     WHERE b.stockId IS NULL) 

SELECT a.stockId,        s.companyname,         s.Symbol, 
a.createdDate as startdate,        b.createdDate as enddate,
(select count(r.createdDate)       from      dbo.quote r      where r.stockid = b.stockid and        r.createdDate          between  a.createdDate      and       b.createdDate) as BullRunDuration
FROM RunStart as a JOIN RunEnd as b
ON b.stockId = a.stockId
join dbo.stock as s
on a.stockid = s.stockid
AND b.runId = a.runId
AND b.[close] > a.[close]
and (select count(r.createdDate) from dbo.quote r where r.stockid = b.stockid and 
r.createdDate between  a.createdDate and b.createdDate)  > 2 -- trying to avoid cluter
order by 6 desc, a.stockid
Letta answered 28/4, 2012 at 16:19 Comment(0)
W
0

I would try a CTE, roughly like:

with increase (stockid, startdate, enddate, cc) as
(
    select d2.stockid, d1.createdate as startdate, d2.createdate as enddate, 1
    from quote d1, quote d2
    where d1.stockid = d2.stockid
    and d2.closedprice > d1.closedprice
    and dateadd(day, 1, d1.createdate) = d2.createdate

    union all

    select d2.stockid, d1.createdate as startdate, cend.enddate as enddate, cend.cc + 1
    from quote d1, quote d2, increase cend
    where d1.stockid = d2.stockid and d2.stockid = cend.stockid
    and d2.closedprice > d1.closedprice
    and d2.createdate = cend.startdate
    and dateadd(day, 1, d1.createdate) = d2.createdate
)
select o.stockid, o.cc, o.startdate, o.enddate
from increase o where cc = (select max(cc) from increase i where i.stockid = o.stockid and i.enddate = o.enddate)

This assumes no gaps. The criteria dateadd(day, 1, d1.createdate) = d2.createdate would have to be replaced by something else that indicates whether d2 is the "next" day after d1.

Wabble answered 27/4, 2012 at 17:7 Comment(0)
A
0

simple Solution:

with cte as (
select *,lead(closeprice,1) over(partition by stockid order by stockid,createdate) le,
lag(closeprice,1) over(partition by stockid order by stockid,createdate) lg 
from stock 
)
select stockid,count(*),min(createdate),max(createdate) from(
select * from cte 
--where stockid = 2
where (closeprice < le or closeprice > lg)
)as a group by stockid
Anemometer answered 11/4, 2022 at 7:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.