How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?

I am trying to fetch the first and the last record of a 'grouped' set of records.
More precisely, I am running a query like this:

SELECT MIN(low_price), MAX(high_price), open, close
FROM symbols
WHERE date BETWEEN(.. ..)
GROUP BY YEARWEEK(date)

but I'd like to get the first and the last record of each group. It could be done with tons of separate queries, but my table is quite large.

Is there a way to do this in MySQL, ideally with low processing time?
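To make "first" and "last" concrete, here is a minimal pure-Python sketch of the result the query is meant to produce. It is not MySQL code. The `Bar` rows and their values are hypothetical, and weeks come from ISO `date.isocalendar()`, which is an assumption: the single-argument `YEARWEEK(date)` starts weeks on Sunday, so week boundaries can differ. Per week it keeps the lowest `low_price`, the highest `high_price`, the `open` of the earliest row and the `close` of the latest row.

```python
from datetime import date
from typing import NamedTuple


class Bar(NamedTuple):
    # Hypothetical row layout mirroring the question's columns.
    day: date
    open: float
    high_price: float
    low_price: float
    close: float


def weekly_bars(rows):
    """Return {(iso_year, iso_week): (min_low, max_high, first_open, last_close)}."""
    groups = {}
    for row in rows:
        iso = row.day.isocalendar()
        groups.setdefault((iso[0], iso[1]), []).append(row)

    result = {}
    for week, bars in groups.items():
        bars = sorted(bars, key=lambda b: b.day)  # "first"/"last" follow date order
        result[week] = (
            min(b.low_price for b in bars),
            max(b.high_price for b in bars),
            bars[0].open,    # open of the earliest row in the week
            bars[-1].close,  # close of the latest row in the week
        )
    return result


rows = [
    Bar(date(2009, 4, 7), 10.0, 12.0, 9.5, 11.0),
    Bar(date(2009, 4, 6), 9.0, 11.5, 8.0, 10.0),
    Bar(date(2009, 4, 9), 11.0, 13.0, 10.5, 12.5),
    Bar(date(2009, 4, 14), 12.5, 14.0, 12.0, 13.5),
]
print(weekly_bars(rows))
```

The input is deliberately unsorted: the "first" row of a week is defined by date, not by insertion order, which is exactly what plain `open, close` in a `GROUP BY` query does not guarantee.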

Mckenney answered 4/9, 2009 at 14:20 Comment(1)
For more efficiency, see mysql.rjweb.org/doc.php/groupwise_max – Dric

You want to use GROUP_CONCAT and SUBSTRING_INDEX:

SUBSTRING_INDEX( GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime), ',', 1 ) AS open,
SUBSTRING_INDEX( GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC), ',', 1 ) AS close

This avoids expensive subqueries, and I find it generally more efficient for this particular problem.

Check the manual pages for both functions to understand their arguments, or see this article, which includes an example of timeframe conversion in MySQL, for further explanation.
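The mechanics can be traced outside MySQL. The Python sketch below is an emulation, not the server's implementation: the hypothetical helpers `group_concat` and `substring_index` mimic `GROUP_CONCAT(... ORDER BY ...)` with truncation at `group_concat_max_len` (1024 by default) and `SUBSTRING_INDEX(str, delim, count)`. The second half suggests why the answer orders `close` with `DESC` and keeps the first element instead of keeping the last element of an ascending list: truncation removes the tail of the string, so the wanted value is safest at the head.

```python
def group_concat(values, sep=",", max_len=1024):
    """Join already-ordered values like GROUP_CONCAT(... SEPARATOR sep) and cut the
    result to max_len characters, as group_concat_max_len does (MySQL counts bytes;
    the two agree for ASCII text)."""
    return sep.join(str(v) for v in values)[:max_len]


def substring_index(s, delim, count):
    """Mimic SUBSTRING_INDEX(s, delim, count): text left of the count-th delim from
    the left (count > 0), or right of the count-th delim from the right (count < 0)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


# Hypothetical rows of one week: (timestamp, open, close). Prices are already text,
# as after CAST(... AS CHAR), and the rows are deliberately unsorted.
rows = [
    ("2009-04-08 09:00", "10.25", "10.50"),
    ("2009-04-06 09:00", "10.00", "10.10"),
    ("2009-04-10 09:00", "10.75", "11.00"),
]
by_time = sorted(rows, key=lambda r: r[0])  # ORDER BY datetime

first_open = substring_index(group_concat(r[1] for r in by_time), ",", 1)
last_close = substring_index(group_concat(r[2] for r in reversed(by_time)), ",", 1)
print(first_open, last_close)

# Truncation drops the END of the concatenated string.
closes = [f"{100 + i}.00" for i in range(300)]  # 300 six-character values
ascending = group_concat(closes)                  # cut to 1024 characters
print(substring_index(ascending, ",", -1))        # only a fragment of a later value
print(substring_index(group_concat(reversed(closes)), ",", 1))  # the real last value
```

As in the SQL version, the extracted values are strings ('10.00' and '11.00' here), so a cast back to a numeric type may be needed.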

Marion answered 4/9, 2009 at 14:20 Comment(7)
Thanks for the crafty solution! Still, I find it unfortunate that MySQL doesn't support FIRST() and LAST(), which would be much faster than this... – Detonator
Excellent solution. I wondered about performance and memory considerations on large tables until I saw that the operation is confined to the size defined by group_concat_max_len (default 1024). Good times! – Hilariohilarious
The performance of all subqueries is not the same. It is so obvious it is embarrassing to have to say it, but it is heavily dependent on the subquery and the query it is embedded in. And uncorrelated subqueries (where the execution of the subquery does not depend on each row of the outer query) are no worse (or better) than they would be when run on their own. As the subquery in my solution below is... – Gaiseric
Best solution for my problem, and I looked a lot! Thanks! Avoids nasty subqueries or self-joins. – Nikolaus
Could you write the full query? Thanks – Decreasing
See the linked article for a full example. – Marion
The article is down. – Perfectionist

Try this to start with:

Select YearWeek, Date, Min(Low_Price), Max(High_Price)
From
   (Select YEARWEEK(date) YearWeek, Date, Low_Price, High_Price
    From Symbols S
    Where Date BETWEEN(.. ..)) Z
Group By YearWeek, Date
Gaiseric answered 4/9, 2009 at 14:30 Comment(0)

Here is a solution to this specific problem: http://topwebguy.com/first-and-last-in-mysql-a-working-solution/ It is almost as simple as using FIRST and LAST in MySQL.

I'll include the code that provides the solution, but you can look up the whole text there:

SELECT
  word,
  (SELECT a.ip_addr FROM article a
   WHERE a.word = notfound.word
   ORDER BY a.updated LIMIT 1) AS first_ip,
  (SELECT a.ip_addr FROM article a
   WHERE a.word = notfound.word
   ORDER BY a.updated DESC LIMIT 1) AS last_ip
FROM notfound
GROUP BY word;
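Applied to the question's data, the same correlated-subquery pattern might look like the following sketch. It uses Python's built-in `sqlite3` as a stand-in for MySQL, so it is an assumption that the SQL carries over unchanged. The `year_week` column is a hypothetical precomputed `YEARWEEK(date)` value (SQLite has no `YEARWEEK()` function), and the rows are invented.

```python
import sqlite3

# Hypothetical schema: SQLite stands in for MySQL, and year_week holds a
# precomputed YEARWEEK(date) value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (date TEXT, year_week INTEGER, open REAL,"
    " high_price REAL, low_price REAL, close REAL)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2009-04-07", 200914, 10.0, 12.0, 9.5, 11.0),
        ("2009-04-06", 200914, 9.0, 11.5, 8.0, 10.0),
        ("2009-04-09", 200914, 11.0, 13.0, 10.5, 12.5),
        ("2009-04-14", 200915, 12.5, 14.0, 12.0, 13.5),
    ],
)

# One correlated subquery per wanted value: each reads the group's rows in
# date order and keeps only the first one.
query = """
SELECT s.year_week,
       MIN(s.low_price),
       MAX(s.high_price),
       (SELECT a.open FROM symbols a
         WHERE a.year_week = s.year_week
         ORDER BY a.date LIMIT 1) AS first_open,
       (SELECT a.close FROM symbols a
         WHERE a.year_week = s.year_week
         ORDER BY a.date DESC LIMIT 1) AS last_close
FROM symbols s
GROUP BY s.year_week
ORDER BY s.year_week
"""
weekly = conn.execute(query).fetchall()
print(weekly)
```

Each subquery runs once per group, so on a large table an index on (year_week, date) would likely be needed to keep those lookups cheap.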
Margartmargate answered 17/4, 2010 at 7:43 Comment(0)

I usually achieve this by joining back onto the table, as this gives me access to all the data for the two rows.

This example uses ORDER BY and LIMIT, but you can also use MIN and MAX on the primary key returned in the subqueries of the joins.

This assumes that the table has a primary key column called ID.

SELECT MIN(symbols.low_price), MAX(symbols.high_price), symbols.open, symbols.close,

symbols.id,
symbols.date,

symbols_prev.id symbols_prev_id,
symbols_prev.date symbols_prev_date,
symbols_prev.low_price symbols_prev_low_price,
symbols_prev.high_price symbols_prev_high_price,

symbols_next.id symbols_next_id,
symbols_next.date symbols_next_date,
symbols_next.low_price symbols_next_low_price,
symbols_next.high_price symbols_next_high_price

FROM symbols

JOIN symbols symbols_prev ON
 symbols_prev.ID = 
(
SELECT symbols_prev_inner.ID
FROM symbols symbols_prev_inner
WHERE YEARWEEK(symbols_prev_inner.date)=YEARWEEK(symbols.date) 
AND symbols_prev_inner.ID<symbols.ID
ORDER BY
symbols_prev_inner.ID DESC
LIMIT 1
)

JOIN symbols symbols_next ON
 symbols_next.ID = 
(
SELECT symbols_next_inner.ID
FROM symbols symbols_next_inner
WHERE YEARWEEK(symbols_next_inner.date)=YEARWEEK(symbols.date) 
AND symbols_next_inner.ID>symbols.ID
ORDER BY
symbols_next_inner.ID
LIMIT 1
)
WHERE symbols.date BETWEEN(.. ..)
GROUP BY YEARWEEK(symbols.date)
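The MIN/MAX-on-primary-key variant mentioned in this answer can be sketched as follows, again with Python's `sqlite3` standing in for MySQL. It assumes, hypothetically, that `id` increases with `date`, so the smallest and largest `id` of a week are its first and last rows. A precomputed `year_week` column replaces `YEARWEEK(date)`, and the data is invented.

```python
import sqlite3

# Hypothetical data: SQLite stands in for MySQL, year_week is a precomputed
# YEARWEEK(date), and id is assumed to increase with date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (id INTEGER PRIMARY KEY, date TEXT, year_week INTEGER,"
    " open REAL, high_price REAL, low_price REAL, close REAL)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2009-04-06", 200914, 9.0, 11.5, 8.0, 10.0),
        (2, "2009-04-07", 200914, 10.0, 12.0, 9.5, 11.0),
        (3, "2009-04-09", 200914, 11.0, 13.0, 10.5, 12.5),
        (4, "2009-04-14", 200915, 12.5, 14.0, 12.0, 13.5),
    ],
)

# Derived table: one row per week with the aggregates plus the smallest and
# largest primary key; the two joins then fetch the complete first and last rows.
query = """
SELECT w.year_week, w.low, w.high,
       f.date, f.open,
       l.date, l.close
FROM (SELECT year_week,
             MIN(low_price)  AS low,
             MAX(high_price) AS high,
             MIN(id) AS first_id,
             MAX(id) AS last_id
        FROM symbols
       GROUP BY year_week) AS w
JOIN symbols AS f ON f.id = w.first_id
JOIN symbols AS l ON l.id = w.last_id
ORDER BY w.year_week
"""
weekly = conn.execute(query).fetchall()
print(weekly)
```

If ids are not in date order, the derived table could return MIN(date) and MAX(date) instead and join on (year_week, date), provided dates are unique within a week.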
Sputnik answered 17/11, 2023 at 12:12 Comment(0)

Assuming that you want the ids of the records with the lowest low_price and the highest high_price, you could add these two columns to your query:

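SELECT 

(SELECT s2.id FROM symbols s2 WHERE YEARWEEK(s2.date) = YEARWEEK(symbols.date)
 ORDER BY s2.low_price ASC, s2.id LIMIT 1) low_price_id,
(SELECT s2.id FROM symbols s2 WHERE YEARWEEK(s2.date) = YEARWEEK(symbols.date)
 ORDER BY s2.high_price DESC, s2.id LIMIT 1) high_price_id,

MIN(low_price), MAX(high_price), open, close
FROM symbols
WHERE date BETWEEN(.. ..)
GROUP BY YEARWEEK(date)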

If efficiency is an issue, you should add a column for 'year_week', add some covering indexes, and split the query in two.

The 'year_week' column is just an INT set to the value of YEARWEEK(date) and updated whenever the 'date' column is updated. This way you don't have to recalculate it for each query and you can index it.

The new covering indexes should look like this (the column order is important): KEY yw_lp_id (year_week, low_price, id), KEY yw_hp_id (year_week, high_price, id)

You should then use these two queries:

SELECT 
(SELECT s2.id FROM symbols s2 WHERE s2.year_week = symbols.year_week
 ORDER BY s2.low_price ASC, s2.id LIMIT 1) low_price_id,
MIN(low_price), open, close
FROM symbols
WHERE year_week BETWEEN(.. ..)
GROUP BY year_week

and

SELECT 
(SELECT s2.id FROM symbols s2 WHERE s2.year_week = symbols.year_week
 ORDER BY s2.high_price DESC, s2.id LIMIT 1) high_price_id,
MAX(high_price), open, close
FROM symbols
WHERE year_week BETWEEN(.. ..)
GROUP BY year_week

Covering indexes are pretty useful. Check this out for more details.
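A minimal sketch of the year_week column, the two indexes and the two split queries follows, using Python's `sqlite3` as a stand-in for MySQL. SQLite's planner is not MySQL's, so whether the indexes are actually used as covering indexes would need checking with `EXPLAIN` on the real server. The rows are hypothetical, `year_week` is assumed to be kept equal to `YEARWEEK(date)` by the application, and the per-week id lookups are written as correlated subqueries with ties broken by `id`.

```python
import sqlite3

# Hypothetical setup: SQLite stands in for MySQL; year_week is an ordinary
# INTEGER column maintained by the application; id plays the primary key's role.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (id INTEGER NOT NULL, date TEXT, year_week INTEGER,"
    " low_price REAL, high_price REAL)"
)
# Covering indexes in the column order the answer recommends.
conn.execute("CREATE INDEX yw_lp_id ON symbols (year_week, low_price, id)")
conn.execute("CREATE INDEX yw_hp_id ON symbols (year_week, high_price, id)")
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2009-04-06", 200914, 8.0, 11.5),
        (2, "2009-04-07", 200914, 9.5, 12.0),
        (3, "2009-04-09", 200914, 10.5, 13.0),
        (4, "2009-04-14", 200915, 12.0, 14.0),
    ],
)

# Per week: the id of the row with the lowest low_price (ties broken by id).
low_query = """
SELECT s.year_week,
       (SELECT a.id FROM symbols a
         WHERE a.year_week = s.year_week
         ORDER BY a.low_price ASC, a.id LIMIT 1) AS low_price_id,
       MIN(s.low_price)
FROM symbols s
WHERE s.year_week BETWEEN ? AND ?
GROUP BY s.year_week
ORDER BY s.year_week
"""
# Per week: the id of the row with the highest high_price (ties broken by id).
high_query = """
SELECT s.year_week,
       (SELECT a.id FROM symbols a
         WHERE a.year_week = s.year_week
         ORDER BY a.high_price DESC, a.id LIMIT 1) AS high_price_id,
       MAX(s.high_price)
FROM symbols s
WHERE s.year_week BETWEEN ? AND ?
GROUP BY s.year_week
ORDER BY s.year_week
"""
lows = conn.execute(low_query, (200914, 200915)).fetchall()
highs = conn.execute(high_query, (200914, 200915)).fetchall()
print(lows, highs)
```

Filtering on `year_week` rather than on `date` selects whole weeks, so the correlated subqueries never look at rows outside the requested range.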

Heighttopaper answered 6/9, 2009 at 5:22 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.