Simple way to calculate median with MySQL
Asked Answered
E

50

270

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.

Example data:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, versus SELECT AVG(val) which == 5.

Electrostatics answered 18/8, 2009 at 0:13 Comment(1)
MariaDB since version 10.3 has one, see mariadb.com/kb/en/library/medianBovill
I
271

In MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM data d, (SELECT @rownum:=0) r
  WHERE d.val is NOT NULL
  -- put some where clause here
  ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

Steve Cohen points out, that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.

Also AVG(dd.val) and dd.row_number IN(...) is used to correctly produce a median when there are an even number of records. Reasoning:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

Finally, MariaDB 10.3.3+ contains a MEDIAN function

Inestimable answered 31/8, 2011 at 21:53 Comment(9)
any way to make it to show group values? like: place / median for that place... like select place, median_value from table... any way? thanksAldenalder
@rowNum will have the 'total count' at the end of the execution. So you can use that if you want to avoid having to do a 'count all' again (which was my case because my query wasn't so simple)Jannery
The logic of having one statement: ( floor((total_rows+1)/2), floor((total_rows+2)/2) ) calculate the rows needed for the median is awesome! Not sure how you thought of that, but it is brilliant. The part I don't follow is the (SELECT @rownum:=0) r -- what purpose does this serve?Pendant
change the first WHERE 1 to WHERE d.val IS NOT NULL so that it excludes NULL rows to keep this method aligned with the native AVGThessalonian
My value came from a two-table join, so I had to add another subquery in order to make sure the row ordering was correct after the join! The structure was sort of select avg(value) from (select value, row_number from (select a - b as value from a_table join b_table order by value))Overglaze
I know this is very old but for some reason this produces very different results than just moving set @rn:=-1 to the outer select instead of instantiating at 0 inside the inner select. For some reason I could not get the results to matchWesson
Maybe you can solve it using GROUP_CONCAT (pay attention to the function limitations on group_concat_max_len system variable). SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(value order by value), ',', CEIL(COUNT(0)/2)), ',', -1)Overprize
QUERY :: I changed the last WHERE clause to WHERE rownumber IN (FLOOR((rownumber + 1)/2), FLOOR((rownumber + 2)/2)); as it was mentioned that rownumber will have total number of rows after first run. This gives me a different result. Can anyone explain why and also why we introduced two variables instead of one.Savonarola
Now, I want to be able to use the above logic as in median(column) - is it possible to embed the above as a function / procedure?Chayachayote
T
78

I just found another answer online in the comments:

For medians in almost any SQL:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2

Make sure your columns are well indexed and the index is used for filtering and sorting. Verify with the explain plans.

select count(*) from table --find the number of rows

Calculate the "median" row number. Maybe use: median_row = floor(count / 2).

Then pick it out of the list:

select val from table order by val asc limit median_row,1

This should return you one row with just the value you want.

Ten answered 18/8, 2009 at 1:2 Comment(5)
@rob can you help edit please? Or should I just bow down to the velcrow solution? (not actually sure how to defer to another solution) Thanks, JacobTen
Note that it does a "cross join", which is very slow for large tables.Marlie
This answer returns nothing for even number of rows.Fourlegged
This answer doesn't work at all for some data sets, e.g., the trivial data set with values 0.1, 0.1, 0.1, 2 -- it will work if all the values are distinct, but only works if the valuesHoggard
why not simplify the last line to HAVING SUM(SIGN(y.lat_n-x.lat_n)) = 0 ?Kylix
D
39

I found the accepted solution didn't work on my MySQL install, returning an empty set, but this query worked for me in all situations that I tested it on:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
Deprived answered 4/6, 2012 at 6:59 Comment(8)
absolutely correct, works perfectly and very speedy on my indexed tablesAchaemenid
this seems to be the fastest solution on mysql out of all the answers here, 200ms with just short of a million records in the tableAchaemenid
I am a front-end designer with only a basic knowledge of MySQL, and am having a problem with the syntax. After 'FROM' I've only seen come one variable, the name of the table. Does this formula select data from two tables, and if so, how would the formula be if just the median of one data column of one table is required?Wien
@FrankConijn: It selects from one table twice. The table's name is data and it is being used with two names, x and y.Bolten
just saying i stalled my mysqld with this exact query on a table with 33k rows...Iridic
does not work for unsigned columns, BIGINT UNSIGNED value is out of range in '(`db`.`y`.`val` - `db`.`x`.`val`)'Thessalonian
This query returns wrong answer for even number of rows.Fourlegged
Many thanks for your query, but can you please explain your query of Having part?Libration
D
34

Unfortunately, neither TheJacobTaylor's nor velcrow's answers return accurate results for current versions of MySQL.

Velcro's answer from above is close, but it does not calculate correctly for result sets with an even number of rows. Medians are defined as either 1) the middle number on odd numbered sets, or 2) the average of the two middle numbers on even number sets.

So, here's velcro's solution patched to handle both odd and even number sets:

SELECT AVG(middle_values) AS 'median' FROM (
  SELECT t1.median_column AS 'middle_values' FROM
    (
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
    ) AS t1,
    (
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
    ) AS t2
    -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
    WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

To use this, follow these 3 easy steps:

  1. Replace "median_table" (2 occurrences) in the above code with the name of your table
  2. Replace "median_column" (3 occurrences) with the column name you'd like to find a median for
  3. If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition
Discordant answered 21/5, 2013 at 22:3 Comment(1)
And, what do you do for the Median of string values?Marlie
I
17

I propose a faster way.

Get the row count:

SELECT CEIL(COUNT(*)/2) FROM data;

Then take the middle value in a sorted subquery:

SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;

I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.

Israelite answered 31/5, 2011 at 0:5 Comment(9)
Why not: SELECT val FROM data ORDER BY val limit @middlevalue, 1Boycie
How do you pull the variable output of your first code block into your second code block?Allonge
As in, where does @middlevalue come from?Allonge
@Boycie - I agree with you, that makes much more sense to me. Did you ever find a reason not to do it that way?Standfast
This does not work as a variable cannot be used in limit clause.Remediless
@Allonge middlevalue == SELECT CEIL(COUNT(*)/2) FROM data;Bonnice
is the taking into account the odd number scenerio ?Supplication
This was by far the most performant way to do this for me, doing this in two queries and keeping the variable as a simple backend variable between queries. Could be wrong if you have fast-updating data that changes the count between the two queries, but that's fine for my case.Lentz
And no, @AbhishekSengupta this doesn't take into account the even/odd logic difference when calculating a true median.Lentz
I
15

Install and use this mysql statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/

After that, calculate median is easy:

SELECT median(val) FROM data;
Irreproachable answered 12/9, 2014 at 14:18 Comment(1)
I just tried this myself, and for what it's worth, installing it was super fast / easy, and it worked as advertised, including grouping, e.g. "select name, median(x) FROM t1 group by name" -- github source here: github.com/infusion/udf_infusionHoggard
W
11

If MySQL has ROW_NUMBER, then the MEDIAN is (be inspired by this SQL Server query):

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS Cnt,
    ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;

The IN is used in case you have an even number of entries.

If you want to find the median per group, then just PARTITION BY group in your OVER clauses.

Rob

Wilburnwilburt answered 18/8, 2009 at 0:51 Comment(2)
Nope, no ROW_NUMBER OVER, no PARTITION BY, none of that; this is MySql, not a real DB engine like PostgreSQL, IBM DB2, MS SQL Server, and so forth;-).Walli
MySQL has window functions now, so this basically works. The only change you need is that you have to take the average of your results in the end.Burgundy
P
9

A comment on this page in the MySQL documentation has the following suggestion:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:

DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;


INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);



-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,

-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;

-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1

-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4


-- from there we can select the n-th element on the position: count div 2 + 1 
Pincenez answered 18/8, 2009 at 0:20 Comment(2)
IMHO, this one is clearly the best for situations where you need the median from a complicated subset(s) (I needed to calculate separate medians of a large number of data subsets)Alpine
Works fine for me. 5.6.14 MySQL Community Server. Table with 11M records (about 20Gb on disk), has two not primary indexes (model_id, price). In table (after filtration) we have 500K records to calculate median for. In result we have 30K records (model_id, median_price). Query duration is 1.5-2 seconds. Speed is Fast for me.Medullated
W
9

Most of the solutions above work only for one field of the table, you might need to get the median (50th percentile) for many fields on the query.

I use this:

SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
 GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
  ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;

You can replace the "50" in example above to any percentile, is very efficient.

Just make sure you have enough memory for the GROUP_CONCAT, you can change it with:

SET group_concat_max_len = 10485760; #10MB max length

More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/

Wording answered 13/8, 2013 at 11:33 Comment(2)
Be aware: For even number of values it takes the higher of the two middle values. For odds number of values it takes the next higher value after the median.Pompeii
Brilliant use of GROUP_CONCAT that doesn't require multiple subqueries!Id
M
7

I have this below code which I found on HackerRank and it is pretty simple and works in each and every case.

SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE  
  (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) = 
  (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
Militiaman answered 13/6, 2017 at 13:17 Comment(2)
I believe this only works with a table that has the number of entries is odd. For even number of entries, this may have a problem.Inclusive
@Y.Chang you are right. This returns nothing for even number of rowsLaughter
W
6

You could use the user-defined function that's found here.

Walli answered 18/8, 2009 at 0:19 Comment(2)
This looks the most useful, but I don't want to install unstable alpha software that may cause mysql to crash onto my production server :(Electrostatics
So study their sources for the function of interest, fix them or modify them as needed, and install "your own" stable and non-alpha version once you've made it -- how's that any worse than similarly tweaking less-proven code suggestions you get on SO?-)Walli
B
5

Building off of velcro's answer, for those of you having to do a median off of something that is grouped by another parameter:

SELECT grp_field, t1.val FROM (
   SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 0) AS row_number,
   @s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
  FROM data d,  (SELECT @rownum:=0, @s:=0) r
  ORDER BY grp_field, d.val
) as t1 JOIN (
  SELECT grp_field, count(*) as total_rows
  FROM data d
  GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
WHERE t1.row_number=floor(total_rows/2)+1;
Bowden answered 17/3, 2012 at 23:49 Comment(0)
R
3

Takes care about an odd value count - gives the avg of the two values in the middle in that case.

SELECT AVG(val) FROM
  ( SELECT x.id, x.val from data x, data y
      GROUP BY x.id, x.val
      HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
  ) sq
Regain answered 17/11, 2010 at 18:21 Comment(0)
M
3

My code, efficient without tables or additional variables:

SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM table;
Misappropriate answered 23/4, 2013 at 1:7 Comment(2)
This will fail on any substantial amount of data because GROUP_CONCAT is limited to 1023 characters, even when used inside another function like this.Hama
You can adjust the group_concat limit to a quite substantial number of characters, but the criticism is valid. Eventually on some dataset the query will fail.Firing
R
3

Single query to archive the perfect median:

SELECT 
COUNT(*) as total_rows, 
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median, 
AVG(val) as average 
FROM 
data
Referential answered 27/5, 2020 at 10:36 Comment(0)
D
2

Optionally, you could also do this in a stored procedure:

DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
  -- Set default parameters
  IF where_clause IS NULL OR where_clause = '' THEN
    SET where_clause = 1;
  END IF;

  -- Prepare statement
  SET @sql = CONCAT(
    "SELECT AVG(middle_values) AS 'median' FROM (
      SELECT t1.", column_name, " AS 'middle_values' FROM
        (
          SELECT @row:=@row+1 as `row`, x.", column_name, "
          FROM ", table_name," AS x, (SELECT @row:=0) AS r
          WHERE ", where_clause, " ORDER BY x.", column_name, "
        ) AS t1,
        (
          SELECT COUNT(*) as 'count'
          FROM ", table_name, " x
          WHERE ", where_clause, "
        ) AS t2
        -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
        WHERE t1.row >= t2.count/2
          AND t1.row <= ((t2.count/2)+1)) AS t3
    ");

  -- Execute statement
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
END//
DELIMITER ;


-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);
Discordant answered 21/5, 2013 at 23:24 Comment(3)
Thanks for this! The user should be aware that missing values (NULL) are considered as values. to avoid this problem add 'x IS NOT NULL where condition.Pompeii
@Pompeii In which line of the code x IS NOT NULL should be added?Waler
@PrzemyslawRemin Sorry, I was not clear in my statement and I realized now that the SP does already consider the case of missing values. The SP should be called in this way: CALL median("table","x","x IS NOT NULL").Pompeii
T
2

My solution presented below works in just one query without creation of table, variable or even sub-query. Plus, it allows you to get median for each group in group-by queries (this is what i needed !):

SELECT `columnA`, 
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(`columnB` ORDER BY `columnB`), ',', CEILING((COUNT(`columnB`)/2))), ',', -1) medianOfColumnB
FROM `tableC`
-- some where clause if you want
GROUP BY `columnA`;

It works because of a smart use of group_concat and substring_index.

But, to allow big group_concat, you have to set group_concat_max_len to a higher value (1024 char by default). You can set it like that (for current sql session) :

SET SESSION group_concat_max_len = 10000; 
-- up to 4294967295 in 32-bits platform.

More infos for group_concat_max_len: https://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len

Troupe answered 18/4, 2014 at 8:45 Comment(0)
F
2

Another riff on Velcrow's answer, but uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than performing an extra query to calculate it. Also starts the count so that the first row is row 0 to allow simply using Floor and Ceil to select the median row(s).

SELECT Avg(tmp.val) as median_val
    FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
              FROM data as inTab,  (SELECT @rows := -1) as init
              -- Replace with better where clause or delete
              WHERE 2 > 1
              ORDER BY inTab.val) as tmp
    WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
Fiord answered 12/8, 2014 at 18:30 Comment(0)
A
2

Knowing exact row count you can use this query:

SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>

Where <half> = ceiling(<size> / 2.0) - 1

Albumin answered 2/9, 2014 at 9:45 Comment(0)
S
2
SELECT 
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(field ORDER BY field),
            ',',
            ((
                ROUND(
                    LENGTH(GROUP_CONCAT(field)) - 
                    LENGTH(
                        REPLACE(
                            GROUP_CONCAT(field),
                            ',',
                            ''
                        )
                    )
                ) / 2) + 1
            )),
            ',',
            -1
        )
FROM
    table

The above seems to work for me.

Sabba answered 22/11, 2016 at 22:2 Comment(1)
It is not returning the correct median for even number of values, For example , the median of {98,102,102,98} is 100 but your code gives 102. It worked fine for odd numbers.Wiggle
S
2

Often, we may need to calculate Median not just for the whole table, but for aggregates with respect to our ID. In other words, calculate median for each ID in our table, where each ID has many records. (good performance and works in many SQL + fixes problem of even and odds, more about performance of different Median-methods https://sqlperformance.com/2012/08/t-sql-queries/median )

SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val, 
  COUNT(*) OVER (PARTITION BY our_id) AS cnt,
  ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
  FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;

Hope it helps

Salzhauer answered 21/4, 2017 at 22:6 Comment(1)
It is the best solution. However, for large data sets it will slow down because it re-counts for every item in each set. To make it faster put "COUNT(*)" to separate sub-query.Sinfonietta
G
2

Simple Solution For ORACLE:

SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;

Easy Solution to Understand For MySQL:

select case MOD(count(lat_n),2) 
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = (select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N))
end from station;

Explanation

STATION is table name. LAT_N is the column name having numeric value

Suppose there are 101 records(odd number) in station table. This means that the median is 51st record if the tabled sorted either asc or desc.

In above query for every S.LAT_N of S table I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if they are matched then I am selecting that S.LAT_N value. When I check for 51st records there are 50 values less than 51st record and there 50 records greater than 51st record. As you see, there are 50 records in both tables. So this is our answer. For every other record there are different number of records in two tables created for comparison. So, only 51st record meets the condition.

Now suppose there are 100 records(even number) in station table. This means that the median is average of 50th and 51st records if the tabled sorted either asc or desc.

Same as odd logic I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if their difference is equal to 1 then I am selecting that S.LAT_N value and find the average. When I check for 50th records there are 49 values less than 50th record and there 51 records greater than 50th record. As you see, there is difference of 1 record in both tables. So this(50th record) is our 1st record for average. Similarly, When I check for 51st records there are 50 values less than 51st record and there 49 records greater than 51st record. As you see, there is difference of 1 record in both tables. So this(51st record) is our 2nd record for average. For every other record there are different number of records in two tables created for comparison. So, only 50th and 51st records meet the condition.

Granville answered 17/6, 2021 at 7:28 Comment(0)
L
2

I am using the below table for the solution in MySQL:

CREATE TABLE transactions (
  transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);

INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)  
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130), 
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

Calculating Median for 'amount' column:

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS TotatRecords,
    ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotatRecords+1)/2), FLOOR((TotatRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

This query will work for both the conditions i.e. Even and Odd records.

Ledger answered 26/9, 2021 at 9:14 Comment(0)
I
2

If you need median per group then use "PARTITION BY" in ROW_NUMBER() OVER (...)

WITH Numbered AS 
(
  SELECT groupingColumn, 
  val,
  COUNT(*) OVER (partition by groupingColumn) AS Cnt,
  ROW_NUMBER() OVER (partition by groupingColumn ORDER BY val) AS RowNum
  FROM yourtable
)
SELECT groupingColumn, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
ORDER BY groupingColumn
;
Illative answered 2/7, 2023 at 17:35 Comment(1)
only problem is that it can only calculate the median for one column at a time.Amphichroic
S
1

I used a two query approach:

  • first one to get count, min, max and avg
  • second one (prepared statement) with a "LIMIT @count/2, 1" and "ORDER BY .." clauses to get the median value

These are wrapped in a function defn, so all values can be returned from one call.

If your ranges are static and your data does not change often, it might be more efficient to precompute/store these values and use the stored values instead of querying from scratch every time.

Sofar answered 14/7, 2010 at 18:26 Comment(0)
B
1

as i just needed a median AND percentile solution, I made a simple and quite flexible function based on the findings in this thread. I know that I am happy myself if I find "readymade" functions that are easy to include in my projects, so I decided to quickly share:

function mysql_percentile($table, $column, $where, $percentile = 0.5) {

    $sql = "
            SELECT `t1`.`".$column."` as `percentile` FROM (
            SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
              FROM `".$table."` `d`,  (SELECT @rownum:=0) `r`
              ".$where."
              ORDER BY `d`.`".$column."`
            ) as `t1`, 
            (
              SELECT count(*) as `total_rows`
              FROM `".$table."` `d`
              ".$where."
            ) as `t2`
            WHERE 1
            AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
        ";

    $result = sql($sql, 1);

    if (!empty($result)) {
        return $result['percentile'];       
    } else {
        return 0;
    }

}

Usage is very easy, example from my current project:

...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";

    $reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
    $reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
    $reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...
Bogusz answered 21/6, 2013 at 9:19 Comment(0)
A
1

Here is my way . Of course, you could put it into a procedure :-)

SET @median_counter = (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`);

SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');

PREPARE median FROM @median;

EXECUTE median;

You could avoid the variable @median_counter, if you substitude it:

SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
                      (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`),
                      ', 1'
                    );

PREPARE median FROM @median;

EXECUTE median;
Angiosperm answered 25/3, 2014 at 17:6 Comment(0)
C
1

After reading all previous ones they didn't match with my actual requirement so I implemented my own one which doesn't need any procedure or complicate statements, just I GROUP_CONCAT all values from the column I wanted to obtain the MEDIAN and applying a COUNT DIV BY 2 I extract the value in from the middle of the list like the following query does :

(POS is the name of the column I want to get its median)

(query) SELECT
SUBSTRING_INDEX ( 
   SUBSTRING_INDEX ( 
       GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';') 
    , ';', COUNT(*)/2 ) 
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criterial

I hope this could be useful for someone in the way many of other comments were for me from this website.

Closestool answered 28/7, 2014 at 8:33 Comment(1)
I like this solution, it uses native group functions, no extra joins or temporary variables. It can also be extended such that for an even number of values, it returns the average of the middle two values, and for an odd number, it returns the center value: (SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(column_name ORDER BY column_name desc SEPARATOR ';'), ';', FLOOR(COUNT(*)/2+0.5)), ';', -1) + SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(column_name ORDER BY column_name desc SEPARATOR ';'), ';', CEIL(COUNT(*)/2+0.5)), ';', -1)) / 2.0Lemmuela
O
1

Based on @bob's answer, this generalizes the query to have the ability to return multiple medians, grouped by some criteria.

Think, e.g., median sale price for used cars in a car lot, grouped by year-month.

SELECT 
    period, 
    AVG(middle_values) AS 'median' 
FROM (
    SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
    FROM (
        SELECT 
            @last_period:=@period AS 'last_period',
            @period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
            IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`, 
            x.sale_price
          FROM listings AS x, (SELECT @row:=0) AS r
          WHERE 1
            -- where criteria goes here
          ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
        ) AS t1
    LEFT JOIN (  
          SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
          FROM listings x
          WHERE 1
            -- same where criteria goes here
          GROUP BY DATE_FORMAT(sale_date, '%Y%m')
        ) AS t2
        ON t1.period = t2.period
    ) AS t3
WHERE 
    row_num >= (count/2) 
    AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;
Olsen answered 23/11, 2016 at 1:31 Comment(0)
L
1
create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

OR

select cast(avg(id) as float) from 
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id 
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;
Lipman answered 5/3, 2017 at 1:12 Comment(0)
I
1

MySQL has supported window functions since version 8.0, you can use ROW_NUMBER or DENSE_RANK (DO NOT use RANK as it assigns the same rank to same values, like in sports ranking):

SELECT AVG(t1.val) AS median_val
  FROM (SELECT val, 
               ROW_NUMBER() OVER(ORDER BY val) AS rownum
          FROM data) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.row_num IN
       (FLOOR((t2.num_records + 1) / 2), 
        FLOOR((t2.num_records + 2) / 2));
Interventionist answered 30/4, 2020 at 3:13 Comment(0)
L
1

A simple way to calculate Median in MySQL

set @ct := (select count(1) from station);
set @row := 0;

select avg(a.val) as median from 
(select * from  table order by val) a
where (select @row := @row + 1)
between @ct/2.0 and @ct/2.0 +1;
Loy answered 22/10, 2020 at 19:11 Comment(0)
N
1

The most simple and fast way to calculate median in mysql.

select x.col
from   (select lat_n, 
               count(1) over (partition by 'A')        as total_rows, 
               row_number() over (order by col asc) as rank_Order 
        from   station ft) x 
where  x.rank_Order = round(x.total_rows / 2.0, 0) 
Nurse answered 6/6, 2021 at 6:26 Comment(1)
just replace lat_n with col and it is a great working solutionHatching
L
1

You can use window function row_number() to answer the query to find medium

select val 
from (select val, row_number() over (order by val) as rownumber, x.cnt 
from data, (select count(*) as cnt from data) x) abc
where rownumber=ceil(cnt/2);
Longsuffering answered 23/2, 2022 at 20:15 Comment(0)
K
1

A different way to calculate the Median is using JSON functions in MySQL 5.7+, 8+ and MariaDB 10.2+.

This is my stored function tested in MySQL 8.0:

CREATE FUNCTION JSON_MEDIAN(input_json JSON)
RETURNS FLOAT NO SQL
BEGIN
    DECLARE median FLOAT;
    DECLARE middle INT;
    DECLARE arr_length INT;
    DECLARE peek_count INT;
    
    -- count non-empty items
    SELECT COUNT(*) INTO arr_length
    FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
    WHERE item IS NOT NULL;
    -- peek 1 item if length is odd or 2 items if length is even
    SET peek_count = 2 - arr_length % 2;
    SET middle = CEIL(arr_length / 2) - 1;
    
    SELECT AVG(item) INTO median 
    FROM (
        SELECT item
        FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
        WHERE item IS NOT NULL
        ORDER BY item
        LIMIT middle, peek_count
    ) s2;
    
    RETURN median;
END

You can now use this function with JSON arrays containing numeric items or create the input using the JSON_ARRAYAGG function like this:

SELECT JSON_MEDIAN(JSON_ARRAYAGG(`val`))
FROM `my_table`

This method does not have GROUP_CONCAT limits.

Keikokeil answered 8/7, 2023 at 10:31 Comment(1)
Wonderful solution. Works great on RDS. Can't understand why it hasn't been up-voted by more people!Sterigma
B
0

I have a database containing about 1 billion rows that we require to determine the median age in the set. Sorting a billion rows is hard, but if you aggregate the distinct values that can be found (ages range from 0 to 100), you can sort THIS list, and use some arithmetic magic to find any percentile you want as follows:

with rawData(count_value) as
(
    select p.YEAR_OF_BIRTH
        from dbo.PERSON p
),
overallStats (avg_value, stdev_value, min_value, max_value, total) as
(
  select avg(1.0 * count_value) as avg_value,
    stdev(count_value) as stdev_value,
    min(count_value) as min_value,
    max(count_value) as max_value,
    count(*) as total
  from rawData
),
aggData (count_value, total, accumulated) as
(
  select count_value, 
    count(*) as total, 
        SUM(count(*)) OVER (ORDER BY count_value ROWS UNBOUNDED PRECEDING) as accumulated
  FROM rawData
  group by count_value
)
select o.total as count_value,
  o.min_value,
    o.max_value,
    o.avg_value,
    o.stdev_value,
    MIN(case when d.accumulated >= .50 * o.total then count_value else o.max_value end) as median_value,
    MIN(case when d.accumulated >= .10 * o.total then count_value else o.max_value end) as p10_value,
    MIN(case when d.accumulated >= .25 * o.total then count_value else o.max_value end) as p25_value,
    MIN(case when d.accumulated >= .75 * o.total then count_value else o.max_value end) as p75_value,
    MIN(case when d.accumulated >= .90 * o.total then count_value else o.max_value end) as p90_value
from aggData d
cross apply overallStats o
GROUP BY o.total, o.min_value, o.max_value, o.avg_value, o.stdev_value
;

This query depends on your db supporting window functions (including ROWS UNBOUNDED PRECEDING) but if you do not have that it is a simple matter to join aggData CTE with itself and aggregate all prior totals into the 'accumulated' column which is used to determine which value contains the specified precentile. The above sample calcuates p10, p25, p50 (median), p75, and p90.

-Chris

Bosket answered 17/6, 2015 at 4:53 Comment(0)
L
0

Taken from: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html

I would suggest another way, without join, but working with strings

i did not checked it with tables with large data, but small/medium tables it works just fine.

The good thing here, that it works also by GROUPING so it can return the median for several items.

here is test code for test table:

DROP TABLE test.test_median
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL

SELECT 'note', 11 UNION ALL

SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26 

and the code for finding the median for each group:

SELECT grp,
         SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
         GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp

Output:

grp | the_median| all_vals_for_debug
bike| 22        | 22,26
book| 4         | 2,2,3,4,7,8,9
note| 11        | 11
Leifleifer answered 19/6, 2015 at 12:3 Comment(1)
Don't you think the median of ` {22,26}` should be 24?Wiggle
F
0

In some cases median gets calculated as follows :

The "median" is the "middle" value in the list of numbers when they are ordered by value. For even count sets, median is average of the two middle values. I've created a simple code for that :

$midValue = 0;
$rowCount = "SELECT count(*) as count {$from} {$where}";

$even = FALSE;
$offset = 1;
$medianRow = floor($rowCount / 2);
if ($rowCount % 2 == 0 && !empty($medianRow)) {
  $even = TRUE;
  $offset++;
  $medianRow--;
}

$medianValue = "SELECT column as median 
               {$fromClause} {$whereClause} 
               ORDER BY median 
               LIMIT {$medianRow},{$offset}";

$medianValDAO = db_query($medianValue);
while ($medianValDAO->fetch()) {
  if ($even) {
    $midValue = $midValue + $medianValDAO->median;
  }
  else {
    $median = $medianValDAO->median;
  }
}
if ($even) {
  $median = $midValue / 2;
}
return $median;

The $median returned would be the required result :-)

Formication answered 15/7, 2015 at 10:54 Comment(0)
V
0

Medians grouped by dimension:

SELECT your_dimension, avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 AS `row_number`,
   IF(@dim <> d.your_dimension, @rownum := 0, NULL),
   @dim := d.your_dimension AS your_dimension,
   d.val
   FROM data d,  (SELECT @rownum:=0) r, (SELECT @dim := 'something_unreal') d
  WHERE 1
  -- put some where clause here
  ORDER BY d.your_dimension, d.val
) as t1
INNER JOIN  
(
  SELECT d.your_dimension,
    count(*) as total_rows
  FROM data d
  WHERE 1
  -- put same where clause here
  GROUP BY d.your_dimension
) as t2 USING(your_dimension)
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) )

GROUP BY your_dimension;
Viki answered 4/8, 2015 at 16:18 Comment(0)
Y
0

This way seems include both even and odd count without subquery.

SELECT AVG(t1.x)
FROM table t1, table t2
GROUP BY t1.x
HAVING SUM(SIGN(t1.x - t2.x)) = 0
Yeseniayeshiva answered 1/11, 2016 at 4:18 Comment(2)
Could you tell what is the t2 table?Murphree
@Murphree t2 is the alias for the second table being selected (named table in the example). The issue with this method is that it does a cross join, but also, doesn't work on my trivial example: ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in ("t.score" - "t2.score")Sholem
T
0

These methods select from the same table twice. If the source data are coming from an expensive query, this is a way to avoid running it twice:

select KEY_FIELD, AVG(VALUE_FIELD) MEDIAN_VALUE
from (
    select KEY_FIELD, VALUE_FIELD, RANKF
    , @rownumr := IF(@prevrowidr=KEY_FIELD,@rownumr+1,1) RANKR
    , @prevrowidr := KEY_FIELD
    FROM (
        SELECT KEY_FIELD, VALUE_FIELD, RANKF
        FROM (
            SELECT KEY_FIELD, VALUE_FIELD 
            , @rownumf := IF(@prevrowidf=KEY_FIELD,@rownumf+1,1) RANKF
            , @prevrowidf := KEY_FIELD     
            FROM (
                SELECT KEY_FIELD, VALUE_FIELD 
                FROM (
                    -- some expensive query
                )   B
                ORDER BY  KEY_FIELD, VALUE_FIELD
            ) C
            , (SELECT @rownumf := 1) t_rownum
            , (SELECT @prevrowidf := '*') t_previd
        ) D
        ORDER BY  KEY_FIELD, RANKF DESC
    ) E
    , (SELECT @rownumr := 1) t_rownum
    , (SELECT @prevrowidr := '*') t_previd
) F
WHERE RANKF-RANKR BETWEEN -1 and 1
GROUP BY KEY_FIELD
Tesstessa answered 8/12, 2016 at 11:17 Comment(0)
M
0

The following SQL Code will help you to calculate the median in MySQL using user defined variables.

create table employees(salary int);

insert into employees values(8);
insert into employees values(23);
insert into employees values(45);
insert into employees values(123);
insert into employees values(93);
insert into employees values(2342);
insert into employees values(2238);

select * from employees;

Select salary from employees  order by salary;

set @rowid=0;
set @cnt=(select count(*) from employees);
set @middle_no=ceil(@cnt/2);
set @odd_even=null;

select AVG(salary) from 
(select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status  from employees  order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);

If you are looking for detailed explanation, please refer this blog.

Mercantilism answered 5/10, 2017 at 5:53 Comment(0)
H
0

I found this answer very helpful - https://www.eversql.com/how-to-calculate-median-value-in-mysql-using-a-simple-sql-query/

SET @rowindex := -1;

SELECT
   AVG(g.grade)
FROM
   (SELECT @rowindex:=@rowindex + 1 AS rowindex,
       grades.grade AS grade
    FROM grades
    ORDER BY grades.grade) AS g
WHERE
g.rowindex IN (FLOOR(@rowindex / 2) , CEIL(@rowindex / 2));
Headstream answered 22/1, 2019 at 11:19 Comment(0)
D
0

The below query will work perfect for both even or odd number of rows. In the subquery, we are finding the value(s) which has same number of rows before and after it. In case of odd rows the having clause will evaluate to 0 (same number of rows before and after cancels out the sign).

Similarly, for even rows the having clause evaluates to 1 for two rows (the center 2 rows) because they will (collectively) have same number of rows before and after.

In the outer query, we will avg out either the single value (in case of odd rows) or (2 values in case of even rows).

select avg(val) as median
from
(
    select d1.val
    from data d1 cross join data d2
    group by d1.val
    having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub

Note: In case your table has duplicate values, the above having clause should be changed to the below condition. In this case, there could be values outside of the original possibilities of 0,1. The below condition will make this condition dynamic and work in case of duplicates too.

having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))
Dunnage answered 7/11, 2020 at 18:14 Comment(0)
S
0

Try something like :

SELECT  
CAST (AVG(val) AS DECIMAL(10,4))
FROM
(
    SELECT 
    val,
    ROW_NUMBER() OVER( ORDER BY val ) -1 AS rn,
    COUNT(1) OVER () -1 AS cnt
    FROM STATION
) as tmp
WHERE rn IN (FLOOR(cnt/2),CEILING (cnt/2))

**

Note : The reason for -1 is to make it zero indexed..i.e row number now starts from 0 instead of 1

**

Supplication answered 18/1, 2021 at 10:8 Comment(0)
D
0

I have not compared the performance of this solution to the rest of the answers posted here, but I found this to be the most straight-forward to understand, and covers the full extent of the mathematical formula for calculating a median. In other words, this solution will be robust enough for even- and odd-numbered data sets:

SELECT CASE 
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records - 1) / 2) as low_bound,
 (SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM station) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.rownum =(t2.num_records + 1) / 2) as up_bound)
END
FROM <data>
Decompress answered 31/5, 2021 at 2:30 Comment(0)
B
0

If this is MySQL, there are window functions now and you can just do it this way (assuming you want to round up to nearest integer - otherwise just replace ROUND with CEIL or FLOOR or what have you). The following solution works for tables regardless of whether they have an even number of rows or an odd number of rows:


WITH CTE AS (
    SELECT val,
            ROW_NUMBER() OVER (ORDER BY val ASC) AS rn,
            COUNT(*) OVER () AS total_count
    FROM data
)
SELECT ROUND(AVG(val)) AS median
FROM CTE
WHERE
    rn BETWEEN
    total_count / 2.0 AND
    total_count / 2.0 + 1;

I think some of the more recent answers on this thread were already getting at this approach, but it also seemed like people were overthinking it, so consider this an improved version. Regardless of SQL flavor, there is no reason anyone should be writing a huge paragraph of code with multiple subqueries just to get the median in 2021. However, please note that the above query only works if you're asked to find the median for a continuous series. Of course, regardless of row number, sometimes people do make a distinction between what is referred to as the Discrete Median and what is referred to as the Interpolated Median for a continuous series.

If you're asked to find the median for a discrete series and the table has an even number of rows, the above solution will not work for you, and you should revert to using one of the other solutions, like TheJacobTaylor's.

The second solution below is a slightly modified version of TheJacobTaylor's, where I explicitly state CROSS JOIN. This will work for tables that have an odd number of rows too, regardless of whether you're asked to find the median for a continuous or discrete series, but I would specifically use this when asked to find the median of a discrete series. Otherwise, use the first solution. That way, you'll never have to think about whether the data contains an 'even' or 'odd' number of data points.


SELECT x.val AS median
FROM data x
CROSS JOIN data y
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2;

Finally, you can easily do this in PostgreSQL using built-in functions. Here is a nice explanation, along with an effective summary on discrete vs interpolated medians.

https://leafo.net/guides/postgresql-calculating-percentile.html#calculating-the-median

Breakaway answered 20/6, 2021 at 19:38 Comment(0)
W
0

For a table station and column lat_n, here is MySQL code to get the median:

set @rows := (select count(1) from station);
set @v1 := 0;
set @sql1 := concat('select lat_n into @v1 from station order by lat_n asc limit 1 offset ', ceil(@rows/2) - 1);
prepare statement1 from @sql1;
execute statement1;
set @v2 := 0;
set @sql2 := concat('select lat_n into @v2 from station order by lat_n asc limit 1 offset ', ceil((@rows + 1)/2) - 1);
prepare statement2 from @sql2;
execute statement2;
select (@v1 + @v2)/2;
Wifely answered 23/9, 2021 at 12:46 Comment(0)
C
0

Let's create a sample table named numbers

This Answer is specific to the MySQL database

In PostgresSql its simple use per_cont function

CREATE TABLE numbers(
  num INT,
  frequency INT
);

Insert values in numbers table

INSERT INTO numbers VALUES  
        (0,7),
        (1,1),
        (2,3),
        (3,1),
        (9,1),
        (1,1),
        (2,3),
        (3,1),
        (9,1);

-- select * from numbers 

WITH RECURSIVE num_frequency (num,frequency, i) AS 
(
SELECT  num,frequency,1
FROM   numbers
UNION ALL
SELECT num,frequency,i+1
FROM   num_frequency
WHERE  num_frequency.i < num_frequency.frequency
)

select * 
(max(case when numbers=lower_limit then num else null end)/2
+max(case when numbers=upper_limit then num else null end)/2) as median
from (
select *,total_number%2,
case 
when  total_number%2=0 then total_number/2
else  (total_number+1)/2 end as lower_limit,
case 
when  total_number%2=0 then total_number/2+1
else  (total_number+1)/2
end as upper_limit

from (
select *,max(numbers) over() as total_number from (
select num,row_number() over (order by num) 
as numbers from num_frequency
)b 
)b
)b
Crwth answered 6/2, 2023 at 19:9 Comment(1)
Welcome to Stackoverflow. This question is asked more than 13 years ago and it has an accepted answer. Please add some details about the reason you are adding a new answer.Marchesa
J
-1
set @r = 0;

select  
    case when mod(c,2)=0 then round(sum(lat_N),4)
    else round(sum(lat_N)/2,4) 
    end as Med  
from 
    (select lat_N, @r := @r+1, @r as id from station order by lat_N) A
    cross join
    (select (count(1)+1)/2 as c from station) B
where id >= floor(c) and id <=ceil(c)
Jacobian answered 25/6, 2016 at 10:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.