How can I calculate the median of values in SQLite?
Asked Answered
J

6

37

I'd like to calculate the median value in a numeric row. How can I do that in SQLite 4?

Jiles answered 2/4, 2013 at 12:19 Comment(0)
C
46

Let's say that the median is the element in the middle of an ordered list.

SQLite (4 or 3) does not have any built-in function for that, but it's possible to do this by hand:

SELECT x
FROM MyTable
ORDER BY x
LIMIT 1
OFFSET (SELECT COUNT(*)
        FROM MyTable) / 2

When there is an even number of records, it is common to define the median as the average of the two middle records. In this case, the average can be computed like this:

SELECT AVG(x)
FROM (SELECT x
      FROM MyTable
      ORDER BY x
      LIMIT 2
      OFFSET (SELECT (COUNT(*) - 1) / 2
              FROM MyTable))

Combining the odd and even cases then results in this:

SELECT AVG(x)
FROM (SELECT x
      FROM MyTable
      ORDER BY x
      LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2    -- odd 1, even 2
      OFFSET (SELECT (COUNT(*) - 1) / 2
              FROM MyTable))
Creeps answered 2/4, 2013 at 14:1 Comment(3)
This is a nice solution, but it seems to be difficult to use it if you want to calculate the median of a "group by" result instead of the whole table. Consider "select grp, min(val), median(val), max(val) from table group by grp".Checked
@Acer - I see that you are correct. In this case I don't have an elegant, single statement solution without database support for MEDIAN. What comes to mind is this: 1) Create a table using the group by clause and SELECT INTO (call this table "G"), in sorted form, and add an AUTOINCREMENT column (call it column "i"). 2) Create a query that calculates the (max(G.i)+min(G.i))/2.0 for each group (call this column 'x'). 3) Using the Pick table, select entries from G where ABS(G.i-Pick.x)<1 . If you take the average from that last table, you'll have your answer for each group. Not pretty.Topminnow
Seems like you could do the same using a view instead of an actual tableVoluptuary
R
16

There is an extension pack of various math functions for sqlite3. It includes group functions like median.

It will be more work getting this going than CL's answer, but might be worthwhile if you think you will need any of the other functions.

http://www.sqlite.org/contrib/download/extension-functions.c?get=25

(Here is the guide for how to compile and load SQLite extensions.)

From description:

Provide mathematical and string extension functions for SQL queries using the loadable extensions mechanism. Math: acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference, degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp, log, log10, power, sign, sqrt, square, ceil, floor, pi. String: replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim, replace, reverse, proper, padl, padr, padc, strfilter. Aggregate: stdev, variance, mode, median, lower_quartile, upper_quartile.

UPDATE 2015-04-12: Fixing "undefined symbol: sinh"

As has been mentioned in comments, this extension may not work properly despite a successful compile.

For example, compiling may work and on Linux you might copy the resulting .so file to /usr/local/lib. But .load /usr/local/lib/libsqlitefunctions from the sqlite3 shell may then generate this error:

Error: /usr/local/lib/libsqlitefunctions.so: undefined symbol: sinh

Compiling it this way seems to work:

gcc -fPIC -shared extension-functions.c -o libsqlitefunctions.so -lm

And copying the .so file to /usr/local/lib shows no similar error:

sqlite> .load /usr/local/lib/libsqlitefunctions

sqlite> select cos(pi()/4.0);
---> 0.707106781186548

I'm not sure why the order of options to gcc matters in this particular case, but apparently it does.

Credit for noticing this goes to Ludvick Lidicky's comment on this blog post

Recency answered 22/5, 2013 at 2:42 Comment(5)
Any idea how to install this? The file itself doesn't help much.Lavaliere
@Lavaliere Try asking about it in another question, here's a head start. Out of curiosity, I tried compiling the extensions tonight. Following the instructions included in the C comments at the top of the file was simple enough (you did read the file and found those, right?) but there are some bugs. It compiles with gcc on Ubuntu 14.04 LTS, with prerequisite "libsqlite3-dev", resulting in a shared lib "libsqlitefunctions.so" . The same Ubuntu's sqlite3 tries to load it when given the command SELECT load_extension('./libsqlitefunctions') but throws an error "undefined symbol: sinh".Recency
This link (github.com/yajirobee/environment/blob/master/common/…) gives a compiled so file. It works!Howlan
Anyone here got this extension running on windows, or a compiled version they can share? I'm having troubles getting it to load...Dibbrun
@Kassym Dorsel: download the above extension-functions.c file and also the sqlite-amalgamation files from sqlite.org/2019/sqlite-amalgamation-3290000.zip and extract all in same folder. Then install MinGw Installer and from here install package "mingw32-base-bin" into c:\MinGW. Now open a DOS-command and change to c:\MinGW\bin and run the command gcc -g -shared "C:\YourPath\extension-functions.c" -o "C:\YourPath\extension-functions.dll" Finally load the dll in SQLite via .load C:/YourPath/extension-functions.dllZoan
U
1

There is a log table with timestamp, label, and latency. We want to see the latency median value of each label, grouped by timestamp. Format all latency value to 15 char length with leading zeroes, concatenate it, and cut half positioned value(s).. there is the median.

select L, --V, 
       case when C % 2 = 0 then
       ( substr( V, ( C - 1 ) * 15 + 1, 15) * 1 + substr( V, C * 15 + 1, 15) * 1 ) / 2
       else
        substr( V, C * 15 + 1, 15) * 1
       end as MEDST
from (
    select L, group_concat(ST, "") as V, count(ST) / 2 as C
    from (
        select label as L, 
               substr( timeStamp, 1, 8) * 1 as T, 
               printf( '%015d',latency) as ST
        from log
        where label not like '%-%' and responseMessage = 'OK'
        order by L, T, ST ) as XX
    group by L
    ) as YY
Unmanly answered 29/11, 2018 at 8:45 Comment(0)
Z
0

Dixtroy provided the best solution via group_concat(). Here is a full sample for this:

DROP TABLE [t];
CREATE TABLE [t] (name, value INT);
INSERT INTO t VALUES ('A', 2);
INSERT INTO t VALUES ('A', 3);
INSERT INTO t VALUES ('B', 4);
INSERT INTO t VALUES ('B', 5);
INSERT INTO t VALUES ('B', 6);
INSERT INTO t VALUES ('C', 7);

results into this table:

name|value
A|2
A|3
B|4
B|5
B|6
C|7

now we use the (slightly modified) query from Dextroy:

SELECT name, --string_list, count, middle,
    CASE WHEN count%2=0 THEN
        0.5 * substr(string_list, middle-10, 10) + 0.5 * substr(string_list, middle, 10)
    ELSE
        1.0 * substr(string_list, middle, 10)
    END AS median
FROM (
    SELECT name, 
        group_concat(value_string,"") AS string_list,
        count() AS count, 
        1 + 10*(count()/2) AS middle
    FROM (
        SELECT name, 
            printf( '%010d',value) AS value_string
        FROM [t]
        ORDER BY name,value_string
    )
    GROUP BY name
);

...and get this result:

name|median
A|2.5
B|5.0
C|7.0
Zoan answered 4/9, 2019 at 9:42 Comment(0)
M
0

If you are using PDO then ::loadExtension() used in Paul's answer might not be available to you.

Assuming you are using PHP, an alternative is to create an aggregate function.

$pdo_handle->sqliteCreateAggregate(
    'median', // the name of the function to declare
    function($context, $row_number, $value){ // a method called for each row
        $context[] = $value; // store the values
        return $context; 
    },
    function($context, $row_count){ // a method called once all row have been iterated over
        // sort the values
        sort($context, SORT_NUMERIC);
        // cound the number of values
        $count = count($context);
        // get the mid point of array (lowest one)
        $middle = floor($count/2);
        // if there is an even amount of values
        if (($count % 2) == 0) {
            // average the two middle values to find the median
            return ($context[$middle--] + $context[$middle])/2;        
        } else {
            // odd amount of elements, so the median value is simply the one in the middle
            return $context[$middle];
        }    
    },
    1
);

You are then free to do a

SELECT median("column_name") FROM "table_name";

Similar "create function" might be available in other languages.

Meaty answered 7/9, 2022 at 12:59 Comment(1)
Or via the C interfaceDemulcent
L
-1

The SELECT AVG(x) returns just the year of date values formatted as YYYY-MM-DD, so I tweaked CL's solution just slightly to accommodate dates:

SELECT DATE(JULIANDAY(MIN(MyDate)) + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate)))/2) as Median_Date
FROM (
   SELECT MyDate
      FROM MyTable
      ORDER BY MyDate
      LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2) -- odd 1, even 2
      OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable)
);
Laplante answered 17/3, 2015 at 5:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.