Get execution time of PostgreSQL query

DECLARE @StartTime datetime,@EndTime datetime

SELECT @StartTime=GETDATE()

select distinct born_on.name
from   born_on,died_on
where (FLOOR(('2012-01-30'-born_on.DOB)/365.25) <= (
    select max(FLOOR((died_on.DOD - born_on.DOB)/365.25))
    from   died_on, born_on
    where (died_on.name=born_on.name))
    )
and   (born_on.name <> All(select name from died_on))

SELECT @EndTime=GETDATE()

SELECT DATEDIFF(ms,@StartTime,@EndTime) AS [Duration in millisecs]

I am unable to get the query time. Instead I get the following error:

sql:/home/an/Desktop/dbms/query.sql:9: ERROR:  syntax error at or near "@"
LINE 1: DECLARE @StartTime datetime,@EndTime datetime
Lavena answered 30/1, 2012 at 11:48 Comment(9)
I tested this query on my SQL studio, just replacing your select distinct... with something in my DB and it worked fine. What are you using to execute this query?Sedan
I'm running the above query in Postgres!! :-/Lavena
What exactly are you trying to do? This doesn't seem at all related with Java or JDBC (and from your comment, not even MySql). Is this a stored procedure?Argyres
Yes, this is a stored procedure. The data has already been loaded into my DB using JDBC. Now I'm running the query and wish to know the execution time!Lavena
Why is this flagged both MySQL and PostgreSQL?Prescribe
@PaulBain What did you replace "select distinct" with?Lavena
it has been flagged so because I am using postgresql.Lavena
MySQL and PostgreSQL are very different things (as if comparing Teradata to Oracle to Access). If you are using PostgreSQL then you should remove the MySQL flag :)Prescribe
@zyxwvu: so why did you include the MySQL tag in a question that is for PostgreSQL but uses Microsoft SQL syntax? The error message you get is a MySQL Error, not a PostgreSQL error.Display

There are various ways to measure execution time, each with pros and cons. But whatever you do, some degree of the observer effect applies: measuring itself may distort the result.

1. EXPLAIN ANALYZE

You can prepend EXPLAIN ANALYZE, which reports the whole query plan with estimated costs and actually measured times. The query is actually executed (with all side effects, if any!). Works for all DML commands and some others.

To check whether my adapted version of your query is, in fact, faster:

EXPLAIN ANALYZE
SELECT DISTINCT b.name
FROM   born_on b
WHERE  date '2012-01-30' - b.dob <= (
    SELECT max(d1.dod - b1.dob)
    FROM   born_on b1
    JOIN   died_on d1 USING (name)  -- name must be unique!
    )
AND NOT EXISTS (
    SELECT FROM died_on d2
    WHERE  d2.name = b.name
    );

Execute the query a couple of times to get more comparable times with a warm cache. Several options are available to adjust the level of detail.

If you are mainly interested in total execution time, use:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)

Mostly, TIMING matters - the manual:

TIMING

Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. [...]

EXPLAIN ANALYZE measures on the server, using server time from the server OS, excluding network latency. But EXPLAIN adds some overhead to also output the query plan.

2. psql with \timing

Or use \timing in psql, like Peter demonstrates.

The manual:

\timing [ on | off ]

With a parameter, turns displaying of how long each SQL statement takes on or off. Without a parameter, toggles the display between on and off. The display is in milliseconds; intervals longer than 1 second are also shown in minutes:seconds format, with hours and days fields added if needed.

Important difference: psql measures on the client using local time from the local OS, so the time includes network latency. This difference can be negligible or huge, depending on the connection and the volume of returned data.

3. Enable log_duration

This probably has the least overhead per measurement and produces the least distorted timings. But it's a little heavy-handed: you have to be a superuser, you have to adjust the server configuration, you cannot target just the execution of a single query, and you have to read the server logs (unless you redirect to stdout).

The manual:

log_duration (boolean)

Causes the duration of every completed statement to be logged. The default is off. Only superusers can change this setting.

For clients using extended query protocol, durations of the Parse, Bind, and Execute steps are logged independently.

There are related settings like log_min_duration_statement.
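
For illustration, a minimal sketch of enabling these settings from SQL (assuming superuser rights; editing postgresql.conf directly works as well):

ALTER SYSTEM SET log_duration = on;  -- log duration of every completed statement
SELECT pg_reload_conf();             -- apply without a server restart

-- Less noisy alternative: log only statements running for at least 250 ms
ALTER SYSTEM SET log_min_duration_statement = 250;
SELECT pg_reload_conf();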

4. Precise manual measurement with clock_timestamp()

The manual:

clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command.

filiprem provided a great way to get execution times for ad-hoc queries as exact as possible. On modern hardware, timing overhead should be insignificant but depending on the host OS it can vary wildly. Find out with the server application pg_test_timing.
Alternatively, you can mostly filter out the overhead like this:

DO
$do$
DECLARE
   _timing1  timestamptz;
   _start_ts timestamptz;
   _end_ts   timestamptz;
   _overhead numeric;     -- in ms
BEGIN
   _timing1  := clock_timestamp();
   _start_ts := clock_timestamp();
   _end_ts   := clock_timestamp();
   -- take minimum duration as conservative estimate
   _overhead := 1000 * extract(epoch FROM LEAST(_start_ts - _timing1
                                              , _end_ts   - _start_ts));

   _start_ts := clock_timestamp();
   PERFORM 1;  -- your query here, replacing the outer SELECT with PERFORM
   _end_ts   := clock_timestamp();
   
-- RAISE NOTICE 'Timing overhead in ms = %', _overhead;
   RAISE NOTICE 'Execution time in ms = %' , 1000 * (extract(epoch FROM _end_ts - _start_ts)) - _overhead;
END
$do$;

Take the time repeatedly (doing the bare minimum with 3 timestamps here) and pick the minimum interval as a conservative estimate for timing overhead. Also, executing the function clock_timestamp() a couple of times should warm it up (in case that matters for your OS).

After measuring the execution time of the payload query, subtract that estimated overhead to get closer to the actual time.

Of course, for cheap queries it's more meaningful to loop 100000 times, or to execute them on a table with 100000 rows if you can, to make distracting noise insignificant.
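
A minimal sketch of such a loop, using a cheap placeholder query (PERFORM 1) that you would swap for your actual payload; note that plain loop overhead is included, so this is a rough average:

DO
$do$
DECLARE
   _start timestamptz := clock_timestamp();
   _n     int := 100000;   -- number of repetitions
BEGIN
   FOR i IN 1 .. _n LOOP
      PERFORM 1;  -- your query here, replacing the outer SELECT with PERFORM
   END LOOP;
   RAISE NOTICE 'Avg execution time in ms = %'
      , 1000 * extract(epoch FROM clock_timestamp() - _start) / _n;
END
$do$;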

Tesstessa answered 30/1, 2012 at 15:6 Comment(3)
Nice (+1) - v. thorough as usual. I think that what you call the "measurement problem" might be better termed the observer effect?Boony
@Vérace: Yes, thank you, that's clearly the better term.Tesstessa
@AndreSilva: Yes, REFRESH MATERIALIZED VIEW CONCURRENTLY can be used here. "Concurrently" does not imply "asynchronous". Different beast. It just takes longer under concurrent write load, which isn't blocked by the refresh.Tesstessa

If you mean in psql, rather than some program you are writing, use \? for the help, and see:

\timing [on|off]       toggle timing of commands (currently off)

And then you get output like:

# \timing on
Timing is on.

# select 1234;        
 ?column? 
----------
     1234
(1 row)

Time: 0.203 ms
Aleman answered 8/12, 2015 at 12:56 Comment(5)
Thanks for giving the simple, underlying answer to the title of this question.Rusell
I think this is the most direct and easiest way to show timing in Postgres. Thanks!Breakaway
This shows the server side time or the client side (latency)? The docs say nothing about this.Psycho
@CrouchingKitten I tested with a really slow internet connection, and got "0.261 ms" using the local Unix domain socket connection, and "944.272 ms" :D over TCP with the slow internet. So it can only be client-side.Aleman
@CrouchingKitten: Client side - as opposed to total execution time reported by EXPLAIN ANALYZE, which measures on the server (without network latency).Tesstessa

PostgreSQL is not Transact-SQL. These are two slightly different things.

In PostgreSQL, this would be something along the lines of

DO $proc$
DECLARE
  StartTime timestamptz;
  EndTime timestamptz;
  Delta double precision;
BEGIN
  StartTime := clock_timestamp();
  PERFORM foo FROM bar; /* Put your query here, replacing SELECT with PERFORM */
  EndTime := clock_timestamp();
  Delta := 1000 * ( extract(epoch from EndTime) - extract(epoch from StartTime) );
  RAISE NOTICE 'Duration in millisecs=%', Delta;
END;
$proc$;

On the other hand, measuring query time does not have to be this complicated. There are better ways:

  1. In the psql command line client there is a \timing feature which measures query time on the client side (similar to the duration shown in the bottom-right corner of SQL Server Management Studio).

  2. It's possible to record query duration in the server log (for every query, or only when it lasts longer than X milliseconds).

  3. It's possible to collect server-side timing for any single statement using the EXPLAIN command:

    EXPLAIN (ANALYZE, BUFFERS) YOUR QUERY HERE;
    
Ario answered 30/1, 2012 at 12:46 Comment(5)
for mentioning \timingBethlehem
@filiprem Best to define Delta as double precisionKimber
Very good answer, and it works.Fattish
To complete the picture, the aggregation unit is millisecond here. Correct?Groundspeed
@Groundspeed yes! Milliseconds, as mentioned in the code snippet.Ario
