isnumeric() with PostgreSQL
Asked Answered
H

5

49

I need to determine whether a given string can be interpreted as a number (integer or floating point) in an SQL statement. As in the following:

SELECT AVG(CASE WHEN x ~ '^[0-9]*.?[0-9]*$' THEN x::float ELSE NULL END) FROM test

I found that Postgres' pattern matching could be used for this. And so I adapted the statement given in this place to incorporate floating point numbers. This is my code:

WITH test(x) AS (
    VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
    ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'))

SELECT x
     , x ~ '^[0-9]*.?[0-9]*$' AS isnumeric
FROM test;

The output:

    x    | isnumeric 
---------+-----------
         | t
 .       | t
 .0      | t
 0.      | t
 0       | t
 1       | t
 123     | t
 123.456 | t
 abc     | f
 1..2    | f
 1.2.3.4 | f
(11 rows)

As you can see, the first two items (the empty string '' and the sole period '.') are misclassified as being a numeric type (which they are not). I can't get any closer to this at the moment. Any help appreciated!


Update Based on this answer (and its comments), I adapted the pattern to:

WITH test(x) AS (
    VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
    ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))

SELECT x
     , x ~ '^([0-9]+[.]?[0-9]*|[.][0-9]+)$' AS isnumeric
FROM test;

Which gives:

     x    | isnumeric 
----------+-----------
          | f
 .        | f
 .0       | t
 0.       | t
 0        | t
 1        | t
 123      | t
 123.456  | t
 abc      | f
 1..2     | f
 1.2.3.4  | f
 1x234    | f
 1.234e-5 | f
(13 rows)

There are still some issues with the scientific notation and with negative numbers, as I see now.

Hypotaxis answered 24/4, 2013 at 15:25 Comment(3)
Do you have to worry about negative numbers? How about scientific notation?Susi
@muistooshort thanks again, I was especially interested in this kind of input. This pattern matching approach is not as straight-forward as I expected.Hypotaxis
The regex for negative numbers are simply: '^-?([0-9]+[.]?[0-9]*|[.][0-9]+)$' correct?Joiner
M
110

As you may noticed, regex-based method is almost impossible to do correctly. For example, your test says that 1.234e-5 is not valid number, when it really is. Also, you missed negative numbers. What if something looks like a number, but when you try to store it it will cause overflow?

Instead, I would recommend to create function that tries to actually cast to NUMERIC (or FLOAT if your task requires it) and returns TRUE or FALSE depending on whether this cast was successful or not.

This code will fully simulate function ISNUMERIC():

CREATE OR REPLACE FUNCTION isnumeric(text) RETURNS BOOLEAN AS $$
DECLARE x NUMERIC;
BEGIN
    x = $1::NUMERIC;
    RETURN TRUE;
EXCEPTION WHEN others THEN
    RETURN FALSE;
END;
$$
STRICT
LANGUAGE plpgsql IMMUTABLE;

Calling this function on your data gets following results:

WITH test(x) AS ( VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
  ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))
SELECT x, isnumeric(x) FROM test;

    x     | isnumeric
----------+-----------
          | f
 .        | f
 .0       | t
 0.       | t
 0        | t
 1        | t
 123      | t
 123.456  | t
 abc      | f
 1..2     | f
 1.2.3.4  | f
 1x234    | f
 1.234e-5 | t
 (13 rows)

Not only it is more correct and easier to read, it will also work faster if data was actually a number.

Markhor answered 25/4, 2013 at 3:58 Comment(12)
1.234d+5 is a "valid" number, too. I ran into that format doing some data warehouse work a few years ago. It was in the output of an old Fortran program; it represents a double-precision float value. Whatever office software they had imported it correctly.Outsoar
Well, my point is that if you are trying to say if given string stored in Postgres database is a valid number, the only reasonable method is to ask Postgres server itself on what it thinks about it. If it says that 1.234d+5 is not a valid number, then you cannot really cast it into valid number using Postgres means.Markhor
I'd modify it a little to handle NULLs: FUNCTION isnumeric(anyelement) to take any argument. Then returning dynamic value for successful operations: DECLARE x NUMERIC; results BOOLEAN; And set that value inside BEGIN block: results = CASE WHEN $1 IS NULL THEN NULL ELSE TRUE END; x = $1::NUMERIC; RETURN results; --- This means that isnumeric(NULL) will return NULL, since NULLs have no value.Disentitle
There is easier solution to handle NULLs. Leave function body intact, and simply add this line to function declaration: RETURNS NULL ON NULL INPUTMarkhor
@Markhor Yeah, I was thinking there was something like that, but my knowledge of plpgsql is fading :) I'm still not sure if that's how NULL input should be treated; that is, NULL in NULL out. I know for my uses it generally doesn't matter (e.g., CASE WHEN isnumeric(<field>)...) and I'd think a person could easily coalesece(isnumeric(<field>),false) if it did; but this function has come in handyDisentitle
After re-reading documentation once again, it turns out that simply adding keyword STRICT is equivalent to RETURNS NULL ON NULL INPUT and does what you wanted.Markhor
I have edited the answer to include the STRICT keyword as discussed above. It's a database function; it should do the right thing with NULL. I tested it, and without STRICT it returns true for NULL which is never what anyone would want.Pyrex
I think an error shouldn't be used as a control. This answer causes to error given below: lock buffer_content is not held. PosstgreSQL Version: "PostgreSQL 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit"Sordello
@gokaysatir, are you sure it's not some other part of your query causing locking issues?Markhor
@Markhor yes, indeed this happens with these steps (sorry for bad English): * I have a function index, and the indexed function calls your answer (isnumeric). * When i update a row, PostgreSQL is probably trying to refresh index, and it calls the indexed function too many times, then this error happens. * Then i updated the isnumeric function with a regular expression, then the error is gone. * I copied the regular expression from another answer and tested for my purposes. It works. The regular expression is: '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$'Sordello
@gokaysatir, consider reporting this bug here. This locking problem should not happen inside PLPGSQL function even if using exceptions.Markhor
The downside of this solution is that exception handing inside the function creates a subtransaction. This puts limit on using it whithing mass insert or update queries.Germane
E
10

You problem is the two 0 or more [0-9] elements on each side of the decimal point. You need to use a logical OR | in the number identification line:

~'^([0-9]+\.?[0-9]*|\.[0-9]+)$'

This will exclude a decimal point alone as a valid number.

Ephesian answered 24/4, 2013 at 15:30 Comment(2)
You're missing some escapes on your .s, that will match both '1x1' and 'x1'.Susi
Yes, I'm used to Oracle and Java, please ensure your escapes are correct, the | should be the proper POSIX OR operator and the . should be PERIOD, not the POSIX 'all characters' operator.Ephesian
A
-3

I suppose one could have that opinion (that it's not a misuse of exception handling), but generally I think that an exception handling mechanism should be used just for that. Testing whether a string contains a number is part of normal processing, and isn't "exceptional".

But you're right about not handling exponents. Here's a second stab at the regular expression (below). The reason I had to pursue a solution that uses a regular expression was that the solution offered as the "correct" solution here will fail when the directive is given to exit when an error is encountered:

SET exit_on_error = true;

We use this often when groups of SQL scripts are run, and when we want to stop immediately if there is any issue/error. When this session directive is given, calling the "correct" version of isnumeric will cause the script to exit immediately, even though there's no "real" exception encountered.

create or replace function isnumeric(text) returns boolean
  immutable
  language plpgsql
as $$
begin
  if $1 is null or rtrim($1)='' then
    return false;
  else
    return (select $1 ~ '^ *[-+]?[0-9]*([.][0-9]+)?[0-9]*(([eE][-+]?)[0-9]+)? *$');
  end if;
end;
$$;
Albino answered 30/10, 2018 at 15:14 Comment(4)
You are still missing a point. Your regex will pass 1e, which is not a number. And, it will pass 1e100000, which may look like a number, but it cannot be stored as one, and 1e100000::NUMERIC will fail to cast.Markhor
The regex above will return false for "1e", which is correct. But you're right about really very large or very small exponentiated representations. However, numeric is defined as: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point. So in my context, and I think for many other business contexts, if we're attempting to test a number larger than this, there's a serious problem, and I want the program to crash or throw an exception. This might not be true if I were working in astronomy or physics. For my purposes I can't throw one every time it's called.Albino
And strictly speaking, a number like 1e100000 is numeric. You just can't cast it from a string to a numeric.Albino
If you can't cast it to numeric, it is not numeric.Markhor
I
-3

Since PostgreSQL 9.5 (2016) you can just ask the type of a json field:

jsonb_typeof(field)

From the PostgreSQL documentation:

json_typeof(json)
jsonb_typeof(jsonb)

Returns the type of the outermost JSON value as a text string. Possible types are object, array, string, number, boolean, and null.

Example

When aggregating numbers and wanting to ignore strings:

SELECT m.title, SUM(m.body::numeric)
FROM messages as m
WHERE jsonb_typeof(m.body) = 'number'
GROUP BY m.title;

Without WHERE the ::numeric part would crash.

Incurrent answered 26/2, 2020 at 14:43 Comment(2)
That works ok if you know the string you want to test is JSON but doesn't seem to work well with arbitrary strings. For example: compare select jsonb_typeof('"foo"') and select jsonb_typeof('foo');Asclepiadaceous
Unfortunately, this does not really work. It crashes on most inputs, and even on numbers like 1.Markhor
A
-4

The obvious problem with the accepted solution is that it is an abuse of exception handling. If there's another problem encountered, you'll never know it because you've tossed away the exceptions. Very bad form. A regular expression would be the better way to do this. The regex below seems to behave well.

create function isnumeric(text) returns boolean
    immutable
    language plpgsql
as $$
begin
    if $1 is not null then
     return (select $1 ~ '^(([-+]?[0-9]+(\.[0-9]+)?)|([-+]?\.[0-9]+))$');
    else
     return false;
    end if;
end;
$$
;
Albino answered 17/10, 2018 at 16:12 Comment(2)
It is NOT abuse of exception handling. It only wraps around single operator: x = $1::NUMERIC;, which is designed to cast parameter as numeric value. If any problem happens with this cast, it means $1 was NOT a number, and thats all we care about to get our answer. There is no way to get any other exception here. Your attempt still won't work - it will not parse 1e6, which is a valid number.Markhor
Your solution returns doesn't work for scientific notations, e.g. 1e-5 or 1.e5 or 'Nan' which the accepted solution properly detects as a valid numeric value.Crine

© 2022 - 2024 — McMap. All rights reserved.