Redshift SQL - Extract numbers from string
Asked Answered
C

4

6

In Amazon Redshift tables, I have a string column from which I need to extract numbers only out. For this currently I use

translate(stringfield, '0123456789'||stringfield, '0123456789')

I was trying out REPLACE function, but its not gonna be elegant.

Any thoughts with converting the string into ASCII first and then doing some operation to extract only number? Or any other alternatives. It is hard here as Redshift do not support functions and is missing lot of traditional functions.

Edit: Trying out the below, but it only returns 051-a92 where as I need 05192 as output. I am thinking of substring etc, but I only have regexp_substr available right now. How do I get rid of any characters in between

select REGEXP_SUBSTR('somestring-051-a92', '[0-9]+..[0-9]+', 1)
Chiropractor answered 22/3, 2014 at 5:27 Comment(3)
Better to parse it before putting into Redshift - more functionality, better performance, lower cost...Trogon
@Trogon "more functionality, better performance, lower cost..." Huh? Have you ever looked into the cost of most analytic databases (Teradata, Vertica, et al)? Redshift is cheap and the SSD nodes are fast.Johan
@JoeHarris, I agree that Redshift is very low cost, but it doesn't mean that you should fill it up with long textual columns. If you can parse them into more analytical form (numbers, categories, dates...), it is more scalable and cost effective.Trogon
P
2

Better method is to remove all the non-numeric values:

select REGEXP_replace('somestring-051-a92', '[^0-9]', '')
Perkin answered 26/10, 2022 at 20:42 Comment(3)
If you wanted the opposite of this, i.e. remove all numbers from your string, you can use the following: REGEXP_REPLACE(your_column_or_string, '[0-9/-]', '')Abby
how would you do that if you want a float back? Like 98.76Wylen
select REGEXP_replace('somestring-98.76', '[^0-9.]', '')Perkin
S
1

might be late but I was solving the same problem and finally came up with this

select REGEXP_replace('somestring-051-a92', '[a-z/-]', '')

alternatively, you can create a Python UDF now

Symptomatic answered 23/2, 2017 at 14:56 Comment(2)
REGEXP_replace(value, '[A-Z/-]|[a-z/-]')Krongold
wouldn't work with special charactersPerkin
J
0

Typically your inputs will conform to some sort of pattern that can be used to do the parsing using SUBSTRING() with CHARINDEX() { aka STRPOS(), POSITION() }.

E.g. find the first hyphen and the second hyphen and take the data between them.

If not (and assuming your character range is limited to ASCII) then your best bet would be to nest 26+ REPLACE() functions to remove all of the standard alpha characters (and any punctuation as well).

If you have multibyte characters in your data though then this is a non-starter.

Johan answered 24/3, 2014 at 14:28 Comment(2)
What if I have multibyte characters in there?Chiropractor
Ha, it's a non-starter! If you really have multi-byte chars then these probably should have been split upstream somewhere.Johan
C
-1

You can specify "any non digit" that includes non-printable, symbols, alpha, etc.

e.g., regexp_replace('brws--A*1','[\D]')

returns "1"

Contralto answered 25/1, 2017 at 20:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.