How to extract a substring pattern in Postgresql
Asked Answered
R

1

22

I have a column with a lot of inconsistent strings. Some of them contain a substring with a consistent pattern of '2015mmdd_AB_CD_EFG_(text)_(text)_HIJ' which I would like to extract. I feel this is a cross over case of regexp and a substring command.

My best approach so far has been a rather ugly

substring(col_name, '........_.._.._..._.+_.+_...')

which does not end the output as desired, rahter the output is like '(...)_HIJ_blablabla'.

How can I effectively combine pattern recognition and substring selection in this case?

Rosebay answered 13/7, 2015 at 10:40 Comment(0)
H
30

Assuming that 2015mmdd actually means some kind of "date", so that the real data contains e.g. 20150713 the following will do:

substring(col_name, '[0-9]{8}_[A-Z]{2}_[A-Z]{2}_[A-Z]{3}_\([a-z]+\)_\([a-z]+\)')

This returns substrings that start with 8 numbers followed by an underscore, followed by two uppercase characters followed by an underscore followed by two uppercase characters, followed by an underscore followed by three uppercase characters, followed by an underscore followed by an opening parentheses followed by at lease one lowercase letter, followed by a closing parentheses, followed by an underscore, followed by an opening parentheses, followed by at least one lowercase character followed by a closing parentheses.

If 2015mmdd indeed means 2015 followed by the string mmdd then you need this:

substring(col_name, '[0-9]{4}mmdd_[A-Z]{2}_[A-Z]{2}_[A-Z]{3}_\([a-z]+\)_\([a-z]+\)')
Hypocotyl answered 13/7, 2015 at 12:15 Comment(2)
This is exactly what I was looking for and the first line is actually the case I'm working with. Yet it does not return any output when I query the data. Thank you very much for your help though, I'll work on a fix on my end.Rosebay
Thank you very much a_horse_with_no_name, your code does not quite do it for me as simply "[a-z]+" works fine. Working with PgAdmin on my machine. But it solved my issue anyway. :)Rosebay

© 2022 - 2024 — McMap. All rights reserved.