StringUtil indexOf() equivalent postgreSQL query
Asked Answered
S

3

5

I need to implement stringUtils Class indexOf() method in postgresql.

Lets say I have a table in which url is one of the column.

url : "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit"

My requirement is to find the index of the 3rd occurence of '/' in the above url and do substring and take only paypal-info.com host name in Postgresql Query

Any idea on implementing this would be grateful. Thanks

Siena answered 22/8, 2014 at 8:16 Comment(0)
H
5

Have you tried split_part method?

SELECT split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 3)

Result:

split_part
paypal-info.com

For other string functions try this doc: http://www.postgresql.org/docs/9.1/static/functions-string.html

Edit: as for indexOf itself I don't know any built-in postgres solution. But using two string functions You can achieve it like this:

SELECT strpos('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 4)) - 1 as index_of;
Heartland answered 22/8, 2014 at 8:18 Comment(4)
That looks great Michal. but how to get the indexOf 3rd '/' alone without splitting. My aim is to take the index? how can we do that?Siena
@sri Sorry, I focused on this part of the question: "and do substring and take only paypal-info.com host name in Postgresql Query". And so i did take the the host name out. As for finding the third occurence of the '/' I don't think there is a simple solution in postgres. You would have to nest string functions.Barclay
@sri - I've edited the answer with some workaround - I don't know if it's ok for you. I used the position of the forth substring taken from the aforementioned split_part().Barclay
@sri - you can wrap this workaround from my edit in your own util function, and you'll have the result you need.Barclay
G
4

The string functions and operators section of the manual is the equivalent of String.indexOf, e.g.

select position('/' in 'http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');

however it doesn't offer the option to get the n'th occurrence.

You're really approaching this all wrong. You should use proper URL parsing code to extract the host portion, not attempt to roll your own or use regex / splitting / string mangling.

PostgreSQL doesn't have a native URL/URI type, but its procedural languages do and it's trivial to wrap suitable functions. e.g. with PL/Python:

create language plpythonu;
create or replace function urlhost(url text) returns text 
language plpythonu 
immutable strict
as $$
import urlparse
return urlparse.urlparse(url).netloc
$$;

then:

regress=# select urlhost('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
     urlhost     
-----------------
 paypal-info.com
(1 row)

If you'd prefer to use PL/Perl, PL/V8, or whatever, that's fine.

For best performance, you could write a simple C function and expose that as an extension.

Guano answered 22/8, 2014 at 13:57 Comment(4)
I'm not sure your definition of "wrong" is correct.Instructions
@Instructions Well, it's not as bad as parsing HTML with regex, but why roll your own when you can use well tested, pre-written code?Guano
I rather meant that it depends on specific requirements.Instructions
@Instructions PL/Perl or PL/Python are part of the core PostgreSQL distribution. Python's urlparse is part of Python's core. Pretty safe bet IMO, and you can always fall back to one of the split_part solutions already offered by others if you really needed to for some reason.Guano
I
1

Just replace 3 with N to get the index of the Nth '/' in a given string

SELECT length(substring('http://asd/asd', '(([^/]*/){3})')) - 1

To extract the host name from url you can use

SELECT substring('http://asd.com:234/qwe', 'http://([^:]+).*/')

Tested here: SQLFiddle

Instructions answered 22/8, 2014 at 10:30 Comment(6)
Sorry Vlad both queries doesn't work. I dont know why. I dont get any outputsSiena
What version is your postgresql?Instructions
I havent tested it on a database, will try later when I can.Instructions
@sri Indeed the reg expressions had mistakes... Corrected them now. Also added a fiddle.Instructions
thanks Vlad. the First query do not need -1 otherwise it works.Siena
Index is normally zero-based, that's why -1. If you want it to be 1-based, then you don't need to do -1 indeed. Btw, is your question answered now? :)Instructions

© 2022 - 2024 — McMap. All rights reserved.