How to extract the domain from a URL

I need to extract the domain (four.five) from a URL (one.two.three.four.five) held in a Lua string variable.

I can't seem to find a function to do this in Lua.

EDIT:

By the time the URL gets to me, the http stuff has already been stripped off. So, some examples are:

a) safebrowsing.google.com 
b) i2.cdn.turner.com 
c) powerdns.13854.n7.nabble.com 

so my result should be:

a) google.com
b) turner.com
c) nabble.com
Salto answered 12/9, 2013 at 22:0 Comment(1)
This is an old post, but perhaps this is a useful hint: keep in mind that there are domains where the last two segments alone are not useful. In Great Britain, for example, a lot of domains end in .co.uk. - Armchair
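To illustrate the point in the comment above, here is a rough sketch that keeps three labels when the host ends in a known two-label suffix. The table two_label_suffixes and the function registrable_domain are hypothetical names, and the three suffixes are hand-picked for illustration only; a real program would presumably need a complete list such as the Public Suffix List.

```lua
-- Illustrative only: a tiny, hand-picked set of two-label suffixes (an assumption,
-- not a complete list).
local two_label_suffixes = {
  ["co.uk"] = true,
  ["org.uk"] = true,
  ["com.au"] = true,
}

-- Hypothetical helper: returns the last two labels of the host, or the last
-- three when the last two form one of the suffixes listed above.
local function registrable_domain(host)
  -- Split the host into its dot-separated labels.
  local labels = {}
  for label in host:gmatch("[^.]+") do
    labels[#labels + 1] = label
  end

  local n = #labels
  if n < 2 then
    return nil -- a single label has no domain part to extract
  end

  local suffix = labels[n - 1] .. "." .. labels[n]
  if two_label_suffixes[suffix] and n >= 3 then
    return labels[n - 2] .. "." .. suffix
  end
  return suffix
end

print(registrable_domain("www.bbc.co.uk"))      --> bbc.co.uk
print(registrable_domain("i2.cdn.turner.com"))  --> turner.com
```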

This should work:

local url = "foo.bar.google.com"
local domain = url:match("[%w%.]*%.(%w+%.%w+)")
print(domain)       

Output: google.com

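The pattern [%w%.]*%.(%w+%.%w+) captures everything after the second dot from the end, i.e. the last two dot-separated segments. Note that it requires at least one segment before the captured part, so a bare google.com yields nil.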
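As a possible variation, the same idea can be written with the pattern anchored at the end of the string, which also accepts hyphens and a host that has exactly two segments. This is only a sketch; the function name last_two_labels is made up for the example.

```lua
-- Hypothetical helper (not from the original answer): returns the last two
-- dot-separated labels of a host name, or nil if there are fewer than two.
local function last_two_labels(host)
  -- "$" anchors the match at the end of the string;
  -- [%w%-]+ matches a label made of letters, digits and hyphens.
  return host:match("([%w%-]+%.[%w%-]+)$")
end

print(last_two_labels("safebrowsing.google.com"))       --> google.com
print(last_two_labels("powerdns.13854.n7.nabble.com"))  --> nabble.com
print(last_two_labels("google.com"))                    --> google.com
print(last_two_labels("my-site.example.org"))           --> example.org
```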

Pyonephritis answered 13/9, 2013 at 1:58 Comment(1)
Use url:match("[%w%-%.]*%.([%w%-]+%.%w+)") to allow hyphens in the URL. - Kosher
local url = "http://foo.bar.com/?query"
print(url:match('^%w+://([^/]+)')) -- foo.bar.com

This pattern '^%w+://([^/]+)' means: ^ anchors the match at the beginning of the string, %w+ takes one or more alphanumeric characters (this is the protocol), :// matches literally, and ([^/]+) captures one or more characters other than a slash and returns them as the result.
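If the input may arrive either with or without the protocol, one option is to strip an optional prefix first. The following is a minimal sketch under that assumption; host_of is a hypothetical name, and user-info parts such as user@host are not handled.

```lua
-- Hypothetical helper: returns the host part of a URL, whether or not the
-- "scheme://" prefix is still present.
local function host_of(url)
  -- gsub returns two values; only the first (the new string) is kept here.
  local rest = url:gsub("^%w+://", "")
  -- Take everything up to the first "/", "?", "#" or ":" (path, query,
  -- fragment or port).
  return rest:match("^([^/?#:]+)")
end

print(host_of("http://foo.bar.com/?query"))      --> foo.bar.com
print(host_of("safebrowsing.google.com"))        --> safebrowsing.google.com
print(host_of("https://example.org:8080/path"))  --> example.org
```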

Decent answered 13/9, 2013 at 0:15 Comment(3)
I need to start from the end and move from right to left, since I don't know how long the URL will be. It could be one.two.three, one.two.three.four or one.two.three.four.five. In other languages I have done it by counting the periods from right to left and extracting the string that starts after the second period from the right. I don't know how to do that in Lua. - Salto
Please provide an example of the URL (ideally several) you are trying to parse. - Decent
By the time the URL gets to me, the http stuff has already been stripped off. So, some examples are: a) safebrowsing.google.com, b) i2.cdn.turner.com, c) powerdns.13854.n7.nabble.com. So my result should be: a) google.com, b) turner.com, c) nabble.com. - Salto
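The right-to-left counting described in the comments above can also be written as a plain loop in Lua, without patterns. This is just a sketch; the function name after_second_dot_from_right is invented for the example, and returning the input unchanged when it has fewer than two dots is an assumption about the desired behavior.

```lua
-- Hypothetical helper: walks the string from right to left, counts the dots,
-- and returns everything after the second dot from the right.
local function after_second_dot_from_right(host)
  local dots = 0
  for i = #host, 1, -1 do
    if host:sub(i, i) == "." then
      dots = dots + 1
      if dots == 2 then
        return host:sub(i + 1)
      end
    end
  end
  -- Fewer than two dots: nothing to strip (assumed behavior).
  return host
end

print(after_second_dot_from_right("one.two.three.four.five"))  --> four.five
print(after_second_dot_from_right("i2.cdn.turner.com"))        --> turner.com
```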

Use Paul's answer to extract the host name, such as 1.2.3.4.4.5:

local url = "http://foo.bar.com/?query"
local domain = url:match('^%w+://([^/]+)')

and then use one of the "split" functions to build an array of its parts:

http://lua-users.org/wiki/SplitJoin

like

local arr = split(domain, '%.') -- the dot is escaped because it is a magic character in Lua patterns

Next you can use the last two elements: arr[#arr-1] and arr[#arr].
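Lua has no built-in split, so the steps above need one to be defined. The version below is a minimal sketch, not one of the implementations from the wiki page; it assumes the separator pattern never matches an empty string and that the host has at least two parts.

```lua
-- Minimal split sketch: cuts s at every match of the Lua pattern sep.
-- Assumes sep never matches the empty string (otherwise this would loop forever).
local function split(s, sep)
  local parts, start = {}, 1
  while true do
    local first, last = s:find(sep, start)
    if not first then break end
    parts[#parts + 1] = s:sub(start, first - 1)
    start = last + 1
  end
  parts[#parts + 1] = s:sub(start) -- the remainder after the last separator
  return parts
end

local url = "http://foo.bar.com/?query"
local host = url:match("^%w+://([^/]+)")          -- foo.bar.com
local arr = split(host, "%.")                     -- { "foo", "bar", "com" }
local domain = arr[#arr - 1] .. "." .. arr[#arr]  -- join the last two parts
print(domain)                                     --> bar.com
```

With a single-part host, arr[#arr - 1] would be nil and the concatenation would raise an error, so a real caller would probably want to check #arr first.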

Boggart answered 9/4, 2016 at 14:8 Comment(0)
