How to get the domain from a string using a JavaScript regular expression

2

5

As the title suggests, I'm trying to retrieve the domain from a string using a JavaScript regular expression.

Take the following strings:

String                                  ==>     Return
"google"                                ==>     null
"google.com"                            ==>     "google.com"
"www.google.com"                        ==>     "www.google.com"
"ftp://ftp.google.com"                  ==>     "ftp.google.com"
"http://www.google.com"                 ==>     "www.google.com"
"http://www.google.com/"                ==>     "www.google.com"
"https://www.google.com/"               ==>     "www.google.com"
"https://www.google.com.sg/"            ==>     "www.google.com.sg"
"https://www.google.com.sg/search/"     ==>     "www.google.com.sg"
"*://www.google.com.sg/search/"         ==>     "www.google.com.sg"

I've already read "Regex to find domain name without www - Stack Overflow" and "Extract root domain name from string - Stack Overflow", but they were too complicated, so I tried writing my own regular expression:

var re = new RegExp("[\\w]+[\\.\\w]+");  // same as the literal /[\w]+[\.\w]+/
re.exec(document.URL);

This works fine with "google.com", "www.google.com" and "www.google.com.sg", but it returns "http" for "http://google.com/", "http://www.google.com/", etc.

As I am new to regular expressions, I can't seem to figure out what's wrong... any ideas?

Thanks in advance!

Rusel answered 15/8, 2014 at 8:4 Comment(0)
C
11

Use this regex:

/(?:[\w-]+\.)+[\w-]+/

Here is a regex demo!

Sampling:

>>> var regex = /(?:[\w-]+\.)+[\w-]+/
>>> regex.exec("google.com")
... ["google.com"]
>>> regex.exec("www.google.com")
... ["www.google.com"]
>>> regex.exec("ftp://ftp.google.com")
... ["ftp.google.com"]
>>> regex.exec("http://www.google.com")
... ["www.google.com"]
>>> regex.exec("http://www.google.com/")
... ["www.google.com"]
>>> regex.exec("https://www.google.com/")
... ["www.google.com"]
>>> regex.exec("https://www.google.com.sg/")
... ["www.google.com.sg"]
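
To get the string-or-null result from the question's table instead of the array that exec returns, the regex can be wrapped in a small function. This is only a sketch: the names DOMAIN_RE and extractDomain are made up for illustration and are not part of the answer.

```javascript
// Hypothetical wrapper around the regex from this answer.
const DOMAIN_RE = /(?:[\w-]+\.)+[\w-]+/;

function extractDomain(input) {
  const match = DOMAIN_RE.exec(input);
  // exec returns null when nothing matches, e.g. "google" has no dot.
  return match ? match[0] : null;
}

console.log(extractDomain("google"));                            // null
console.log(extractDomain("google.com"));                        // google.com
console.log(extractDomain("https://www.google.com.sg/search/")); // www.google.com.sg
console.log(extractDomain("*://www.google.com.sg/search/"));     // www.google.com.sg
```

Note that the pattern matches the first dotted run of word characters and hyphens anywhere in the string; it does not check that the result is a registered or otherwise valid domain.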
Chilon answered 15/8, 2014 at 8:8 Comment(8)
Omgawd thanks! Love that regex, short n' sweet~ Although I'm still trying to figure out how it works lol... Also, >>> regex.exec("ftp://www.google.com") ... ["ftp.google.com"], how'd you get that? haha :) - Rusel
Just to add a bit: a domain name may also contain a hyphen (-), so you may need to adjust the regex for that. - Piping
@Piping and how would I add it to the regex? I'm not really that good at regex so yeah... lol - Rusel
@Chilon I still can't figure out how you did it, care to explain how it works? - Rusel
@Rusel What we want to match is an aaa.bbb(.ccc.ddd.eee...) sequence. I wrote each label as [\w-]+ (any word character or hyphen), wrapped a label followed by a dot in a group, (?:[\w-]+\.), and quantified that group with + so it can match more than once. - Chilon
@Chilon Cool I understand now, thanks! Btw, is this, /[\w-]+(?:\.[\w-]+)+/, the same as your regex? - Rusel
Yes. Try out the demo and replace the regex in the grid with yours - it's the same. Messing around with regexes is also a good way of learning them. :) - Chilon
@stumped221 what do you mean "only hit on"? Do you want \.(?:com|net) somewhere in the sequence or at the end of the sequence? - Chilon
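
The comments above can be checked with a short script. It compares the pattern from the question with the answer's regex and the variant Rusel proposed. The helper name firstMatch is invented for this sketch. The question's pattern stops at "http" because its second part, [\.\w]+, also accepts plain word characters, so "http" alone already satisfies the whole pattern before any dot is seen.

```javascript
const original = /[\w]+[\.\w]+/;        // pattern from the question
const answer = /(?:[\w-]+\.)+[\w-]+/;   // one or more "label." parts, then a final label
const variant = /[\w-]+(?:\.[\w-]+)+/;  // a label, then one or more ".label" parts

// Hypothetical helper: return the first match as a string, or null.
function firstMatch(re, s) {
  const m = re.exec(s);
  return m ? m[0] : null;
}

const url = "http://www.google.com/";
console.log(firstMatch(original, url)); // http
console.log(firstMatch(answer, url));   // www.google.com
console.log(firstMatch(variant, url));  // www.google.com

// Without a dot, the question's pattern still matches; the other two do not.
console.log(firstMatch(original, "google")); // google
console.log(firstMatch(answer, "google"));   // null
```

Both the answer's regex and the variant require at least one literal dot between labels, which is what makes them skip the scheme.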
C
2

You can use this regex in JavaScript:

\b(?:(?:https?|ftp):\/\/)?([^\/\n]+)\/?

RegEx Demo

Covert answered 15/8, 2014 at 8:20 Comment(2)
As a note for users: the regex captures the target string in group 1. See this. - Chilon
This regex doesn't really work, because hardcoding http, https, ftp, etc. makes it tedious and complicated to add new schemes, e.g. "file://www.example.com/" needs \b(?:(?:https?|ftp|file):\/\/)?([^\/\n]+)\/? and so on. - Rusel
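
Both comments can be illustrated with a minimal sketch. With this regex the host is in capture group 1 rather than in the full match, and a scheme that is not in the hardcoded list ends up inside the capture. The function name hostFromCapture is made up for this example.

```javascript
const re = /\b(?:(?:https?|ftp):\/\/)?([^\/\n]+)\/?/;

// Hypothetical helper: return capture group 1, or null when nothing matches.
function hostFromCapture(input) {
  const m = re.exec(input);
  return m ? m[1] : null;
}

console.log(hostFromCapture("http://www.google.com/"));  // www.google.com
console.log(hostFromCapture("ftp://ftp.google.com"));    // ftp.google.com
console.log(hostFromCapture("file://www.example.com/")); // file:
console.log(hostFromCapture("google"));                  // google
```

The last line also shows that this pattern returns "google" for a bare word, where the question asks for null.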
