It looks to me like InternetDomainName.topPrivateDomain() does exactly what you want. Guava maintains a list of public suffixes (based on Mozilla's list at publicsuffix.org) that it uses to determine which part of the host is the public suffix. The top private domain is the public suffix plus its first child, that is, the one label immediately to the left of the suffix.
Here's a quick example:
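    import com.google.common.collect.ImmutableList;
    import com.google.common.net.InternetDomainName;
    import java.net.URI;
    import java.net.URISyntaxException;

    public class Test {
        public static void main(String[] args) throws URISyntaxException {
            ImmutableList<String> urls = ImmutableList.of(
                "http://example.google.com", "http://google.com",
                "http://bing.bing.bing.com", "http://www.amazon.co.jp/");
            for (String url : urls) {
                System.out.println(url + " -> " + getTopPrivateDomain(url));
            }
        }

        private static String getTopPrivateDomain(String url) throws URISyntaxException {
            // Extract just the host, e.g. "www.amazon.co.jp".
            String host = new URI(url).getHost();
            InternetDomainName domainName = InternetDomainName.from(host);
            // topPrivateDomain() throws IllegalStateException if the host is not
            // under a public suffix. toString() replaces the older name() method,
            // which is no longer available in current Guava releases.
            return domainName.topPrivateDomain().toString();
        }
    }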
Running this code prints:
    http://example.google.com -> google.com
    http://google.com -> google.com
    http://bing.bing.bing.com -> bing.com
    http://www.amazon.co.jp/ -> amazon.co.jp
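The last line of that output is the case that makes the public suffix list worth having. As a rough illustration only (this is not part of Guava; the class NaiveDomainDemo and the helper lastTwoLabels are hypothetical names made up for this sketch), here is what happens if you simply keep the last two dot-separated labels of the host using nothing but the JDK:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class NaiveDomainDemo {
    // Hypothetical helper: keeps only the last two dot-separated labels of the
    // host. It knows nothing about public suffixes such as "co.jp".
    static String lastTwoLabels(String url) throws URISyntaxException {
        String host = new URI(url).getHost();
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) throws URISyntaxException {
        // Works when the public suffix is a single label.
        System.out.println(lastTwoLabels("http://example.google.com")); // google.com
        // Goes wrong when the public suffix has two labels.
        System.out.println(lastTwoLabels("http://www.amazon.co.jp/"));  // co.jp
    }
}
```

For www.amazon.co.jp the naive version returns co.jp, which is the public suffix itself rather than a registrable domain, whereas topPrivateDomain() gives amazon.co.jp as shown in the output above.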
[...] .com part) and SLD (the google or bing part) from URLs? – Rigadoon
[...] String.split('\\.') to get the parts and return the last two? Or do a String.substring(indexOfPenultimatePeriod) after (easily) working out the appropriate index? What is the complexity here? – Gratin