java.net.URI get host with underscores
H

4

18

I'm seeing strange behavior from URI.getHost():

import java.net.URI;

    URI url = new URI("https://pmi_artifacts_prod.s3.amazonaws.com");
    System.out.println(url.getHost());   // prints null
    URI url2 = new URI("https://s3.amazonaws.com");
    System.out.println(url2.getHost());  // prints s3.amazonaws.com

I want the first url.getHost() to return pmi_artifacts_prod.s3.amazonaws.com, but it returns null. It turns out the problem is the underscores in the domain name. This is a known bug, but what can be done about it, given that I need to work with exactly this host?

Helicopter answered 17/2, 2015 at 18:11 Comment(1)
There is a great article about this at blogs.wandisco.com/java-and-underscores-in-host-names. In short: yes, you can do it (sort of), but you really shouldn't.Northwest
D
9

The bug is not in Java but in the naming of the host: an underscore is not a valid character in a hostname. Although underscores are widely (and incorrectly) used in hostnames, Java refuses to handle them.

https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames

A possible workaround:

import java.lang.reflect.Field;
import java.net.URI;
import java.net.URISyntaxException;

// Note: on JDK 16 and later this reflective access needs
// --add-opens java.base/java.net=ALL-UNNAMED, otherwise setAccessible throws.
public static void main(String... a) throws URISyntaxException, ReflectiveOperationException {
    URI url = new URI("https://pmi_artifacts_prod.s3.amazonaws.com");
    System.out.println(url.getHost()); // null

    URI uriObj = new URI("https://pmi_artifacts_prod.s3.amazonaws.com");
    if (uriObj.getHost() == null) {
        // Force the private "host" field to the value we know is correct.
        final Field hostField = URI.class.getDeclaredField("host");
        hostField.setAccessible(true);
        hostField.set(uriObj, "pmi_artifacts_prod.s3.amazonaws.com");
    }
    System.out.println(uriObj.getHost()); // pmi_artifacts_prod.s3.amazonaws.com

    URI url2 = new URI("https://s3.amazonaws.com");
    System.out.println(url2.getHost());  // s3.amazonaws.com
}
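If reflection is not an option, a gentler alternative is to read the host out of the authority instead: for a hostname with underscores, getHost() is null but getRawAuthority() still holds the text between "//" and the path. The following is only a sketch. The class and the hostOf helper are made-up names, and it assumes the authority has the usual [user-info@]host[:port] shape.

```java
import java.net.URI;

public class AuthorityHost {
    /**
     * Hypothetical helper (not part of the JDK): returns getHost() when it is
     * available, otherwise derives the host from the raw authority.
     */
    public static String hostOf(URI uri) {
        if (uri.getHost() != null) {
            return uri.getHost();
        }
        String authority = uri.getRawAuthority();
        if (authority == null) {
            return null; // e.g. mailto: URIs or relative references
        }
        // Drop an optional "user-info@" prefix.
        int at = authority.lastIndexOf('@');
        String hostPort = at >= 0 ? authority.substring(at + 1) : authority;
        // Drop an optional ":port" suffix (only when everything after the colon is digits).
        int colon = hostPort.lastIndexOf(':');
        if (colon >= 0 && hostPort.substring(colon + 1).chars().allMatch(Character::isDigit)) {
            hostPort = hostPort.substring(0, colon);
        }
        return hostPort;
    }

    public static void main(String[] args) {
        // prints pmi_artifacts_prod.s3.amazonaws.com
        System.out.println(hostOf(URI.create("https://pmi_artifacts_prod.s3.amazonaws.com")));
        // prints my_favorite_host
        System.out.println(hostOf(URI.create("http://my_favorite_host:3892/path")));
    }
}
```

This leaves the URI object untouched, so anything else that calls getHost() on it will still see null.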
Deepfreeze answered 17/2, 2015 at 18:23 Comment(3)
"Be conservative in what you send out, be liberal in what you accept." If people are putting underscores in their hostnames, a library used worldwide should handle them, not fail on them. Simply not robust, and shockingly bad move for a language as prominent as Java.Nance
I'm not terribly clear on how this is valid. For sure, I can verify that Java does indeed do this, but I don't understand how Java 8, released in 2014, could have this restriction. Supposedly RFC 2181 reversed this restriction, and it was proposed in 1997 (though I don't know when it was approved, to be fair). If RFC 2181 reversed it, then why does Java still not allow it?Grog
The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full nameGrog
T
1

Underscore support can be added to URI itself by patching its internal character masks via reflection:

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URI;

public static void main(String[] args) throws Exception {
    patchUriField(35184372088832L, "L_DASH");
    patchUriField(2147483648L, "H_DASH");

    URI s = URI.create("http://my_favorite_host:3892");
    // prints "my_favorite_host"
    System.out.println(s.getHost());
}

// Overwrites one of the private static final long masks in java.net.URI.
private static void patchUriField(long maskValue, String fieldName)
        throws NoSuchFieldException, IllegalAccessException {
    Field field = URI.class.getDeclaredField(fieldName);

    // Clear the FINAL modifier so the field can be reassigned.
    // (Field.modifiers is hidden from reflection on JDK 12+, so this step fails there.)
    Field modifiers = Field.class.getDeclaredField("modifiers");
    modifiers.setAccessible(true);
    modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);

    field.setAccessible(true);
    field.setLong(null, maskValue);
}
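The two magic numbers are easier to trust once you see where they come from. As far as I can tell they are per-character bitmasks: one 64-bit "low" mask covers characters 0 to 63 and one "high" mask covers characters 64 to 127. The sketch below recomputes them; MaskDemo and its two helpers are illustrative names of my own, not the JDK's internals.

```java
public class MaskDemo {
    /** Bit for a character in the range 0..63; 0 if the character is outside it. */
    public static long lowMask(char c) {
        return c < 64 ? 1L << c : 0L;
    }

    /** Bit for a character in the range 64..127; 0 if the character is outside it. */
    public static long highMask(char c) {
        return (c >= 64 && c < 128) ? 1L << (c - 64) : 0L;
    }

    public static void main(String[] args) {
        // '-' is character 45, so it lives in the low mask.
        System.out.println(lowMask('-'));   // prints 35184372088832
        // '_' is character 95, so it lives in the high mask (bit 95 - 64 = 31).
        System.out.println(highMask('_'));  // prints 2147483648
    }
}
```

So the first value is just the bit for '-' and the second is the bit for '_': writing the latter into H_DASH makes the hostname scanner treat an underscore like a dash.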
Tensor answered 23/5, 2017 at 8:26 Comment(2)
No longer works in Java 10, because the private lowMask and highMask methods were removed from the URI class.Neall
@ŁukaszFrankowski Thanks for pointing that out! I adapted it to JDK 10.Tensor
F
0

Note that although

new URI("https://pmi_artifacts_prod.s3.amazonaws.com");

will not throw, and the workaround provided by @Vurtatoo works for this case, it cannot handle a URL such as https://a_b?c={1}

I also found out that

new URI("https://a_b?c={1}")

will throw, but

new URI("https://a_b?c=1")

won't.

I'm not sure why that is, but my takeaway is that we should not make any assumptions about the implementation details of the Java URI class. If you have to use java.net.URI, it's probably better to fork the source code and make the changes you need.
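A likely explanation for the difference is that `{` and `}` are not among the characters java.net.URI accepts unescaped, so the parser rejects the query before the underscore ever matters. A rough sketch to check this, with BraceQuery, parses and encodeBraces being illustrative names only:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BraceQuery {
    /** Returns true if the string parses as a java.net.URI. */
    public static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Percent-encodes only '{' and '}'; other illegal characters are left alone. */
    public static String encodeBraces(String s) {
        return s.replace("{", "%7B").replace("}", "%7D");
    }

    public static void main(String[] args) {
        System.out.println(parses("https://a_b?c=1"));    // prints true
        System.out.println(parses("https://a_b?c={1}"));  // prints false
        URI fixed = URI.create(encodeBraces("https://a_b?c={1}"));
        System.out.println(fixed.getQuery());             // prints c={1}
        System.out.println(fixed.getHost());              // prints null (the underscore issue remains)
    }
}
```

Encoding the braces gets the string past the parser, but getHost() is still null for the underscored host, so one of the workarounds above is still needed.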

Flyte answered 22/7, 2021 at 1:56 Comment(0)
B
0

I would suggest java.net.URL, which solves the problem, but its constructors have been deprecated since Java 20.

So I guess just use another language?
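For reference, the java.net.URL behaviour can be checked with something like the sketch below. UrlHost and hostOf are made-up names, and the deprecated URL(String) constructor is used on purpose, so expect a deprecation warning on Java 20 and later if you remove the annotation.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlHost {
    /** Hypothetical helper: parses the string with java.net.URL and returns its host. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated since Java 20
    public static String hostOf(String spec) {
        try {
            return new URL(spec).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // prints pmi_artifacts_prod.s3.amazonaws.com
        System.out.println(hostOf("https://pmi_artifacts_prod.s3.amazonaws.com"));
    }
}
```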

Bidden answered 26/7, 2024 at 13:24 Comment(4)
java.net.URL solves the problem stated in the question. android.net.Uri too but it's too platform specific. Please don't leave comments for your stats.Bidden
Have you actually tried android.net.Uri? I didn't. Why would you edit my answer to add this?Bidden
I just want to ask you, why do you feel entitled to make these comments and modify my answer when you obviously have no idea about the subject?Bidden
It seems that my attempt to help you improve your post did not convince you and that you for some reason are strictly against trying yourself. So I undid my edit and promise to not change it anymore. Good luck.Discotheque
