In case you cannot add a dependency to your project, or you simply don't want to, here is a relatively simple implementation using a regular expression.
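    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public final class UnicodeUnescape {

        // A backslash, 'u', and four hex digits, not preceded by another backslash.
        private static final Pattern UNICODE_ESCAPE_PATTERN =
                Pattern.compile("(?<!\\\\)\\\\u(\\p{XDigit}{4})");

        public static String unescape(String input) {
            return UNICODE_ESCAPE_PATTERN.matcher(input).replaceAll(match -> {
                char c = (char) Integer.parseInt(match.group(1), 16);
                // Quote the result so that a decoded '$' or '\' is inserted literally
                // instead of being parsed as replacement syntax.
                return Matcher.quoteReplacement(Character.toString(c));
            });
        }

        private UnicodeUnescape() {}
    }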
This is not the most efficient implementation, since it runs a regex with a lookbehind over the whole input. It also handles only Unicode escape sequences, unlike StringEscapeUtils#unescapeJava(String) from Apache Commons Text, which also handles escapes such as \n and \t.
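If the regex overhead matters, a single pass over the characters avoids it. The following is only a sketch under the same rules as the regex version (a backslash, `u`, and four ASCII hex digits, skipped when the preceding character is a backslash); the class name `ManualUnicodeUnescape` and its helpers are made up for illustration.

```java
final class ManualUnicodeUnescape {

    private ManualUnicodeUnescape() {}

    /** Decodes backslash-u escapes in one pass, skipping those preceded by a backslash. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Mirrors the negative lookbehind: look at the previous input character.
            boolean precededByBackslash = i > 0 && input.charAt(i - 1) == '\\';
            if (c == '\\' && !precededByBackslash && isUnicodeEscapeAt(input, i)) {
                // Characters i+2 .. i+5 are the four hex digits.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** True if position i starts a backslash, 'u', and four ASCII hex digits. */
    private static boolean isUnicodeEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int j = i + 2; j < i + 6; j++) {
            if (!isAsciiHexDigit(s.charAt(j))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isAsciiHexDigit(char ch) {
        return (ch >= '0' && ch <= '9')
                || (ch >= 'a' && ch <= 'f')
                || (ch >= 'A' && ch <= 'F');
    }
}
```

Because the decoded character is appended directly to a StringBuilder, there is no replacement-string syntax to worry about here.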
Note that Matcher#replaceAll(Function) was added in Java 9.
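On Java 8, where that overload is not available, the same logic can be written with the older Matcher#appendReplacement and Matcher#appendTail loop. This is a sketch rather than a drop-in replacement; the class name `UnicodeUnescapeJava8` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescapeJava8 {

    private static final Pattern UNICODE_ESCAPE_PATTERN =
            Pattern.compile("(?<!\\\\)\\\\u(\\p{XDigit}{4})");

    private UnicodeUnescapeJava8() {}

    static String unescape(String input) {
        Matcher matcher = UNICODE_ESCAPE_PATTERN.matcher(input);
        // appendReplacement(StringBuffer, String) is the overload available before Java 9.
        StringBuffer result = new StringBuffer(input.length());
        while (matcher.find()) {
            char c = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash from being
            // interpreted as a group reference or an escape.
            matcher.appendReplacement(result, Matcher.quoteReplacement(String.valueOf(c)));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```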
Remove (?<!\\\\) from the regex if literals like \\u2013 should still be "unescaped" into \–.
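To make the difference concrete, here is a small side-by-side sketch of the two patterns applied to the same input. The class `LookbehindDemo` and its fields are invented for this illustration, and it needs Java 9+ because it uses Matcher#replaceAll(Function).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindDemo {

    // Skips an escape whose backslash is itself preceded by a backslash.
    static final Pattern WITH_LOOKBEHIND =
            Pattern.compile("(?<!\\\\)\\\\u(\\p{XDigit}{4})");

    // Decodes every backslash-u escape, whatever precedes it.
    static final Pattern WITHOUT_LOOKBEHIND =
            Pattern.compile("\\\\u(\\p{XDigit}{4})");

    private LookbehindDemo() {}

    static String unescape(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll(match -> {
            char c = (char) Integer.parseInt(match.group(1), 16);
            return Matcher.quoteReplacement(String.valueOf(c));
        });
    }

    public static void main(String[] args) {
        // At runtime this string contains two backslashes before "u2013".
        String input = "Dodd\\\\u2013Frank";
        // With the lookbehind the input comes back unchanged.
        System.out.println(unescape(WITH_LOOKBEHIND, input));
        // Without it, the second backslash and the hex digits become an en dash,
        // leaving a single backslash in front of it.
        System.out.println(unescape(WITHOUT_LOOKBEHIND, input));
    }
}
```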
Here are some very basic, non-exhaustive unit tests:
    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Test;

    class UnicodeUnescapeTests {

        @Test
        @DisplayName("Unicode sequence is unescaped")
        void testUnescape() {
            var unescaped = UnicodeUnescape.unescape("Dodd\\u2013Frank");
            assertEquals("Dodd–Frank", unescaped);
        }

        @Test
        @DisplayName("surrogate pair is unescaped")
        void testUnescapeSurrogatePair() {
            var unescaped = UnicodeUnescape.unescape("Dodd Frank \\uD83C\\uDF09");
            assertEquals("Dodd Frank 🌉", unescaped);
        }

        @Test
        @DisplayName("escaped Unicode sequence is unchanged")
        void testEscapedUnicodeSequence() {
            var unescaped = UnicodeUnescape.unescape("Dodd\\\\u2013Frank");
            assertEquals("Dodd\\\\u2013Frank", unescaped);
        }
    }
Output (from Gradle):
UnicodeUnescapeTests > escaped Unicode sequence is unchanged PASSED
UnicodeUnescapeTests > surrogate pair is unescaped PASSED
UnicodeUnescapeTests > Unicode sequence is unescaped PASSED
System.out.println(yourString); do you see (1) Dodd\u2013Frank or (2) Dodd–Frank? – Churchgoer
"Dodd\u2013Frank".chars().forEach(a -> System.out.print((char) a)); ? – Mascon
org.apache.commons.lang3.StringEscapeUtils is deprecated, but moved to commons-text as import org.apache.commons.text.StringEscapeUtils, which is not deprecated. – Sprouse