Java escape HTML
G

6

38

Currently I use org.apache.commons.lang.StringEscapeUtils.escapeHtml() to escape unwanted HTML tags in my Strings, but I then realized that it also escapes accented characters to &something; entities, which I don't want.

Do you know of any solution that escapes HTML tags but leaves my special (well, for some people, they are normal here ;]) letters as they are?

Thanks in advance!

balázs

Glycoside answered 2/2, 2011 at 12:45 Comment(5)
&something; will be converted to &amp;something; -- do you want the character '&' not to be escaped? In most cases a user enters the symbol that &something; stands for in the UI, and escapeHtml just converts that special character to the equivalent HTML entity.Elkin
I mean á gets converted to &aacute;, which I don't want. I don't want letters to be escaped at all... everything else, yes.Comp
What do you need to escape HTML for? For JSP?Prosy
Almost, JSF. Do you have any other idea how to prevent users from using tags in comments? I have to enable <br/> though, which is why I have to use escape="false" in the output tags.Comp
+50 bounty: Please try to give an answer closer to the original question, an escaping function which will not hurt UTF-8 characters.Flautist
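The requirement from the comments (escape every tag, keep accented letters, still allow <br/>) can be sketched in plain Java. This is only a minimal sketch, not a vetted sanitizer: the class and method names (CommentSanitizer, escapeTags, escapeAllowingBr) are made up for illustration and are not part of JSF or Commons Lang, and it assumes the output is placed in HTML element content rather than inside an attribute.

```java
// Hypothetical helper for illustration; not part of JSF or Commons Lang.
final class CommentSanitizer {

    private CommentSanitizer() {}

    /**
     * Escapes the four HTML-significant characters. Every other character,
     * including accented letters, is left unchanged.
     */
    static String escapeTags(String input) {
        // '&' has to be replaced first; otherwise the '&' of the entities
        // produced by the later replacements would be escaped a second time.
        return input.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    /** Escapes everything, then re-enables only the exact literal <br/>. */
    static String escapeAllowingBr(String input) {
        return escapeTags(input).replace("&lt;br/&gt;", "<br/>");
    }

    public static void main(String[] args) {
        System.out.println(escapeAllowingBr("<b>bold</b><br/>next line"));
    }
}
```

Only the exact spelling <br/> is restored; variants such as <br> or <br /> stay escaped. Text that a user typed literally as &lt;br/&gt; is not turned into a tag either, because its ampersands are escaped before the restore step runs.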
H
32
StringUtils.replaceEach(str, new String[]{"&", "\"", "<", ">"}, new String[]{"&amp;", "&quot;", "&lt;", "&gt;"})
Hinshaw answered 2/2, 2011 at 13:2 Comment(9)
OWASP also recommends ' and /.Homage
Which version of StringUtils is that? I have one in commons-lang-2.2 but it has no replaceEach method. Not critical though, what you recommended is actually easy to implement. I would have liked an out-of-the-box solution though :-/Comp
what about ® ¶ © ½ æ ÷ § and the rest of the shebang found at arnspublishing.com/QuickRef/ISO8859.html ?? =) That replace each is a disaster waiting to happen!Levee
yeah but that's exactly what I did NOT want :) correct me if I'm wrong but I don't know any HTML tags like <§> :PComp
@ppumkin, please explain further.Gaelic
@pingw33n, I have tried importing org.springframework.util.StringUtils, org.apache.soap.util.StringUtils, org.apache.axis.utils.StringUtils, and com.ibm.wsdl.util.StringUtils, and none of them have StringUtils.replaceEach(). What are you importing to have access to this method? They seem to have a .replace() however.Gaelic
@MatthewDoucette it's org.apache.commons.lang.StringUtils: commons.apache.org/lang/api-2.5/org/apache/commons/lang/…Hinshaw
what if clients say that he wants &lt; as &lt; only?Wohlert
As @EtienneNeveu mentioned you MUST read wonko.com/post/html-escaping, it all depends on the contextInfective
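For readers whose commons-lang version lacks replaceEach (as reported above for 2.2), the same idea can be sketched with the JDK alone. The helper below is a hypothetical stand-in written for this thread, not the Commons Lang implementation, and it is not guaranteed to match StringUtils.replaceEach in every edge case. It scans the text once, so the output of one replacement is never escaped again.

```java
// Hypothetical JDK-only stand-in for StringUtils.replaceEach; illustration only.
final class PlainReplaceEach {

    private PlainReplaceEach() {}

    /**
     * Scans the text once and, at each position, applies the first search
     * string that matches there. Unmatched characters are copied unchanged.
     */
    static String replaceEach(String text, String[] search, String[] replacements) {
        if (search.length != replacements.length) {
            throw new IllegalArgumentException("search and replacements differ in length");
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            for (int k = 0; k < search.length; k++) {
                String s = search[k];
                if (!s.isEmpty() && text.startsWith(s, i)) {
                    out.append(replacements[k]);
                    i += s.length(); // skip past the matched input
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** Same four replacements as in the answer above. */
    static String escapeHtml(String text) {
        return replaceEach(text,
                new String[]{"&", "\"", "<", ">"},
                new String[]{"&amp;", "&quot;", "&lt;", "&gt;"});
    }
}
```

Characters such as á or § never appear in the search array, so they pass through untouched, which is the behaviour the question asks for.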
Q
21

If it's for Android, use TextUtils.htmlEncode(String) instead.

Quigley answered 29/11, 2011 at 2:46 Comment(0)
J
9

This looks very good to me:

org/apache/commons/lang3/StringEscapeUtils.html#escapeXml(java.lang.String)

By asking for XML escaping, you get output that is valid for XHTML, which is also good HTML.

Joggle answered 22/11, 2012 at 17:21 Comment(0)
S
6

Here's a version that replaces the six significant characters as recommended by OWASP. This is suitable for HTML content elements like <textarea>...</textarea>, but not HTML attributes like <input value="..."> because the latter are often left unquoted.

StringUtils.replaceEach(text,
        new String[]{"&", "<", ">", "\"", "'", "/"},
        new String[]{"&amp;", "&lt;", "&gt;", "&quot;", "&#x27;", "&#x2F;"});
Suazo answered 5/6, 2013 at 18:54 Comment(1)
Thanks! Adapted for another solution here: https://mcmap.net/q/410758/-replace-characters-with-html-entities-in-java-duplicate. I chose to declare the characters as static final for performance. I also replaced the hex markup with human-readable replacements.Concerned
L
6

I know it's too late to add my comment, but perhaps the following code will be helpful:

public static String escapeHtml(String string) {
    StringBuilder escapedTxt = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char tmp = string.charAt(i);
        switch (tmp) {
        case '<':
            escapedTxt.append("&lt;");
            break;
        case '>':
            escapedTxt.append("&gt;");
            break;
        case '&':
            escapedTxt.append("&amp;");
            break;
        case '"':
            escapedTxt.append("&quot;");
            break;
        case '\'':
            escapedTxt.append("&#x27;");
            break;
        case '/':
            escapedTxt.append("&#x2F;");
            break;
        default:
            escapedTxt.append(tmp);
        }
    }
    return escapedTxt.toString();
}

enjoy!

Lollard answered 22/8, 2015 at 10:50 Comment(1)
You should use StringBuilder.Corpora
M
0

If you're using Wicket, use:

import org.apache.wicket.util.string.Strings;
...
CharSequence cs = Strings.escapeMarkup(src);
String str = Strings.escapeMarkup(src).toString();
Maunsell answered 27/5, 2015 at 13:56 Comment(0)
