Best way to handle security and avoid XSS with user entered URLs

We have a high security application and we want to allow users to enter URLs that other users will see.

This introduces a high risk of XSS hacks - a user could potentially enter javascript that another user ends up executing. Since we hold sensitive data it's essential that this never happens.

What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?

Any advice on dealing with redirections (for instance, a "this link goes outside our site" message on a warning page before following the link)?

Is there an argument for not supporting user entered links at all?


Clarification:

Basically our users want to input:

stackoverflow.com

And have it output to another user:

<a href="http://stackoverflow.com">stackoverflow.com</a>

What I really worry about is them using this in a XSS hack. I.e. they input:

javascript:alert('hacked!');

So other users get this link:

<a href="javascript:alert('hacked!');">stackoverflow.com</a>

My example is just to explain the risk - I'm well aware that javascript and URLs are different things, but by letting them input the latter they may be able to execute the former.

You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know how to deal with links, do they also know how to sanitise <iframe>, <img> and clever CSS references?

I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?

Morbidezza answered 15/10, 2008 at 18:46 Comment(23)
I do need to second @Nick's comment - Javascript is not synonymous with a URL. Are you sure this isn't a question about sanitizing user input, and preventing entered data from being executed if it's actually code?Lascivious
I do actually know that javascript != url. But most places you can get a url into, you can cram inline javascript too.Morbidezza
You can second it by upmodding it. My answer is very relevant.Grieg
The example is misleading and the sentence "If you think URLs can't contain code, think again!" in the accepted answer makes it worst. What these suggest is that a valid URL in a anchor tag <a href=URL ... > can be a security issue, but it's not. The issue is that the input is not necessarily a valid URL path such as alert('hacked!');. A bad "URL path" would be this: stackoverflow.com">stackoverflow.com</a><script> bad stuff</script><a href=". The result after insertion is <a href="stackoverflow.com">stackoverflow.com</…> bad stuff</script><a href="">stackoverflow.com</a>Surplus
@Surplus this is a 12 year old question. The answer to this now is strong CSP headers supported by most browsers. I'm not sure the nuance of whether it is the URL that is bad or the scripting content you put in an href attribute unescaped that is actually bad in its place even matters.Morbidezza
@Morbidezza The way I understand it, strong CSP headers supported by browsers is not what this question is about. It's even independent of whether the input string (the URL) comes from a browser or a database, etc. It is all about proper escaping of that input before we insert in an html page. If the output html is hacked, no policy in the browser will help you. The emphasis on the URL being valid or not is confusing. The key point is that the apparent URL in the input string might not be the URL in the created HTML page.Surplus
But I read my original comment again and I see a point of possible confusion. The example that I gave is only "invalid" if it is not escaped. My point was that, in contrast, "alert('hacked');" does not even need escaping. In that way, it was valid.Surplus
@Surplus er alert('hacked') is just a very basic XSS example - if I can get that to run I can get other (more malicious) code to run. CSP stops it because it can block all inline execution (that's how a policy in the browser will help you). It's not that the HTML coming from the server is hacked, it's that I wanted users to be able to share links, but not share a script that executes as the other user. I don't think the comments here are the best place for you to learn about how XSS attacks are executed.Morbidezza
I know about XSS and related security policies and did not make the comment to learn about that. Irrespective of the title, what is described is most naturally seen as if the exchange is done through new html coming from a server that processes (possibly unsafely) the input.Surplus
@Surplus yes, that was in the original question. What's your point?Morbidezza
Again, my point is that the question was not about CSP [to block inline scripts.] If it was, then all the answers erred. Besides, what's wrong in having a server that processes safely the input?Surplus
And thinking more about it, even if it was about CSP, it remains that the example provided is not a good example of an attack. It only results in a weird url, which was my original point. Why are we having this discussion?Surplus
@Surplus That's the question - can a server reliably filter every possible evasion, the list of known evasions is huge? The answer is mostly yes but if you're in a high risk environment you can't guarantee it. In my original question I was only trying to show the vector, not give good examples of attacks. CSP wasn't viable when I asked this, but it is now, use CSP, sanitise too, but use CSP. I don't know why you keep commenting, I'm not sure what your point is?Morbidezza
You agree that it was not a good example of attack. It can be confusing for some. That's all what my little comment said. Why could we not leave it at that?Surplus
@Surplus it's not intended an example of an attack. It's a proof that the XSS vector is present. Generally when testing XSS we're not trying to build a specific attack, we're just trying to get any JS to execute. alert is very commonly used because you get a pop-up when you have an XSS vector, but generally if I can get alert through your server-side sanitisation script then I can execute any script I want. In any case, thanks for clarifying that my XSS feasibility example is not a actual exploit.Morbidezza
There will be no alert, because there would be no javascript, only a weird URL that looks like javascript. Ok, that's the end.Surplus
@Surplus yes, if the sanitisation script successfully clears it or CSP blocks it. If not and the alert pops up then you have an XSS vector. The question is there a sanitisation script that guarantees nothing will get through, the selected answer is basically "no but here's what we do".Morbidezza
Create an html file that contains the link <a href="alert('hacked!');" > alert('hacked!');</a>. This is the non sanitized output. Then, if you are on linux, in the same folder, create a file named alert('hacked!'); . Visit the html file and click the link. The other file will be downloaded, just as expected. There will be no alert. There is no attack at all.Surplus
@Surplus ah, I get why you're confused now. That behaviour is (relatively) new and still not in all browsers (IE11 intranet mode for instance). Back in 2008 (when this was asked) href="script" would have executed the script by default. This is a 12 year old question with a 12 year old answer, and while XSS is still a risk the basic vector described here is blocked by the vast majority of browsers today. I'd still sanitise it though.Morbidezza
I remember when the change occurred. Some inline scripts that used to work didn't anymore. I should have mentioned it. Still, the basic point of my little comment remains valid: the example attack can be confusing for some, because nowadays it's just a weird URL.Surplus
Also, be careful of your tone. It's a bit condescending. I just made a small comment. I should not be told I am confused, reminded the obvious about security, to go elsewhere to learn about it, etc.Surplus
@Surplus you told me that "example is misleading...accepted answer makes it worst[sic]", I said it's 12yrs old and out of date you came back with "...is not what this question is about" when I asked the question. I know what it is about. It's taken me 10 comments to get to the bottom of what it was that you actually wanted and I've updated the question, but this was never about alert specifically as the XSS vector, and the fact it is treated as a resource path now doesn't make all XSS attacks impossible. You should still sanitise URLs, and add CSP too.Morbidezza
I understand that you took my small comment as an attack, but I just wanted to point out that the example is confusing and it is indeed confusing in the current context of standard modern browsers. I did not consider the feeling of the person who wrote the question 12 years ago. Sorry about that.Surplus
63

If you think URLs can't contain code, think again!

https://owasp.org/www-community/xss-filter-evasion-cheatsheet

Read that, and weep.

Here's how we do it on Stack Overflow:

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
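For illustration, the same character-whitelist idea as a JavaScript sketch (a hypothetical port, not the actual SO code; as the comments below point out, stripping characters alone does not neutralise a javascript: scheme, so treat it as one layer rather than a complete defence):

```javascript
// Strip any character outside a fixed whitelist, mirroring the C# regex above.
// Note that the scheme separator ':' and parentheses are allowed, so a
// "javascript:" URL survives with only its quotes removed.
function sanitizeUrl(url) {
  return url.replace(/[^-A-Za-z0-9+&@#/%?=~_|!:,.;()]/g, "");
}
```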
Photima answered 15/10, 2008 at 18:56 Comment(12)
I've seen that link before - it's part of what I worry about with this. We have to be very careful as a single XSS hack could cost us a great deal. Your Regex based solution seems to have been working well on SO, certainly. Would you consider it safe for, say, banking applications?Morbidezza
not so well I might say Keith, it does not accept special chars in the URL that with URL Rewriting are safe to pass, like: gynækologen.dk/Undersøgelser_og_behandlinger.aspx…Wally
This is not enough. Unless I'm missing something, this string would pass through the filter: javascript:alert&#x28;&#x27;hacked&#x27;&#x29;Pentad
Even this would get through: javascript:while(true)alert('Hacked!'); I've tested a couple places here on SO and it looks like SanatizeUrl is only part of the solution.Pentad
This set of characters still allows a lot of code. Lack of '"' can be worked around with /xxx/.source.Volsung
How does this fit with what you posted here: codinghorror.com/blog/archives/001181.htmlShelburne
Doesn't this prevent users providing the (customary) http:// prefix to all web addresses?Regardful
Five years later, I see no responses to the comments that give examples of how this answer is insecure. Yet it is the highest-voted answer on the highest-voted question (that I could find) on this topic! Given how awesome stackoverflow usually is, I'm surprised that I'm still not sure how to securely implement this relatively common scenario.Extraterritoriality
Those examples aren't URLs. A URL has a protocol and a resource name.Actinometer
Also, link seems to be no longer live. Mirrors (www).jb51.net/tools/xss.htm (beware of that link that might have some weird JS in it) and wayback machine web.archive.org/web/20110216074326/http://ha.ckers.org/xss.htmlWiddershins
how to do that in jsGrevera
In validation, only check for what you need (it will be small and simple). Do not check for what you don't need (it would be long code and hard to cover all possible outcomes).Lyonnesse
17

The process of rendering a link "safe" should go through three or four steps:

  • Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
  • Clean the link up: regexes are a good start - make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're only rendering the links as references to other information you can also force the protocol at the end of this process: if the portion before the first colon is not 'http' or 'https', prepend 'http://'. This lets you create usable links from incomplete input (as a user would type into a browser) and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
  • Check that the result is a well formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
  • Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.

If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.
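The steps above can be sketched in JavaScript (a hypothetical helper, leaning on the WHATWG URL parser available in modern browsers and Node; in 2008 you would have needed a stricter hand-rolled check, and a blacklist/malware lookup is left out):

```javascript
// Sketch of the steps above: reject attribute-closing quotes, force the
// protocol, then verify the result is a well-formed http(s) URL.
// Returns a safe absolute URL string, or null if the input should be discarded.
function makeSafeLink(input) {
  // Throw the string away if it contains a double quote, which would close
  // the href attribute in the output.
  if (input.includes('"')) return null;

  // If the portion before the first colon is not http(s), treat the input
  // as incomplete (e.g. "stackoverflow.com") and prepend http://.
  const candidate = /^https?:/i.test(input) ? input : "http://" + input;

  // Check the result parses as a well-formed URL with an allowed protocol.
  let url;
  try {
    url = new URL(candidate);
  } catch {
    return null; // e.g. "http://javascript:alert(1)" has an invalid port
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;

  return url.href;
}
```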

Jaala answered 16/10, 2008 at 20:8 Comment(1)
Link appears to be dead, and at one time it seems that it redirected to owasp.org/index.php/XSS_Filter_Evasion_Cheat_SheetTradescantia
13

Use a library, such as OWASP-ESAPI API:

Read the following:

For example:

$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$esapi = new ESAPI( "/etc/php5/esapi/ESAPI.xml" ); // Modified copy of ESAPI.xml
$sanitizer = ESAPI::getSanitizer();
$sanitized_url = $sanitizer->getSanitizedURL( "user-homepage", $url );

Another example is to use a built-in function. PHP's filter_var function is an example:

$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$sanitized_url = filter_var($url, FILTER_SANITIZE_URL);

Using filter_var allows javascript calls, and filters out schemes that are neither http nor https. Using the OWASP ESAPI Sanitizer is probably the best option.
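Since a general-purpose sanitiser can let javascript: through, the safer pattern is an explicit scheme whitelist after parsing. A sketch of that idea (in JavaScript here, with a hypothetical helper name, since the document mixes languages):

```javascript
// Parse the URL and accept only explicitly whitelisted schemes, instead of
// trusting a character-level sanitiser to reject dangerous protocols.
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

function isAllowedUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on anything that is not an absolute URL
  } catch {
    return false;
  }
  return ALLOWED_SCHEMES.has(url.protocol);
}
```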

Still another example is the code from WordPress:

Additionally, since there is no way of knowing where the URL links (i.e., it might be a valid URL, but the contents of the URL could be mischievous), Google has a safe browsing API you can call:

Rolling your own regex for sanitation is problematic for several reasons:

  • Unless you are Jon Skeet, the code will have errors.
  • Existing APIs have many hours of review and testing behind them.
  • Existing URL-validation APIs consider internationalization.
  • Existing APIs will be kept up-to-date with emerging standards.

Other issues to consider:

  • What schemes do you permit (are file:/// and telnet:// acceptable)?
  • What restrictions do you want to place on the content of the URL (are malware URLs acceptable)?
Jannjanna answered 5/4, 2013 at 4:15 Comment(6)
Cheers, but the problem here is that OWASP isn't Jon Skeet either. I don't want to roll my own, my real question is about the extent to which any of these can be relied on. I'll check out the OWASP one, but definitely don't trust any security built in to PHP!Morbidezza
If you can, try the Google Safe Browsing API. It might not be appropriate for your situation, but if the source code is available it could serve as an excellent starting point.Jannjanna
This is the only answer with actual code that hasn't been pointed out to be insecure. IMHO, the best answer.Extraterritoriality
"This is the only answer with actual code that hasn't been pointed out to be insecure. IMHO, the best answer." Nope, it is not. filter_var($url, FILTER_SANITIZE_URL); allows e.g. javascript:alert();Ridicule
I looked into using this library to sanitize urls but couldn't find any action to do that for .net. This is where I took the code and documentation from code.google.com/archive/p/owasp-esapi-dotnet/downloads, the project itself looks staleTingley
@DaveJarvis I think the link you mentioned is not related to this question, maybe you meant this learn.microsoft.com/en-us/aspnet/core/security/… . The link related to XSS covers good coding practices which will avoid avoid using nontrusted urls. Unfortunately for older big projects this is more expensive to verify and change.Tingley
4

Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
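A sketch of that combination - HTML-encode on output plus a protocol whitelist - in JavaScript (hypothetical helper names; assumes the link is emitted inside an href attribute):

```javascript
const SAFE_PROTOCOLS = ["http:", "https:", "mailto:"];

// Minimal HTML attribute encoder: & first, then quote and angle brackets.
function htmlAttrEncode(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a user-entered URL: only whitelisted protocols become clickable
// links; everything else is shown as encoded plain text.
function renderLink(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return htmlAttrEncode(rawUrl); // not a valid absolute URL: plain text
  }
  if (!SAFE_PROTOCOLS.includes(url.protocol)) {
    return htmlAttrEncode(rawUrl); // e.g. javascript: links: plain text
  }
  const encoded = htmlAttrEncode(url.href);
  return `<a href="${encoded}">${encoded}</a>`;
}
```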

Pentad answered 15/10, 2008 at 18:57 Comment(1)
A whitelist is necessary, because IE allows tab characters in the protocol, i.e. java&#x09;script: works in IE and bypasses blacklists.Volsung
3

You don't specify the language of your application, so I will presume ASP.NET, for which you can use the Microsoft Anti-Cross Site Scripting Library

It is very easy to use, all you need is an include and that is it :)

While you're on the topic, why not give Design Guidelines for Secure Web Applications a read?

If you're using any other language: where there is a library for ASP.NET, an equivalent is likely available for other languages too (PHP, Python, RoR, etc.)

Wally answered 15/10, 2008 at 18:51 Comment(1)
We're specifically on C# 3.5 and ASP.Net - I'll check that library out.Morbidezza
2

For Pythonistas, try Scrapy's w3lib.

OWASP ESAPI pre-dates Python 2.7 and is archived on the now-defunct Google Code.

Olenolin answered 17/3, 2019 at 10:36 Comment(2)
which functions in w3lib? maybe safe_url_string? w3lib.readthedocs.io/en/latest/…Jurisdiction
I tried to use safe_url_string on a malicious URL for an XXE attack and it didn't sanitize itGrimbald
1

How about not displaying them as a link? Just use the text.

Combined with a warning to proceed at your own risk, that may be enough.

addition - see also Should I sanitize HTML markup for a hosted CMS? for a discussion on sanitizing user input

Lascivious answered 15/10, 2008 at 18:49 Comment(4)
That's an idea we thought of, definitely secure, but our users are relatively low-tech. They would really like links that they can click.Morbidezza
understandable, I prefer them generally, but copy/paste does make me take a couple seconds to decide if I REALLY want to do itLascivious
That's not secure either. They could still find a way to embed a script tag.Formaldehyde
Why are we allowing tags? I assume he was referring to turning any instance of somesite.com into <a href="somesite.com">http://somesite.com</a>Grieg
1

There is a library for javascript that solves this problem: https://github.com/braintree/sanitize-url. Try it =)

Touter answered 8/10, 2021 at 7:26 Comment(0)
0

In my project written in JavaScript I use this regex as white list:

 url.match(/^((https?|ftp):\/\/|\.{0,2}\/)/)

the only limitation is that you need to put ./ in front of files in the same directory, but I think I can live with that.
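For what it's worth, a quick check of what that whitelist accepts (a hypothetical harness around the regex above; note it also admits protocol-relative //host URLs, which may be unwanted):

```javascript
// The answer's whitelist: http(s)/ftp absolute URLs, or relative paths
// starting with /, ./ or ../ (zero to two dots before the slash).
function isAllowed(url) {
  return /^((https?|ftp):\/\/|\.{0,2}\/)/.test(url);
}
```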

Foray answered 14/10, 2018 at 15:14 Comment(0)
0

Using regular expressions to prevent XSS vulnerabilities becomes complicated, and thus hard to maintain over time, and it can still leave some vulnerabilities behind. URL validation with a regular expression is helpful in some scenarios, but it is better not mixed in with vulnerability checks.

A solution is probably to use a combination of an encoder like AntiXssEncoder.UrlEncode for encoding the Query portion of the URL and UriBuilder for the rest:

    public sealed class AntiXssUrlEncoder
    {
        public string EncodeUri(Uri uri, bool isEncoded = false)
        {
            // Encode the Query portion of the URL to prevent XSS attacks if it is not already encoded. Otherwise let UriBuilder take care of it.
            var encodedQuery = isEncoded ? uri.Query.TrimStart('?') : AntiXssEncoder.UrlEncode(uri.Query.TrimStart('?'));
            var encodedUri = new UriBuilder
            {
                Scheme = uri.Scheme,
                Host = uri.Host,
                Path = uri.AbsolutePath,
                Query = encodedQuery.Trim(),
                Fragment = uri.Fragment
            };
            if (uri.Port != 80 && uri.Port != 443)
            {
                encodedUri.Port = uri.Port;
            }

            return encodedUri.ToString();
        }

        public static string Encode(string uri)
        {
            var baseUri = new Uri(uri);
            var antiXssUrlEncoder = new AntiXssUrlEncoder();
            return antiXssUrlEncoder.EncodeUri(baseUri);
        }
    }

You may need to include whitelisting to exclude some characters from encoding; that can be helpful for particular sites. HTML-encoding the page that renders the URL is another thing you may need to consider too.

BTW, please note that encoding the URL may break Web Parameter Tampering checks, so the encoded link may appear not to work as expected. Also, you need to be careful about double encoding.

P.S. AntiXssEncoder.UrlEncode would have been better named AntiXssEncoder.EncodeForUrl, to be more descriptive. Basically, it encodes a string for use in a URL; it does not encode a given URL and return a usable URL.

Sisile answered 23/9, 2021 at 21:47 Comment(2)
What language is this for?Leyla
@Leyla - It is written in C# using dotnet library.Sisile
-3

You could hex-encode the entire URL and send it to your server. That way the client would not understand the content at first glance. After reading the content, you could decode it (URL = ?) and send it to the browser.

Narthex answered 26/7, 2013 at 8:14 Comment(3)
I don't see why that would help, there isn't a problem with code executing on the server. The problem is that the code looks like a link to the server, but executes malicious XSS when the user clicks on it. My question is whether (given the huge variety of possible attack permutations) there can ever be a check strict enough to be certain that XSS content cannot get through.Morbidezza
Whatever I have gathered from my understanding is that, there is always a way to overcome the XSS filtering.Narthex
Nothing is 100% safe, but our customers want high security and user entered links and I want to know the best way to do that.Morbidezza
-9

Allowing a URL and allowing JavaScript are 2 different things.

Grieg answered 15/10, 2008 at 18:48 Comment(5)
No, they're not, if the URL is displayed back on the page.Formaldehyde
?? a Uniform Resource Locator is not Javascript, displaying the URL back on the page has nothing to do with JavascriptLascivious
That's what I used to think, too. Trust me on this: you are wrong. And if you think you're right, you are in big trouble.Photima
Maybe I didn't explain it well enough: User enters "stackoverflow.com" and if we turn that into "<a href="stackoverflow.com">stackoverflow.com</a>" there's the risk introduced. If you just let anything through they can do: "<a href="alert('hacked!');">stackoverflow.com</a>"Morbidezza
ok - that I can see being a risk, and in that case, the javascript could be viewed as a url; but, strictly speaking, that's still not a real url (google.com/…)Lascivious

© 2022 - 2024 — McMap. All rights reserved.