Should I store spaces in my URLs in the database? If so, how do I encode them when putting them into <a href="...">?

In my blog, I store URIs on entities to allow them to be customised (and friendly). Originally, they could contain spaces (e.g. "/tags/ASP.NET MVC"), but the W3C validator says spaces are not valid.

The System.Uri class accepts spaces and seems to encode them as I want (e.g. /tags/ASP.NET MVC becomes /tags/ASP.NET%20MVC), but I don't want to create a Uri just to throw it away; this feels dirty!

Note: None of Html.Encode, Html.AttributeEncode and Url.Encode will encode "/tags/ASP.NET MVC" to "/tags/ASP.NET%20MVC".


Edit: I removed the DataType part of my question, as it turns out DataType does not directly provide any validation, and there's no built-in URI validation. I found some extra validators at dataannotationsextensions.org, but they only support absolute URIs, and it looks like spaces may be valid there too.

Artist answered 15/5, 2011 at 11:8 Comment(8)
Your regular expression doesn't know about URL encoding, so it accepts any characters after /tags/ (as long as there is at least one). There's a difference between URL-encoded URLs (which is what the browser understands and what is sent over the network, for example) and your own set of paths. In this case, I'd store some "internal path" like /tags/Tup Peny and make sure it's encoded for the context when emitting it (in your case, URL-encoding it for use in a URL). Does that make sense? :)Burdine
It does, but if it's "valid" for me to store "/tags/Tup Peny", how do I encode it when I output it in an anchor's href attribute in a way that passes the W3C validator?Artist
What about UrlPathEncode: msdn.microsoft.com/en-us/library/…Cryptomeria
That won't work either :( check out the example in the docs, it'll encode the /Artist
Do you have to store /tags/ then? That part seems like it could be added at runtime anyway.Cryptomeria
Tags is just one example; other entities will have slashes (e.g. "/2011/03/my blog post"), so I'd like a generic solution. I don't want to add "/tags/" in my views, so the idea is that all entities will have their URIs (URLs?) available as a property.Artist
@Danny Regarding your note in the question: "Html.Encode, Html.AttributeEncode and Url.Encode" are not for encoding the path part of a URL. That's why they don't do what you want. :)Burdine
I know, I tried them out of desperation, and included it here to try to avoid people posting answers saying to try them :DArtist

It seems that the only sensible thing to do is not allow spaces in URLs. Support for encoding them correctly seems flaky in .NET :(

I'm going to instead replace spaces with a dash when I auto-generate them, and validate they only contain certain characters (alphanumeric, dots, dashes, slashes).

If spaces are allowed at all, I think the best approach would be to store them as %20 in the DB, since the space is "unsafe" and it seems non-trivial to encode them in .NET in a way that passes the W3C validator.
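A minimal sketch of the dash-and-whitelist plan above, written as C# top-level statements. `DashSpaces` and `IsSafePath` are hypothetical names, and the allowed character set (ASCII letters, digits, dots, dashes, slashes) is taken from the answer itself rather than from any spec:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapse each run of whitespace into a single dash,
// e.g. when auto-generating a URL from a title or tag name.
static string DashSpaces(string path) =>
    Regex.Replace(path.Trim(), @"\s+", "-");

// Hypothetical validator: only ASCII letters, digits, '.', '/' and '-' are
// allowed, and the path must not be empty. \z anchors at the true end of the
// string (unlike $, which also matches before a trailing newline).
static bool IsSafePath(string path) =>
    Regex.IsMatch(path, @"^[A-Za-z0-9./-]+\z");

string generated = DashSpaces("/tags/ASP.NET MVC");
Console.WriteLine(generated);                        // /tags/ASP.NET-MVC
Console.WriteLine(IsSafePath(generated));            // True
Console.WriteLine(IsSafePath("/tags/ASP.NET MVC"));  // False
```

The validator rejects anything outside the whitelist instead of trying to repair it, so a hand-typed URL containing a space fails validation rather than being silently changed.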

Artist answered 15/5, 2011 at 14:3 Comment(12)
No, that's not true. Support for encoding spaces has been around everywhere on the web for many years now. The trick is to encode them using the correct encoding at the right time (as with any encoding). I agree that your dash-instead-of-space solution is fine, but please don't think support for %20 is broken around the web. :)Burdine
Ok, maybe it's not that widespread, but .NET is a huge web platform and (seemingly) has no sensible way to fix "unsafe" characters in URLs :(Artist
@Danny Yes it does. Please review all the answers and their comments again. :)Burdine
I did. That method encodes slashes in the path and fails to encode spaces. In the example, "contoso.com/articles.aspx?title = ASP.NET Examples" becomes "http%3a%2f%2fwww.contoso.com%2farticles.aspx?title = ASP.NET Examples". This is the exact opposite of what I want ;)Artist
@Danny If you want http://www.contoso.com/etc to appear as the path part of a URL, then that's the way you have to encode it. I think you're confusing HTML, URIs, URLs, paths and queries here. I assume you're using the ASP.NET MVC 3 helpers, so you should use Url.Action anyway. But as long as you encode the path part (/tag/TUP PENY) using the path part encoder, and the query part (if any) using the query part encoder, and leave anything else (like the protocol, host name, port number etc.) untouched, you will be fine. Encoding makes any character safe. That's the whole point. :)Burdine
@Danny Regarding the "encoding of forward slashes": that's purely cosmetic. TUP%2fPENY and TUP/PENY in the path part of a URL are the same thing. But the reason the slash is encoded is that you're telling the encoder to encode it. Slashes in ASP.NET MVC are usually route part delimiters, not just some characters that should be run through a generic URL path part encoder.Burdine
@Burdine Using Url.Action doesn't help, as the full URL is stored on my entity. I'm simply trying to output a link to "/tags/ASP.NET MVC", and the value I have in the database is "/tags/ASP.NET MVC". Since it's invalid to write <a href="/tags/ASP.NET MVC">, I need to find a way to encode the URL to <a href="/tags/ASP.NET%20MVC">. That was the whole question, but there doesn't seem to be a simple solution.Artist
@Burdine The encoding of slashes is not cosmetic. If I output <a href="%2ftags"> then the browser renders a link to "/%2ftags"!Artist
@Danny The solution is to treat URLs the way they should be treated in a web MVC framework: not as arbitrary strings "stored on entities", but actual RESTful routes to resources. That way, Url.Action can construct the URL for you. (The second simplest solution is of course UrlPathEncode + manually decoding the slashes.)Burdine
@Burdine That won't work. I'm storing blog posts and pages, and I deliberately want them to be able to have arbitrary URLs. Manually decoding slashes is a hack, which is why I said there's (seemingly) no nice way to do what I wanted.Artist
@Danny Actually, it's using arbitrary URLs for non-static resources in ASP.NET MVC that's a hack. :) The reason the space needs to be encoded in /tags/TUP PENY isn't that the space is "unsafe" per se, but that the space actually does need encoding to %20 for use in the path part of a URL. And when you tell the URL path encoder to encode the whole path, it does exactly that, including the forward slashes. I'm not sure there's a "nice" way to accomplish what you want; surely there's a URL encoder somewhere in .NET which encodes spaces but not slashes, but using that would be a hack too.Burdine
@Burdine My project uses MVC routing in the intended way for lots of stuff, but posts/pages are required to be completely flexible. That's a requirement of my app (and for compatibility with the previous one). Using Url.Action doesn't solve this problem anyway, so the point is somewhat moot. If any URL parser that encodes spaces and not slashes is a hack, then a) storing spaces is probably invalid and b) JavaScript is a hack.Artist

I haven't used it, but UrlPathEncode sounds like it may give what you want.

You can encode a URL using the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation.

EDIT: The JavaScript method encodeURI will use %20 instead of +. Add a reference to Microsoft.JScript and call GlobalObject.encodeURI. I tried the method and it gives the result you're looking for.

Cryptomeria answered 15/5, 2011 at 11:39 Comment(13)
It also encodes slashes, which will not convert "/tag/ASP.NET MVC" to "/tag/ASP.NET%20MVC" :(Artist
That seems to work, but doesn't it feel like a massive hack? Surely there's a much simpler way? What I'm trying to do doesn't feel unusual or uncommon :(Artist
In a way it does. But looking at others who have asked this question already, it looks like there's no really nice solution to this... either the two-step solution Programming Hero suggests or this one. #3376289Cryptomeria
Usually if something is difficult or hacky, you're doing it wrong. I'm wondering if spaces are just completely invalid and I should be storing "/tags/ASP.NET%20MVC" instead :/Artist
That's a possibility. I assume your input is really "ASP.NET MVC" and then you concatenate that with /tags/ before you put it in the database. In that case you could call UrlPathEncode on the input before combining the two.Cryptomeria
Note that it looks like stackoverflow uses - instead of space, so it probably wasn't a problem they even wanted to touch ;)Cryptomeria
On tags, it is. But in some cases I want to type custom URLs when creating an entity, e.g. "/good stuff/my first article", so it falls down. I guess the real question is whether "/good stuff/my first article" is actually a valid URL at all.Artist
Yeah, I actually use dashes instead of spaces, but I wanted to understand the issue rather than just dodging it this time :D I'm getting close to thinking validation (and a "SafeUrl" method) to only allow alphanumerics and dashes might be easiest!Artist
It's considered unsafe. You might consider replacing spaces with a dash or underscore; I know I see that in a lot of blog formats. #498408Cryptomeria
Regarding avoiding dodging the issue: I understand. I guess not dodging feels hacky though... since everyone else dodges it ;)Cryptomeria
Yep, you're right. I'm going to just create some validation and specify characters that are allowed and use it for all entities. Shame that what sounds like a trivial problem is actually so complicated to solve :/Artist
My vote goes for dodging the issue. URLs with Hex-encoded values are pretty hard for humans to eyeball. I'd rather see /blog/my-first-post/ over /blog/my%20first%20post (blarg).Chiasma
%20 is for the path part, and + is for the query part. "Within the query string, the plus sign is reserved as shorthand notation for a space" according to the RFC. So /tags/Tup%20Penny?Tup+Peny is the right way to encode the raw path /tags/Tup Penny with the query parameter Tup Peny. This is why UrlEncode and UrlPathEncode differ. Nothing to do with ASP.NET MVC. :) And AFAIK, this was only ever a problem when people had malformed URLs in their HTML, eg. actual un-encoded spaces and whatnot. Right?Burdine
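To make the path/query distinction concrete, here is a minimal sketch in C# top-level statements. It swaps in `Uri.EscapeDataString` (space becomes `%20`) for the path segment and `System.Net.WebUtility.UrlEncode` (space becomes `+`) for the query value, rather than the `HttpUtility` methods discussed above. `TagSearchUrl` and the query parameter name `q` are hypothetical:

```csharp
using System;
using System.Net;

// Hypothetical helper: the tag is placed in the path, so it is
// percent-encoded (space -> %20); the search text is placed in the query
// string, so it is form-encoded (space -> +). "q" is an illustrative name.
static string TagSearchUrl(string tag, string search) =>
    "/tags/" + Uri.EscapeDataString(tag) + "?q=" + WebUtility.UrlEncode(search);

Console.WriteLine(TagSearchUrl("Tup Peny", "Tup Peny"));  // /tags/Tup%20Peny?q=Tup+Peny
```

Each part of the URL goes through the encoder meant for that part, and the fixed "/tags/" prefix is never encoded at all.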

URIs and URLs are two different things, URLs being a subset of URIs. As such, a URL has different restrictions to URIs.

To encode a path string to proper W3C URL encoding standards, use HttpUtility.UrlPathEncode(string). It'll add the encoded spaces you're after.

You should store your URLs in whatever form that is most useful for you to work with them. It can be useful to refer to them as URIs until the point at which you encode them into a URL-compliant format, but that's just semantics to help your design be a little clearer.

EDIT:

If you don't like the slashes being encoded, it's pretty simple to "decode" them by replacing the encoded %2f with the simpler /:

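// requires: using System.Web;
var path = "/tags/ASP.NET MVC";
// replace both hex casings in case the encoder emits %2F rather than %2f
var url = HttpUtility.UrlPathEncode(path).Replace("%2f", "/").Replace("%2F", "/");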
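An alternative sketch that never encodes the slashes in the first place, so nothing has to be decoded afterwards: split the stored path on `/`, percent-encode each segment with `Uri.EscapeDataString`, and join the segments back together. `EncodePath` is a hypothetical name, and the sketch assumes a stored path never needs a literal `/` inside a single segment:

```csharp
using System;
using System.Linq;

// Hypothetical helper: percent-encode each "/"-separated segment on its own,
// so the separators stay as literal "/" characters.
// "/tags/ASP.NET MVC" splits into "", "tags", "ASP.NET MVC".
static string EncodePath(string rawPath) =>
    string.Join("/", rawPath.Split('/').Select(s => Uri.EscapeDataString(s)));

Console.WriteLine(EncodePath("/tags/ASP.NET MVC"));       // /tags/ASP.NET%20MVC
Console.WriteLine(EncodePath("/2011/03/my blog post"));  // /2011/03/my%20blog%20post
Console.WriteLine(EncodePath("/2011/03/what is C#?"));   // /2011/03/what%20is%20C%23%3F
```

Encoding per segment also escapes characters such as `?` and `#`, which would otherwise start a query string or fragment if a post title happened to contain them.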
Chiasma answered 15/5, 2011 at 11:39 Comment(4)
UrlEncode will convert slashes, which messes up the links. The System.Uri class seems to correctly take a path and encode the spaces (without slashes), however EF seems to barf on a Uri without special handling :(Artist
I'm trying to read about URI vs URL to make sure I'm referring to things correctly, but the more I read, the more I'm confused! ;(Artist
Uniform Resource Identifier vs Uniform Resource Locator. Check out Wikipedia for some clear advice: en.wikipedia.org/wiki/Uniform_Resource_IdentifierChiasma
The lack of examples was what confused me, but now I've changed my properties to be called "Url", as it seems that's more appropriate.Artist

I asked a similar question a while ago. The short answer was to replace spaces with "-" and then back again. This is the source I used:

private static string EncodeTitleInternal(string title)
{
        if (string.IsNullOrEmpty(title))
                return title;

        // Search engine friendly slug routine with help from http://www.intrepidstudios.com/blog/2009/2/10/function-to-generate-a-url-friendly-string.aspx

        // remove invalid characters
        title = Regex.Replace(title, @"[^\w\d\s-]", "");  // this is unicode safe, but may need to revert back to 'a-zA-Z0-9', need to check spec

        // convert multiple spaces/hyphens into one space       
        title = Regex.Replace(title, @"[\s-]+", " ").Trim(); 

        // If it's over 75 chars, take the first 75.
        title = title.Substring(0, title.Length <= 75 ? title.Length : 75).Trim(); 

        // hyphenate spaces
        title = Regex.Replace(title, @"\s", "-");

        return title;
}
Joost answered 16/5, 2011 at 8:18 Comment(2)
Though it doesn't really answer the question about using spaces/encoding, this is pretty much what I'm going to do now :-)Artist
@Danny My answer is definitely: don't bother URL-encoding; replace spaces with dashes (and remove other bad characters) to make them SEO-friendly, just like Stack Overflow does.Joost

© 2022 - 2024 — McMap. All rights reserved.