List of valid characters for the fragment identifier in a URL?
D

3

96

I'm using the fragment identifier to create a permalink for AJAX events in my web app similar to this guy. Something like:

http://www.myapp.com/calendar#filter:year/2010/month/5

I've done quite a bit of searching but can't find a list of valid characters for the fragment identifier. The W3C spec doesn't offer anything.

Do I need to encode the characters the same way as in the rest of the URL?

There doesn't seem to be any good information on this anywhere.

Danielldaniella answered 17/5, 2010 at 14:16 Comment(0)
B
113

See RFC 3986:

fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So you can use !, $, &, ', (, ), *, +, ,, ;, =, something matching %[0-9a-fA-F]{2}, something matching [a-zA-Z0-9], -, ., _, ~, :, @, /, and ?
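
As a rough illustration of the grammar above, here is a minimal JavaScript sketch that checks a candidate fragment against it and percent-encodes anything that falls outside it. The names FRAGMENT_RE, isValidFragment and encodeFragment are made up for this example, and the encoder leans on the built-in encodeURI, which happens to leave exactly the fragment-safe ASCII characters plus # untouched, so # is escaped separately.

```javascript
// Transcription of: fragment = *( pchar / "/" / "?" )
// Either one allowed literal character or a %XX escape, repeated.
const FRAGMENT_RE = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// True if the string (without the leading "#") matches the RFC 3986 rule.
function isValidFragment(fragment) {
  return FRAGMENT_RE.test(fragment);
}

// Percent-encode a raw (not yet encoded) string for use as a fragment.
// encodeURI escapes everything outside the fragment-safe set except "#",
// so "#" is handled by hand. Input that is already percent-encoded would
// be double-encoded ("%20" becomes "%2520").
function encodeFragment(raw) {
  return encodeURI(raw).replace(/#/g, "%23");
}

console.log(isValidFragment("filter:year/2010/month/5")); // true
console.log(isValidFragment("100%"));                     // false
console.log(encodeFragment("a b#c"));                     // a%20b%23c
```

The fragment from the question passes unchanged, so no extra encoding is needed for it.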

Belvabelvedere answered 17/5, 2010 at 14:22 Comment(7)
Perfect, I was looking for that in the RFC but couldn't seem to find the right clause. Thanks. – Danielldaniella
@Artefacto, So does it mean that a "%" is not allowed everywhere, but only allowed when two valid characters follow it? – Lemaceon
@Lemaceon yes, % is only allowed as an escape character. Use %25 to encode a single %. – Supporter
The back / forward button doesn't work with fragment identifiers that have a colon, in spite of the RFC stating that it's a valid character. – Jasminejason
Wow! It would probably be easier to tell which ASCII characters cannot be used! – Upstage
In case anyone wants a quick and dirty sanitizer like I did: myFragment.replace(/(?=((?:[\!\$&'\(\)\*\+,;=a-zA-Z0-9\-._~:@\/?]|%[0-9a-fA-F]{2})*))\1./g, "$1-"); Replace the - in the "$1-" with the desired placeholder char. – Shaitan
So... basically base 81. Not a clean way to use that... – Pahl
L
32

https://www.rfc-editor.org/rfc/rfc3986#section-3.5:

fragment    = *( pchar / "/" / "?" )

and

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
pct-encoded   = "%" HEXDIG HEXDIG

So, combined, the fragment cannot contain #, a raw %, ^, [, ], {, }, \, ", < and > according to the RFC.
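
One way to double-check a "not allowed" list like this is to enumerate the printable ASCII range and keep whatever the grammar rejects. The sketch below does that in JavaScript; LITERAL_OK and disallowedPrintableAscii are names invented for the example, and a raw % is counted as disallowed because it is only legal as the start of a %XX escape.

```javascript
// Single characters that may appear literally in a fragment per RFC 3986:
// unreserved / sub-delims / ":" / "@" / "/" / "?"
const LITERAL_OK = /^[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]$/;

// Collect every printable ASCII character (0x20 to 0x7E) that the
// grammar does not allow as a literal character in a fragment.
function disallowedPrintableAscii() {
  const rejected = [];
  for (let code = 0x20; code <= 0x7e; code++) {
    const ch = String.fromCharCode(code);
    if (!LITERAL_OK.test(ch)) rejected.push(ch);
  }
  return rejected;
}

// Prints the 14 rejected characters as a JSON string, starting with a space.
console.log(JSON.stringify(disallowedPrintableAscii().join("")));
```

Run this way, the enumeration also turns up the space, | and ` characters in addition to the ones listed above.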

Lys answered 17/5, 2010 at 14:26 Comment(3)
Thanks. Gave the answer to Artefacto since he was a hair faster but gave you +1 for the response. – Danielldaniella
I suppose you're missing non-printable ASCII characters and non-ASCII characters. – Belvabelvedere
Seems that you forgot VERTICAL BAR (|) and GRAVE ACCENT (`) and SPACE ( ) in the not-list. So the full list of printable (7-bit) US-ASCII characters in the not-list is: "#%< >[\]^`{|} – Locoism
E
2

Another RFC also covers this: RFC 1738

URL schemeparts for ip based protocols:
HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
Emanation answered 2/4, 2014 at 12:53 Comment(0)