Multiple hash signs in URL

Or maybe you call it "sharp" - the # symbol.

I've come across a case where #! and # are used simultaneously in a single URL. From reading other articles, including the RFC, I can't tell whether that is a legal combination. When it encounters such a page, a Mozilla browser (Iceweasel in this case) displays the URL with two #'s, while Chrome displays only one but dies shortly afterwards (the tab containing the page becomes unresponsive and crashes, though that may be unrelated).

Now, my question is: is it legal to have both in one URL, is it legal but redundant (and should be normalized), or is it just a bug in the Mozilla browser? Suppose I'm making an AJAX request or navigating the browser history: what should I do if I encounter this situation?


RFC 3986, which should clarify this, just in case: https://www.rfc-editor.org/rfc/rfc3986#section-3.4

Also, how Google crawlers see things: https://developers.google.com/webmasters/ajax-crawling/docs/specification

Bourgeois answered 1/6, 2012 at 12:59 Comment(2)
Possible duplicate of "URL fragment (#) allowed characters". - Poetry
^-- Marked as a duplicate to direct people to the other question, because the answer here does not give a concrete list of allowed characters and the rationale, but rather sends one on a hunt for what a pchar is. - Poetry

The format for a fragment only allows slashes, question marks, and pchars. If you look up the RFC, you'll see that the hash mark is not a valid pchar.

However, browsers will try their best to read invalid URLs by treating repeated hashes as though they were escaped, as you can see by checking the value of window.location.hash (in IE, Firefox, and Chrome) for

http://www.example.com/hey#foo#bar

which is the same window.location.hash for

http://www.example.com/hey#foo%23bar
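
As a rough illustration outside the browser, Python's standard urllib.parse also treats only the first "#" as the fragment delimiter. This is a sketch of one parser's behaviour, not a statement about what any particular browser does; the variable names are just for illustration.

```python
from urllib.parse import urlsplit, unquote

raw = urlsplit("http://www.example.com/hey#foo#bar")
escaped = urlsplit("http://www.example.com/hey#foo%23bar")

# Only the first "#" acts as the delimiter; the second one stays in the fragment.
print(raw.fragment)                # foo#bar
# Percent-decoding the escaped form yields the same fragment text.
print(unquote(escaped.fragment))   # foo#bar
```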
Castle answered 1/6, 2012 at 13:15 Comment(3)
No, the ABNF makes it pretty clear that you can't have an unescaped "#" in the fragment. - Weatherford
Ah, so then this is just a case of the browser forcing validity by treating multiple hashes as though they were escaped. - Castle
Just came across this, it might be a good complement to your answer: lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2008Nov/… - Inebriant

My answer is a clear no, at least when referring to RFC 3986. But you have to look at more than just Section 3.4.

Section 3 defines the structure of a URI as follows:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment

(I just took the upper part, relevant for URLs)
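
For anyone who wants to poke at these components, here is a small sketch using Python's standard urllib.parse on the RFC's own example URI. It only shows how one generic parser splits the string; the variable name parts is arbitrary.

```python
from urllib.parse import urlsplit

parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

# Print the five components in the order of the RFC diagram.
print(parts.scheme)    # foo
print(parts.netloc)    # example.com:8042  (the authority)
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```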

So, to answer your question, you have to look at all the parts:

  • The scheme may not contain a hash sign (only ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )).
  • The authority may not contain a hash (I won't go into detail here) and is even 'terminated by the next slash ("/"), question mark ("?"), or number sign ("#")'.
  • The path 'consists of a sequence of path segments separated by a slash ("/") character.' The path segments in turn can only consist of pchars, see e.g. this answer. So no hashes here! It will also be terminated 'by the first question mark ("?") or number sign ("#"), or by the end of the URI'.
  • The query part (indicated by the first "?") may only consist of pchar, "/" or "?" and will be 'terminated by a number sign ("#") character or by the end of the URI.'

So, no hashes are allowed so far except for terminating the URI, which is not what we want if we would like to use at least one hash ;-)

Finally:

  • The fragment is 'indicated by the presence of a number sign ("#")' and also consists only of pchar, "/" or "?". It is 'terminated by the end of the URI'.

To sum up: only one "#" is allowed in a compliant URL (or URI), as the marker for the URL fragment. Hash signs that appear to belong to the path (because slashes follow them) are especially problematic, as they officially terminate the path part.
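
The fragment rule can be sketched as a regular expression built from the RFC 3986 ABNF (fragment = *( pchar / "/" / "?" ), with pchar = unreserved / pct-encoded / sub-delims / ":" / "@"). The helper name is_valid_fragment is made up for this example, and it checks only the part after the first "#", not a whole URI.

```python
import re

# Characters allowed literally in a fragment: unreserved, sub-delims, ":", "@", "/", "?".
# Anything else (including "#") must appear as a percent-encoded triplet.
_FRAGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")

def is_valid_fragment(fragment: str) -> bool:
    """Return True if the text after the first '#' matches the RFC 3986 fragment rule."""
    return _FRAGMENT_RE.fullmatch(fragment) is not None

print(is_valid_fragment("!/path?x=1"))  # True: hashbang-style fragment
print(is_valid_fragment("foo%23bar"))   # True: second hash is escaped
print(is_valid_fragment("foo#bar"))     # False: unescaped "#" inside the fragment
```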

This can cause problems, e.g. in single-page applications (SPAs) that use this pattern because navigation after the hash is handled on the client side, not on the server. In this case, the SPA should make sure it correctly handles the rest of the URL on reception, which can include the query and fragment, possibly URL-encoded in a browser-specific way.
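
One possible way to cope with such a URL before handing it to a strict parser is to keep the first "#" as the delimiter and percent-encode any later ones. This is only a sketch under that assumption: the function name escape_extra_hashes and the second example URL are invented for illustration, and whether this normalization suits a given application is a judgment call.

```python
def escape_extra_hashes(url: str) -> str:
    """Keep the first '#' as the fragment delimiter and encode later ones as %23."""
    head, sep, fragment = url.partition("#")
    return head + sep + fragment.replace("#", "%23")

print(escape_extra_hashes("http://www.example.com/hey#foo#bar"))
# http://www.example.com/hey#foo%23bar
print(escape_extra_hashes("http://example.com/#!/inbox#msg-1"))
# http://example.com/#!/inbox%23msg-1
```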

Eckel answered 16/7, 2020 at 16:39 Comment(0)

It may be legal, as @apsillers mentioned, but I would avoid it unless necessary, as it can cause confusion about the URL.

This kind of URL:

http://www.example.com/hey#foo#bar

seems really confusing to me, and will be even more confusing to regular users and maybe search engines.

Algorithm answered 1/6, 2012 at 13:26 Comment(0)
