Human-readable URLs: preferably hierarchical too?

In a now migrated question about human-readable URLs I allowed myself to elaborate a little hobby-horse of mine:

When I encounter URLs like http://www.example.com/product/123/subpage/456.html I always think that this is an attempt at creating meaningful hierarchical URLs which, however, is not entirely hierarchical. What I mean is, you should be able to slice off one level at a time. The URL above violates this principle in two ways:

  1. /product/123 is one piece of information represented as two levels. It would be more correctly represented as /product:123 (or whatever delimiter you like)
  2. /subpage is very likely not an entity in itself (i.e., you cannot go up one level from 456.html as http://www.example.com/product/123/subpage is "nothing").

Therefore, I find the following more correct:

http://www.example.com/product:123/456.html

Here, you can always navigate up one level at a time:

  • http://www.example.com/product:123/456.html — The subpage
  • http://www.example.com/product:123 — The product page
  • http://www.example.com/ — The root

Following the same philosophy, the following would make sense [and provide an additional link to the products listing]:

http://www.example.com/products/123/456.html

Where:

  • http://www.example.com/products/123/456.html — The subpage
  • http://www.example.com/products/123 — The product page
  • http://www.example.com/products — The list of products
  • http://www.example.com/ — The root

My primary motivation for this approach is that if every "path element" (delimited by /) is self-contained¹, you will always be able to navigate to the "parent" by simply removing the last element of the URL. This is what I (sometimes) do in my file explorer when I want to go to the parent directory. Following the same line of logic, the user (or a search engine / crawler) can do the same. Pretty smart, I think.
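The "remove the last element to reach the parent" idea can be sketched in a few lines of Python. This is only an illustration: the function name `ancestors` is made up here, and it assumes that every path element is self-contained, so that each shortened URL is a real page.

```python
from urllib.parse import urlsplit, urlunsplit


def ancestors(url):
    """Return the URL itself, then each parent URL, ending at the site root.

    Each parent is obtained by dropping the last path element.
    Query string and fragment are discarded.
    """
    parts = urlsplit(url)
    # Ignore empty segments caused by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    result = []
    for i in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:i])
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result


for u in ancestors("http://www.example.com/products/123/456.html"):
    print(u)
```

With the /products/123/456.html scheme every URL in the returned list corresponds to one of the bullets above. With /product/123/subpage/456.html the same function would also produce /product/123/subpage and /product, which is exactly where the hierarchy breaks.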

On the other hand (and this is the important bit of the question): While I can never prevent a user from trying to access a URL he himself has amputated, am I wrong to assume (and to cater for the possibility) that a search engine might do the same? I.e., is it reasonable to expect that no search engine (or really: Google) would try to access http://www.example.com/product/123/subpage (point 2, above)? (Or am I really only taking the human factor into account here?)

This is not a question about personal preference. It's a technical question about what I can expect of a crawler / indexer and to what extent I should take non-human URL manipulation into account when designing URLs.

Also, the structural "depth" of http://www.example.com/product/123/subpage/456.html is 4, where http://www.example.com/products/123/456.html is only 3. Rumour has it that this depth influences search engine ranking. At least, so I was told. (It is now evident that SEO is not what I know most about.) Is this (still?) true: does the hierarchical depth (number of directories) influence search ranking?
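For clarity, "depth" here just means the number of non-empty path elements. A minimal sketch of that count (the helper name `path_depth` is hypothetical, and the number says nothing by itself about how any search engine ranks a page):

```python
from urllib.parse import urlsplit


def path_depth(url):
    """Count the non-empty, slash-delimited elements in the URL path."""
    return len([s for s in urlsplit(url).path.split("/") if s])


print(path_depth("http://www.example.com/product/123/subpage/456.html"))
print(path_depth("http://www.example.com/products/123/456.html"))
```

The first call gives 4 and the second gives 3, matching the figures quoted above.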

So, is my "hunch" technically sound or should I spend my time on something else?


Example: Doing it (almost) right
Good ol' SO gets this almost right. Case in point: profiles, e.g., http://stackoverflow.com/users/52162:

  • http://stackoverflow.com/users/52162 — Single profile
  • http://stackoverflow.com/users — List of users
  • http://stackoverflow.com/ — Root

However, the canonical URL for the profile is actually http://stackoverflow.com/users/52162/jensgram which seems redundant (the same end-point represented on two hierarchical levels). Alternative: http://stackoverflow.com/users/52162-jensgram (or any other delimiter consistently used).


1) Carries a complete piece of information not dependent on "deeper" elements.

Boucicault answered 25/10, 2010 at 17:48 Comment(3)
And this follow-up to your migrated question is on-topic here because it contains the programming question...? - Infect
@Pascal Cuoq No, not necessarily (and it wasn't my question, just my "answer"). Perhaps this too should be migrated to "Pro Webmasters" but I don't know if the question is best answered by "professional and enthusiast programmers" or "professional and enthusiast webmasters". - Boucicault
I disagree that "product/123" is only one piece of information; the first level states "is of type product" and the second states "has id 123". - Tree

Hierarchical URLs like "http://www.example.com/product:123/456.html" are as useless as "http://www.example.com/product/123/subpage", because users who see your URLs don't care about identifiers from your database; they want meaningful paths. This is why StackOverflow puts question titles into URLs: "https://mcmap.net/q/1794388/-human-readable-urls-preferably-hierarchical-too".

Google advises against the practice of replacing ordinary query strings like "http://www.example.com/?product=123&page=456", because when every site develops its own scheme, the crawler doesn't know what each part means or whether it is important. Google has invented sophisticated mechanisms to find the important arguments and ignore the unimportant ones, which means more of your pages get into the index with fewer duplicates. But these algorithms often fail when web developers invent their own scheme.
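To illustrate the point about structure (this is not a description of how Google's crawler works): a query string follows a standard key=value convention that any generic client can parse, whereas a custom path scheme such as product:123 only yields opaque segments unless the client knows the site's private convention. A small sketch using Python's standard library:

```python
from urllib.parse import parse_qs, urlsplit

# Standard query string: names and values can be recovered generically.
query_url = "http://www.example.com/?product=123&page=456"
params = parse_qs(urlsplit(query_url).query)
print(params)

# Custom path scheme: a generic parser only sees opaque path segments.
custom_url = "http://www.example.com/product:123/456.html"
segments = [s for s in urlsplit(custom_url).path.split("/") if s]
print(segments)
```

In the first case the parser knows that "product" and "page" are parameter names. In the second it cannot tell that "456.html" is a page of product 123 without site-specific rules.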

If you care about both users and crawlers you should use urls like this instead:

Also, search engines give a higher rating to pages with keywords in the URL.

Bergh answered 25/10, 2010 at 18:3 Comment(2)
Yes, I agree with you on the uselessness of IDs in URLs. But regarding the URL of this question (to follow your example): is http://stackoverflow.com/questions/4017365/human-readable-urls-preferably-hierarchical-too penalised compared to the (fictional) alternative http://stackoverflow.com/questions/4017365-human-readable-urls-preferably-hierarchical-too (one level higher)? (The second part of my question.) - Boucicault
@Boucicault I seriously doubt search engines "penalise" any URLs for additional levels. Even if there's a difference, you're unlikely to notice it. Adding a keyword to the URL matters a hundred times more than the number of slash characters around it. :) It's just a character. Now that "human-readable" URLs (with unreadable IDs) are becoming popular, this character can mean virtually anything. - Bergh
