In a now-migrated question about human-readable URLs, I allowed myself to elaborate on a little hobby-horse of mine:
When I encounter URLs like
http://www.example.com/product/123/subpage/456.html
I always think that this is an attempt at creating meaningful hierarchical URLs which, however, is not entirely hierarchical. What I mean is, you should be able to slice off one level at a time. The above URL has two violations of this principle:
1. /product/123 is one piece of information represented as two levels. It would be more correctly represented as /product:123 (or whatever delimiter you like).
2. /subpage is very likely not an entity in itself (i.e., you cannot go up one level from 456.html, as http://www.example.com/product/123/subpage is "nothing").

Therefore, I find the following more correct:

http://www.example.com/product:123/456.html
Here, you can always navigate up one level at a time:
http://www.example.com/product:123/456.html — The subpage
http://www.example.com/product:123 — The product page
http://www.example.com/ — The root

Following the same philosophy, the following would make sense [and provide an additional link to the products listing]:
http://www.example.com/products/123/456.html
Where:
http://www.example.com/products/123/456.html — The subpage
http://www.example.com/products/123 — The product page
http://www.example.com/products — The list of products
http://www.example.com/ — The root
My primary motivation for this approach is that if every "path element" (delimited by /) is self-contained1, you will always be able to navigate to the "parent" by simply removing the last element of the URL. This is what I (sometimes) do in my file explorer when I want to go to the parent directory. Following the same line of logic, a user (or a search engine / crawler) can do the same. Pretty smart, I think.
On the other hand (and this is the important bit of the question): while I can never prevent a user from accessing a URL he has amputated himself, am I wrong to assume (and cater for the possibility) that a search engine might do the same? I.e., is it reasonable to expect that no search engine (or really: Google) would try to access http://www.example.com/product/123/subpage (point 2, above)? Or am I really only taking the human factor into account here?
This is not a question about personal preference. It's a technical question about what I can expect of a crawler / indexer and to what extent I should take non-human URL manipulation into account when designing URLs.
Also, the structural "depth" of http://www.example.com/product/123/subpage/456.html is 4, whereas http://www.example.com/products/123/456.html is only 3. Rumour has it that this depth influences search engine ranking. At least, so I was told. (It is now evident that SEO is not what I know most about.) Is this (still?) true: does the hierarchical depth (number of directories) influence search ranking?
So, is my "hunch" technically sound or should I spend my time on something else?
Example: Doing it (almost) right
Good ol' SO gets this almost right. Case in point: profiles, e.g., http://stackoverflow.com/users/52162:
http://stackoverflow.com/users/52162 — Single profile
http://stackoverflow.com/users — List of users
http://stackoverflow.com/ — Root
However, the canonical URL for the profile is actually http://stackoverflow.com/users/52162/jensgram, which seems redundant (the same end-point represented on two hierarchical levels). Alternative: http://stackoverflow.com/users/52162-jensgram (or any other delimiter, used consistently).
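Either variant works because the numeric id alone identifies the profile and the name part is decorative. A hypothetical sketch of a route matcher that treats /users/52162, /users/52162/jensgram, and /users/52162-jensgram as the same resource (the regex and function name are my own illustration, not SO's actual routing):

```python
import re

# Match "/users/<id>" optionally followed by "/<slug>" or "-<slug>"
USER_URL = re.compile(r"^/users/(\d+)(?:[/-][\w-]*)?/?$")

def user_id(path: str):
    """Return the profile id if the path is a user URL, else None."""
    m = USER_URL.match(path)
    return int(m.group(1)) if m else None

print(user_id("/users/52162"))           # -> 52162
print(user_id("/users/52162/jensgram"))  # -> 52162
print(user_id("/users/52162-jensgram"))  # -> 52162
```

The hyphenated form keeps the id and the slug inside a single path element, which preserves the "one level per entity" property argued for above.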
1) Carries a complete piece of information not dependent on "deeper" elements.