Should URL be case sensitive?

18

357

I noticed that

HTTP://STACKOVERFLOW.COM/QUESTIONS/ASK

and

http://stackoverflow.com/questions/ask

both work fine; in fact, the first one is converted to lowercase.

I think that this makes sense for the user.

If I look at Google, this URL works fine:

http://www.google.com/intl/en/about/corporate/index.html  

but this one, with "ABOUT" in uppercase, does not:

http://www.google.com/intl/en/ABOUT/corporate/index.html   

Should the URL be case sensitive?

Hashum answered 3/11, 2011 at 14:50 Comment(6)
IMHO, URLs should never be case sensitive; that just makes life harder for the people who use them. - Marshmallow
The question "SHOULD URLs be case-sensitive?" is a bad question because it invites opinion. A better question would be "WHY are (or WHY aren't) URLs case-sensitive?" or "Why are some URLs case-sensitive while others are not?" - Waterman
But for one possible answer, check out WHATWG’s new URL Standard, which has been adopted by Node.js. - Waterman
In my opinion, no, they shouldn't be. - Umbilicate
If browsers didn't honor case, IPFS addresses would break, but they don't. - Malkin
I think we need to be a bit more careful about what we call "opinion" and whether that's bad. Every question on SO with multiple different answers involves opinions; it's just a matter of what those opinions are based on. Hopefully we get well-informed opinions based on expertise and experience. Funnily enough, "should" makes this a good question, because the word "should" is used in the RFCs, so the question can be answered directly from them. Opinions based on following the RFCs should be agreeable to most people, but not all. - Talcahuano
368

According to the W3C's "HTML and URLs", they should:

There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive.

Option answered 3/11, 2011 at 14:54 Comment(8)
I guess "be liberal in what you accept and conservative in what you send" (IETF-speak) would be my guideline. - Option
The W3C guideline is reasonable. It simply states that one shouldn't make assumptions about how the server handles the URL you submit; how to handle the request URL is up to the server. Most web servers run Unix/Linux, which means most web servers are case sensitive. - Endocrinology
The W3C says USERS should assume that servers are case-sensitive, but does not give a recommendation for SERVERS. - Immingle
"For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http")." (Source: RFC 1738) - Tref
@PK_ Note that this only holds for the scheme portion of the URL. RFC 1738 does not discuss whether other parts of the URL should be interpreted as case sensitive. - Strict
@PK_J This applies only to the scheme part of the URL (HTTP -> http, FTP -> ftp). - Fourdrinier
I think this and many of the answers about what the spec does or does not say miss the point of the question. **Should** they be case sensitive? That's a loaded question, really. From a user's point of view, case sensitivity is a pain point; not everyone knows that case makes a difference. Whether URIs should or shouldn't be case sensitive depends on the context. For technical flexibility, yes, they should be. For usability, no, they should not be. - Citarella
It may be worth noting that schemes such as https, ftp, irc and mailto all contain DNS names, which we know are case-insensitive. - Manhood
158


Domain names are case insensitive according to RFC 4343. The rest of the URL (apart from the #fragment) is sent to the server as part of the GET request, and the server may or may not treat it case-sensitively.

Take this page, for example: stackoverflow.com receives the GET path /questions/7996919/should-url-be-case-sensitive and sends an HTML document to your browser. Stack Overflow is case insensitive here because it produces the same result for /QUEStions/7996919/Should-url-be-case-sensitive.

Wikipedia, on the other hand, is case sensitive except for the first character of the title. The URLs https://en.wikipedia.org/wiki/Case_sensitivity and https://en.wikipedia.org/wiki/case_sensitivity lead to the same article, but https://en.wikipedia.org/wiki/CASE_SENSITIVITY returns 404.
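The Wikipedia rule described above can be modeled in a few lines. This is a minimal sketch, assuming that only the first character of a title is case-insensitive. `wiki_title_key` and `same_article` are hypothetical helpers, not Wikipedia's code, and they ignore the editor-created redirects discussed in the comments.

```python
def wiki_title_key(title: str) -> str:
    """Hypothetical normalization: upper-case the first character only;
    every other character must match exactly."""
    if not title:
        return title
    return title[0].upper() + title[1:]


def same_article(a: str, b: str) -> bool:
    # Two titles name the same article only if they agree after the
    # first-character normalization above.
    return wiki_title_key(a) == wiki_title_key(b)


print(same_article("Case_sensitivity", "case_sensitivity"))  # True
print(same_article("Case_sensitivity", "CASE_SENSITIVITY"))  # False
```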

Dijon answered 14/6, 2013 at 16:35 Comment(4)
Wikipedia is actually very forgiving about case in situations where users may think a word should be one case or another, but this is more because of the OCD... sorry, considerate nature of its editors. Its URLs are technically case-sensitive, though. - Immingle
That's because the semantic, readable part of a question's URL on Stack Overflow does not identify it; the question is identified by 7996919. The semantic part of the URL is just there for SEO purposes. - Solatium
Actually, https://mcmap.net/q/92708/-should-url-be-case-sensitive also works. This is because stackoverflow.com's server uses only the question's ID to identify it and return the correct URL and HTML page. - Houdini
The full resource identifier of an SO question is not 7996919 but /questions/7996919[/*[/*]]. Try to access /foo/7996919 or /questions/7996919/foo/bar/baz and you will get a 404. But try to access /QUESTIONS/7996919/foo/bar and it will work fine. This means that although the last part of the URL (/should-url-be-case-sensitive) really is just for SEO, the path of this specific resource is case insensitive anyway. That just proves the point: per RFC 3986, only the scheme and host are case-insensitive; the case sensitivity of all other parts depends on the server logic. - Philology
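The routing behavior Philology describes can be reproduced with a single case-insensitive pattern. This is a minimal sketch that assumes a regex-based router. `QUESTION_ROUTE` and `resolve` are hypothetical and do not represent Stack Overflow's actual implementation. The literal `questions` segment matches in any case, the numeric ID identifies the question, and at most two trailing segments are accepted and then ignored.

```python
import re
from typing import Optional

# Hypothetical route: /questions/<id> plus up to two ignored segments
# (the SEO slug and one more), with "questions" matched in any case.
QUESTION_ROUTE = re.compile(r"/questions/(\d+)(?:/[^/]*){0,2}", re.IGNORECASE)


def resolve(path: str) -> Optional[int]:
    """Return the question ID for a matching path, or None (i.e. a 404)."""
    m = QUESTION_ROUTE.fullmatch(path)
    return int(m.group(1)) if m else None


print(resolve("/QUESTIONS/7996919/foo/bar"))      # 7996919
print(resolve("/questions/7996919/foo/bar/baz"))  # None
```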
83

It depends on the hosting OS. Sites hosted on Windows tend to be case insensitive because the underlying file system is case insensitive. Sites hosted on Unix-type systems tend to be case sensitive because their underlying file systems typically are. The host name part of the URL is always case insensitive; it's the rest of the path that varies.

Exhibit answered 3/11, 2011 at 14:53 Comment(2)
Yes, as I found out the hard way with HTTP requests to files on a Unix FTP server. - Gossipy
It would be more accurate to say "it depends on the server" in general, because serving files is not the only way to answer HTTP requests. - Bronchopneumonia
48

The domain name portion of a URL is not case sensitive since DNS ignores case: http://en.example.org/ and HTTP://EN.EXAMPLE.ORG/ both open the same page.

The path is used to specify and perhaps find the resource requested. It is case-sensitive, though it may be treated as case-insensitive by some servers, especially those based on Microsoft Windows.

If the server is case sensitive and http://en.example.org/wiki/URL is correct, then http://en.example.org/WIKI/URL or http://en.example.org/wiki/url will display an HTTP 404 error page, unless these URLs point to valid resources themselves.
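The difference between the two kinds of server can be shown with a toy request handler. This is a minimal sketch, assuming a hypothetical site where only /wiki/URL exists. `RESOURCES` and `handle` are illustrative names. The `case_sensitive` flag stands in for whatever the real server or filesystem does.

```python
from typing import Dict, Optional, Tuple

# Hypothetical site where only /wiki/URL exists.
RESOURCES: Dict[str, str] = {"/wiki/URL": "<html>URL article</html>"}


def handle(path: str, case_sensitive: bool = True) -> Tuple[int, Optional[str]]:
    """Return (status, body) for a GET request on `path`."""
    if case_sensitive:
        # Exact match, as on a server backed by a case-sensitive filesystem.
        body = RESOURCES.get(path)
    else:
        # Case-folded match, as on a server backed by a case-insensitive filesystem.
        folded = path.casefold()
        body = next(
            (v for k, v in RESOURCES.items() if k.casefold() == folded), None
        )
    return (200, body) if body is not None else (404, None)


print(handle("/WIKI/URL")[0])                        # 404
print(handle("/WIKI/URL", case_sensitive=False)[0])  # 200
```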

Alford answered 30/5, 2013 at 10:56 Comment(4)
This answer has the only correct wording: "it is case-sensitive, though it may be treated as case-insensitive". The only valid answer. - Fortyniner
@DanFromGermany, that the path is case-sensitive can be vaguely deduced from this: "URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy." But that deduction is ambiguous. As mentioned in a comment above, RFC 1738 does not discuss whether parts of the URL other than the scheme should be interpreted as case sensitive. Do you have a link that clarifies which parts of a URL are case-sensitive? - Paronomasia
@Paronomasia From RFC 3986, 6.2.2.1. Case Normalization: "When a URI uses components of the generic syntax, the component syntax equivalence rules always apply; namely, that the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI HTTP://www.EXAMPLE.com/ is equivalent to http://www.example.com/. The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme." - Fortyniner
@Paronomasia And from the HTTP RFC: "When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs [...]" (with the exception of the scheme and host). - Fortyniner
18

I am not a fan of bumping old articles but because this was one of the first responses for this particular issue I felt a need to clarify something.

As @Bhavin Shah's answer states, the domain part of the URL is case insensitive, so

http://google.com 

and

http://GOOGLE.COM 

and

http://GoOgLe.CoM 

are all the same, but everything after the domain name is considered case sensitive.

so...

http://GOOGLE.COM/ABOUT

and

http://GOOGLE.COM/about

are different.

Note: I am talking "technically", not "literally". In a lot of cases (most, actually), servers are set up to handle these the same way, but it is possible to set them up so they are NOT handled the same.

Different servers handle this differently, and in some cases they have to be case sensitive. In many cases query string values are encoded (such as session IDs, or Base64-encoded data that's passed as a query string value). These values are case sensitive by nature, so the server has to handle them case-sensitively.
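The reason Base64 values like these must be handled case-sensitively is that upper- and lowercase letters are different digits in the Base64 alphabet. This minimal sketch uses Python's standard `base64` module. `survives_lowercasing` is a hypothetical helper that simulates a server lowercasing the token before decoding it.

```python
import base64


def survives_lowercasing(data: bytes) -> bool:
    """Encode `data` as a base64url token, lowercase the token (as a
    case-insensitive server might), decode it, and report whether the
    original bytes came back."""
    token = base64.urlsafe_b64encode(data).decode("ascii")
    return base64.urlsafe_b64decode(token.lower()) == data


print(survives_lowercasing(b"session:42"))  # False
```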

So to answer the question, "should" servers be case sensitive in grabbing this data, the answer is "yes, most definitely."

Of course not everything needs to be case sensitive but the server should be aware of what that is and how to handle those cases.


@Hart Simha's comment basically says the same thing. I missed it before I posted, so I want to give credit where credit is due.

Goodin answered 19/6, 2014 at 19:37 Comment(0)
9

Look at the specification here: section 2.7.3 https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p1-messaging-25#page-19

The scheme and host are case-insensitive and normally provided in lowercase; all other components are compared in a case-sensitive manner.
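The comparison rule quoted above can be expressed with `urllib.parse.urlsplit`. This is a minimal sketch that assumes http(s)-style URIs. `uris_match` is a hypothetical helper. It does not treat an explicit default port (such as :80) as equivalent to an omitted one, and it does not decode percent-encoding.

```python
from urllib.parse import urlsplit


def uris_match(a: str, b: str) -> bool:
    """Compare scheme and host case-insensitively and every other
    component octet-by-octet (case-sensitively)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        # .hostname is returned lowercased by urlsplit
        and (pa.hostname or "") == (pb.hostname or "")
        and pa.port == pb.port
        and pa.username == pb.username
        and pa.password == pb.password
        and pa.path == pb.path
        and pa.query == pb.query
        and pa.fragment == pb.fragment
    )


print(uris_match("HTTP://WWW.Example.COM/about", "http://www.example.com/about"))  # True
print(uris_match("http://example.com/ABOUT", "http://example.com/about"))         # False
```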

Albemarle answered 12/1, 2016 at 7:14 Comment(0)
9

Section 6.2.2.1 of RFC 3986 says that "scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI HTTP://www.EXAMPLE.com/ is equivalent to http://www.example.com/. The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme".
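A normalizer in the spirit of section 6.2.2.1 might look like this. It is a minimal sketch, and `normalize_case` is a hypothetical helper. It lowercases the scheme and host and uppercases the hex digits of percent-encoding triplets, which the same section also treats as case-insensitive. Userinfo, path, query and fragment are left untouched. Because it relies on `urllib.parse` round-tripping, it can drop an empty trailing `?` or `#`.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# A percent-encoding triplet such as %2f or %2F.
_PCT = re.compile(r"%[0-9a-fA-F]{2}")


def normalize_case(uri: str) -> str:
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    # Split off any userinfo and lowercase only host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    rebuilt = urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )
    # Uppercase the hex digits of every percent-encoding triplet.
    return _PCT.sub(lambda m: m.group(0).upper(), rebuilt)


print(normalize_case("HTTP://www.EXAMPLE.com/"))  # http://www.example.com/
```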

A server might normalize the passed URI internally and serve the same resource for URIs that differ only in case (/about/ and /ABOUT/), making a URI appear case-insensitive to the user.

Governance answered 28/5, 2021 at 12:49 Comment(2)
What characters are valid in schemes? Only letters? Only alphanumeric characters? - Lend
Section 2.2 of RFC 3986 describes reserved characters. Section 2.3 lists the ranges of characters that can be passed without percent-encoding. - Governance
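To answer Lend's question directly: RFC 3986, section 3.1 defines a scheme as a letter followed by any number of letters, digits, "+", "-" or ".". The same section also notes that schemes are case-insensitive. `is_valid_scheme` is a hypothetical helper that checks a string against that grammar rule.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(s: str) -> bool:
    return SCHEME.fullmatch(s) is not None


print(is_valid_scheme("svn+ssh"), is_valid_scheme("1abc"))  # True False
```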
3

Consider the following:

https://www.example.com/createuser.php?name=Paul%20McCartney

In this hypothetical example, an HTML form using the GET method sends the "name" parameter to a PHP script that creates a new user account.

And the point I'm making with this example is that this GET parameter needs to be case-sensitive to preserve the capitalisation of "McCartney" (or, as another example, to preserve "Walter d'Isney", as there are other ways for names to break the usual capitalisation rules).

It's cases like these that guide the W3C recommendation that the scheme and host are case insensitive, while everything after them is potentially case sensitive and left up to the server. Forcing case insensitivity by standard would make the above example incapable of preserving the case of user input passed as a GET query parameter.
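The McCartney example can be checked with Python's `urllib.parse`. This is a minimal sketch. `name_param` is a hypothetical helper that decodes the `name` query parameter from the example URL. Lowercasing the whole URL, as a server that forced case insensitivity might do, loses the capitalization.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.example.com/createuser.php?name=Paul%20McCartney"


def name_param(u: str) -> str:
    """Return the decoded `name` query parameter."""
    return parse_qs(urlsplit(u).query)["name"][0]


print(name_param(url))          # Paul McCartney
print(name_param(url.lower()))  # paul mccartney
```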

But what I'd say is that though this is necessarily the letter of the law to accommodate such cases, the spirit of the law is that, where case is irrelevant, behave in a case insensitive way. The standards, though, can't tell you where case is irrelevant because, like the examples I've given, it's a context-dependent thing.

(e.g. an account username is probably best forced to case insensitivity - as "User123" and "user123" being different accounts could prove confusing - even if their real name, as above, is best left case sensitive.)
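The split suggested above, with usernames case-insensitive and real names case-preserved, could be handled like this. It is a minimal sketch, and `UserDirectory` is a hypothetical in-memory store. It keys accounts by `str.casefold()` of the username but stores both strings exactly as entered.

```python
from typing import Dict, Optional, Tuple


class UserDirectory:
    """Hypothetical user store: usernames match case-insensitively,
    real names are stored and returned exactly as entered."""

    def __init__(self) -> None:
        # casefolded username -> (username as entered, real name as entered)
        self._users: Dict[str, Tuple[str, str]] = {}

    def add(self, username: str, real_name: str) -> bool:
        key = username.casefold()
        if key in self._users:
            return False  # "User123" and "user123" would be the same account
        self._users[key] = (username, real_name)
        return True

    def real_name(self, username: str) -> Optional[str]:
        entry = self._users.get(username.casefold())
        return entry[1] if entry else None


users = UserDirectory()
users.add("User123", "Paul McCartney")
print(users.add("user123", "Someone Else"))  # False
print(users.real_name("USER123"))            # Paul McCartney
```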

Sometimes it's relevant, most times it isn't. But it has to be left up to the server / web developer to decide these things - and can't be prescribed by standard - as only at that level could the context be known.

The scheme and host are case insensitive (which shows the standard's preference for case insensitivity, where it can be universally prescribed). The rest is left up to you to decide, as you understand the context better. But, as has been discussed, you probably should, in the spirit of the law, default to case insensitivity unless you have a good reason not to.

Petronel answered 27/10, 2018 at 5:46 Comment(2)
Are query strings treated as part of the location? I believe they are treated as separate entities and not used for location resolution. - Kingery
Query strings are separate from the location, yes. But the same principles I've shown with query parameters can also apply to other parts of the URL. Some CMSes, for example, purposefully rewrite "/user.php?id=3756" to "/users/PaulMcCartney" for SEO-friendly, human-readable URLs (WordPress does this, for example). The point is that the standards deliberately back off from prescribing anything that is context-dependent. It's left to the server to decide, since the server understands the context and a universal standard can't. - Petronel
3

Case Preservation

URLs are case-preserving between client and server. But portions of URLs may or may not be case-sensitive, depending on the server, for a couple of reasons.

Case Sensitivity

In the following examples, the parts that may be case-sensitive, depending on the site and/or server configuration, are the path, query string and fragment of the URL and the user part of the email address:

    http://www.example.com/abc/def.ghi?jkl=mno#pqr

    user@example.com

Rationale

Case-sensitivity in URLs can have several uses. Mainly:

  1. Native compatibility with case-sensitive filesystems.
  2. More compact data encoding within URLs, such as for serialization, hashing, IDs, permalinks, and URL shorteners.
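Point 2 can be quantified. A case-sensitive ID alphabet has 62 symbols (digits plus lower- and uppercase letters), while a case-insensitive one has 36, so the same number of URL characters can address many more objects. This is a minimal sketch, and `encode` is a hypothetical helper. The digit order of the alphabet (digits, then lowercase, then uppercase) is an arbitrary choice.

```python
import string

BASE36 = string.digits + string.ascii_lowercase  # survives case-folding
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int, alphabet: str) -> str:
    """Encode a non-negative integer ID using the given alphabet."""
    if n == 0:
        return alphabet[0]
    base, out = len(alphabet), []
    while n:
        n, r = divmod(n, base)
        out.append(alphabet[r])
    return "".join(reversed(out))


# Number of distinct IDs that fit in three URL characters.
print(len(BASE36) ** 3, len(BASE62) ** 3)  # 46656 238328
# A case-insensitive server would confuse IDs 10 ("a") and 36 ("A").
print(encode(10, BASE62), encode(36, BASE62))  # a A
```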

As a developer, I believe the above can often be handled in better ways, but I also understand there are cases where a situation may not permit this.

For example, imagine an existing product that requires a lot of data placed in the "GET" URL, yet it must be compatible with the maximum URL lengths of all major servers, browsers, and caching/proxy mechanisms. To fit even a moderate-length command string (under 1,024 characters for some older browsers), you'd need to use every unique URL-safe character you could (which is basically what base64url encoding is).
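The length argument can be made concrete. Hex is safe against case-folding but carries only 4 bits per character, while base64url (which uses both letter cases) carries 6 bits per character. This is a minimal sketch using Python's standard library. `encoded_lengths` is a hypothetical helper that strips the "=" padding, as URL-embedded tokens often do.

```python
import base64


def encoded_lengths(data: bytes) -> tuple:
    """Return (hex length, base64url length without '=' padding)."""
    hex_len = len(data.hex())
    b64_len = len(base64.urlsafe_b64encode(data).rstrip(b"="))
    return hex_len, b64_len


print(encoded_lengths(bytes(30)))  # (60, 40)
```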

In an Ideal World

Whether or not URLs should be case-sensitive is debatable. I personally believe they should not be, for simplicity (though it may create longer URLs, we have percent-escapes to easily handle cases where we must ensure preservation of exact characters, and there are ways to transfer data other than right in the URL).

Many seem to agree, judging by the fact that case-insensitive URLs are explicitly enabled for many popular sites and services in order to increase usability. The most prominent example is the username portion of email addresses. Most email providers ignore case, and sometimes even dots and other symbols, so that two addresses differing only in letter case or dots in the username reach the same mailbox. This is despite the fact that, according to the spec, email usernames are case-sensitive by default.

However, the fact is that despite what I or others might want, this is the state of how things currently work. And while an eventual worldwide transition to a case-insensitive URL standard is certainly possible, it would likely take quite a long time since case-sensitivity is currently used extensively around the web for various purposes.

Best Practices

As far as best practices go, as a user you can reasonably stick to lowercase for most situations and expect things to work. The main exceptions would be URLs that use case-based encoding or document paths with direct filesystem equivalents. However, such complex URLs are typically copy-pasted (or simply clicked) rather than manually typed.

As a web developer, you should consider keeping URLs as case-insensitive as possible, although, as noted above, some situations make this difficult to avoid, depending on context.

Emylee answered 2/7, 2019 at 17:53 Comment(0)
2

URLs should be case insensitive unless there is a good reason why they should not be.

This is not mandatory (it is not part of any RFC), but it makes the communication and storage of URLs far more reliable.

If I have two pages on a website:

http://stackoverflow.com/ABOUT.html

and

http://stackoverflow.com/about.html

How should they differ? Maybe one is written 'shouting style' (caps) - but from an IA point of view, the distinction should never be made by a change in the case of the URL.

Moreover, it is easy to implement this in Apache: just use CheckSpelling On from mod_speling.

Sabadell answered 20/1, 2014 at 11:1 Comment(0)
0

Old question, but I stumbled here, so why not take a shot at it, since the question seeks various perspectives rather than a definitive answer.

The W3C may have its recommendations, which I care about a lot, but I want to rethink them since the question is here.

Why does the W3C consider domain names case insensitive and leave anything afterwards case sensitive?

I think the rationale is that the domain part of the URL is typed by hand by users, while everything after it, being hypertext, is resolved by machines (the browser, and the server in the back).

Machines can handle case differences better than humans (not the technical kind :)).

But just because machines CAN handle it, should it be done that way?

I mean, what are the benefits of naming and accessing a resource at hereIsTheResource vs hereistheresource?

The latter is much less readable than the camel-case one, which is more readable to humans (including the technical kind).

So here are my points:

The resource path falls somewhere in the middle: it is part of the program's structure, yet it is sometimes close to the end user behind the browser.

Your URL (excluding the domain name) should be case insensitive if your users are expected to touch it, type it, etc. You should develop your application to AVOID having users type the path as much as possible.

Your URL (excluding the domain name) should be case sensitive if your users would never type it by hand.

Conclusion

The path should be case sensitive; my points weigh towards case-sensitive paths.

Ticket answered 5/8, 2014 at 15:12 Comment(0)
0

URL characters are converted into hex codes (you may have noticed spaces in URLs being displayed as %20, etc.), and since lowercase and uppercase letters have different hex values, it makes perfect sense that URLs are case sensitive. However, the spirit of the question seems to be whether that SHOULD be the standard, and I say no, but they are. It's up to the developer/provider to account for this in their code if they want URLs to work regardless of case for end users.

Monofilament answered 19/4, 2016 at 16:12 Comment(1)
This is an interesting one. Regular ASCII characters (which have upper and lower case) are not actually converted, though, right? It's only spaces and extended characters that are escaped in the URL. Do any extended characters have upper/lower case variants? - Booze
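Booze's questions can be answered with Python's `urllib.parse.quote`. Unreserved ASCII letters pass through unescaped. Non-ASCII letters such as É/é are escaped as the %XX triplets of their UTF-8 bytes, and the two cases still produce different octets. The hex digits inside a triplet are themselves case-insensitive, as stated in RFC 3986, section 2.1.

```python
from urllib.parse import quote, unquote

# ASCII letters are unreserved, so only the space is escaped here.
print(quote("Paul McCartney"))  # Paul%20McCartney

# Non-ASCII letters become %XX triplets of their UTF-8 bytes, and
# upper- and lowercase forms still differ.
print(quote("É"), quote("é"))  # %C3%89 %C3%A9

# The hex digits within a triplet are case-insensitive.
print(unquote("%c3%a9") == unquote("%C3%A9"))  # True
```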
0

I think this and many of the answers about what the spec does or does not say miss the point of the question. Should they be case sensitive? That's a loaded question, really. From a user's point of view, case sensitivity is a pain point; not everyone knows that case makes a difference. Whether URIs should or shouldn't be case sensitive depends on the context. For technical flexibility, yes, they should be. For usability, no, they should not be.

Citarella answered 5/7, 2017 at 17:30 Comment(2)
To be fair, any question asking "SHOULD" is inherently opinion-based and could be removed from Stack Overflow. (More: stackoverflow.blog/2010/09/29/good-subjective-bad-subjective) - Waterman
@Waterman is correct; I'd recommend flagging this question as opinion-based. - Vermiform
0

The case sensitivity of URLs in general (including whether URLs that differ only in case are the same) needs to be looked at from the following perspectives:

  • Resource Equivalence
  • URL Comparison

From the perspective of resource equivalence, it is generally not possible to say whether two URLs differing by any case (lower case, upper case, sentence case, camel case ... any combination of case) are different from each other unless the resource is retrieved from both URLs, which in many cases is not practical (RFC 3986, section 6.1, para 1). Therefore, where the resource cannot be retrieved, the comparison perspective is used.

However, where it is possible to retrieve the resource, the matter gets (as expected) more complicated. By the provisions of RFC 3986, section 3.3, para 5, quoted below:

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax

it would appear that the generic syntax allows no assumption about the rest of a URI/URL beyond its scheme and authority (including the question of case sensitivity).

For the scheme and the host part of the authority, however, the specification does (charitably) state them to be case insensitive. See RFC 3986, section 3.1, para 1 and RFC 3986, section 6.2.2.1, para 2.

Having exhausted this line of inquiry, one should look at the comparison perspective to determine whether URIs/URLs are case sensitive or not.

The first hint in that direction emerges from section 6.2.2.1 (above):

The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme

This is further buoyed by RFC 2616, section 3.2.3:

When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs

So, finally, is the inquiry settled and URLs are case sensitive? (Heh!) Not quite: the operative words are 'opaque', 'client' and 'comparing'.

Beyond their syntax, the above RFCs don't say anything about the actual interpretation of the path and query, except that they are 'opaque', and they only specify how (with a SHOULD, not a MUST) a 'client' may 'compare' URLs. They say nothing about how a server SHOULD (let alone MUST) interpret the rest of the URL beyond the scheme/authority.

Therefore servers have full latitude to interpret a URL as they please, which they do, as highlighted in earlier posts by others.

Outset answered 10/7, 2020 at 14:51 Comment(0)
0

Official guidelines aside, there's an interesting case where one might consider writing the whole URL in UPPERCASE: QR codes.

For instance, https://example.com/ won't fit into a version 1 (21x21) QR code and will require a larger version 2 (25x25) QR code.

Using the alphanumeric mode, on the other hand, which has no lowercase letters, lets you fit HTTPS://EXAMPLE.COM/12345 into the smaller version 1!
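The claim can be checked with the QR bit budget. This sketch assumes error correction level L, which the answer does not state. At that level, version 1 holds 19 data codewords (152 bits). At levels M, Q and H even the uppercase URL no longer fits. The bit costs follow the QR encoding rules for versions 1 to 9: a 4-bit mode indicator, a 9-bit (alphanumeric) or 8-bit (byte) character count, then 11 bits per pair of alphanumeric characters or 8 bits per byte. Uppercasing is safe here because the scheme and host are case-insensitive and the path is all digits. The helper names are hypothetical.

```python
# QR alphanumeric mode character set (45 symbols); note: no lowercase letters.
ALNUM = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Version 1 at error correction level L: 19 data codewords.
V1_L_DATA_BITS = 19 * 8


def alnum_bits(n: int) -> int:
    # mode (4) + count (9) + 11 bits per pair + 6 bits for a leftover char
    return 4 + 9 + 11 * (n // 2) + 6 * (n % 2)


def byte_bits(n: int) -> int:
    # mode (4) + count (8) + 8 bits per byte
    return 4 + 8 + 8 * n


def fits_v1_l(text: str) -> bool:
    """True if `text` fits a version 1-L symbol in the cheapest of the two modes."""
    if set(text) <= ALNUM:
        return alnum_bits(len(text)) <= V1_L_DATA_BITS
    return byte_bits(len(text.encode("utf-8"))) <= V1_L_DATA_BITS


print(fits_v1_l("HTTPS://EXAMPLE.COM/12345"))  # True  (151 of 152 bits)
print(fits_v1_l("https://example.com/"))       # False (172 bits)
```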

Smyth answered 16/10, 2021 at 20:22 Comment(0)
-2

The question is: should the URL be case sensitive?

I see no use or good practice behind case-sensitive URLs. It's stupid, it sucks and it should be avoided at all times.

Just to back up my opinion: when someone asks you for a URL, how would you explain which characters of the URL are upper or lower case? That's nonsense, and no one should ever tell you otherwise.

Padding answered 21/8, 2013 at 14:17 Comment(5)
There is one advantage to URLs being case sensitive. On some websites, where objects are identified by unique IDs that can be referred to through the URL, the encoding can be something like base62 (or base64) instead of base36. This lets you encode exponentially more unique objects in the same number of URL characters. For example, foo.com/000 - foo.com/zzz (case insensitive) could refer to 36^3 unique objects, whereas foo.com/000 - foo.com/ZZZ (case sensitive, meaning foo.com/zzz and foo.com/ZZZ are different paths) would refer to 62^3 objects. - Zena
This isn't an answer; it's an opinionated comment. - Diskin
I back it up with an example. URLs are used by people (see the original question), not computers. It's very hard to see WHY a link isn't working, and since almost ALL domains are case insensitive, so should the rest of the URL be. The downvotes are for my tone of voice (which is bad), or because technical people tend to choose technical beauty over user experience. - Padding
@theTinMan It's an answer to an opinion-evoking question. - Waterman
I agree with @HartSimha, and since the question asks for opinions: unless part of the URL route is being used to identify a unique object, please, for the love of all that is good on the internet, DO NOT make it case sensitive. - Agonic
-2

For websites hosted on a Linux server, URLs are case sensitive: http://www.google.com/about and http://www.google.com/About will resolve to different locations. On a Windows server, by contrast, URLs are case-insensitive, like folder names, and will resolve to the same location.

Shovelhead answered 3/6, 2015 at 13:22 Comment(1)
This is correct, but since it's not possible to tell these two cases apart from the outside, the path section that is sent to the server (including parameters, up to the #anchor, which is not sent to the server) should always be considered case-sensitive. - Audible
-7

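It is possible to make URLs case-insensitive, for example in Apache with mod_rewrite:

# In the server or virtual-host config (RewriteMap cannot be used in .htaccess)
RewriteEngine On
RewriteMap lowercase int:tolower
# Redirect only when the path contains at least one uppercase letter
RewriteCond $1 [A-Z]
RewriteRule ^/(.*)$ /${lowercase:$1} [R=301,L]

This makes paths such as /ABOUT or /About redirect (301) to /about. The host name needs no such rule, since Google.com and GOOGLE.com already resolve to the same host.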
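For servers other than Apache, the same idea as the rewrite rules above can be written as a framework-agnostic check. This is a minimal sketch. `lowercase_redirect` is a hypothetical helper that takes only the request path. The query string is left out here, just as mod_rewrite passes it through unchanged by default. The sketch lowercases ASCII letters only, to match the `[A-Z]` condition.

```python
import string
from typing import Optional, Tuple

# ASCII-only case mapping, matching the [A-Z] test in the RewriteCond.
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lowercase_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, lowercased path) if `path` contains an uppercase ASCII
    letter, otherwise None so the request is handled normally."""
    if any(c in string.ascii_uppercase for c in path):
        return 301, path.translate(_TO_LOWER)
    return None


print(lowercase_redirect("/ABOUT"))  # (301, '/about')
print(lowercase_redirect("/about"))  # None
```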

Friedman answered 1/7, 2015 at 9:5 Comment(2)
This doesn't answer the question. - Hick
The question is "Should URL be case sensitive?"; your answer is "How to make case-insensitive URLs". - Tref

© 2022 - 2024 — McMap. All rights reserved.