In C#, what is the difference between ToUpper()
and ToUpperInvariant()
?
Can you give an example where the results might be different?
In C#, what is the difference between ToUpper()
and ToUpperInvariant()
?
Can you give an example where the results might be different?
ToUpper
uses the current culture. ToUpperInvariant
uses the invariant culture.
The canonical example is Turkey, where the upper case of "i" isn't "I".
Sample code showing the difference:
using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;
public class Test
{
[STAThread]
static void Main()
{
string invariant = "iii".ToUpperInvariant();
CultureInfo turkey = new CultureInfo("tr-TR");
Thread.CurrentThread.CurrentCulture = turkey;
string cultured = "iii".ToUpper();
Font bigFont = new Font("Arial", 40);
Form f = new Form {
Controls = {
new Label { Text = invariant, Location = new Point(20, 20),
Font = bigFont, AutoSize = true},
new Label { Text = cultured, Location = new Point(20, 100),
Font = bigFont, AutoSize = true }
}
};
Application.Run(f);
}
}
For more on Turkish, see this Turkey Test blog post.
I wouldn't be surprised to hear that there are various other capitalization issues around elided characters etc. This is just one example I know off the top of my head... partly because it bit me years ago in Java, where I was upper-casing a string and comparing it with "MAIL". That didn't work so well in Turkey...
ımage
as a field name for Image
and Unity 3D spamming an internal error to the console Unable to find key name that matches 'rıght'
on an "English" Windows with Turkey regional settings for date and time. Looks like sometimes even Microsoft fails the Turkey test, an PC's language isn't even Turkish, just lol. –
Rooke Jon's answer is perfect. I just wanted to add that ToUpperInvariant
is the same as calling ToUpper(CultureInfo.InvariantCulture)
.
That makes Jon's example a little simpler:
using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;
public class Test
{
[STAThread]
static void Main()
{
string invariant = "iii".ToUpper(CultureInfo.InvariantCulture);
string cultured = "iii".ToUpper(new CultureInfo("tr-TR"));
Application.Run(new Form {
Font = new Font("Times New Roman", 40),
Controls = {
new Label { Text = invariant, Location = new Point(20, 20), AutoSize = true },
new Label { Text = cultured, Location = new Point(20, 100), AutoSize = true },
}
});
}
}
I also used New Times Roman because it's a cooler font.
I also set the Form
's Font
property instead of the two Label
controls because the Font
property is inherited.
And I reduced a few other lines just because I like compact (example, not production) code.
I really had nothing better to do at the moment.
String.ToUpper
and String.ToLower
can give different results given different cultures. The most known example is the Turkish example, for which converting lowercase latin "i" to uppercase, doesn't result in a capitalized latin "I", but in the Turkish "I".
As for me it was confusing even with the above picture (source), I wrote a program (see source code below) to see the exact output for the Turkish example:
# Lowercase letters
Character | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - i (\u0069) | I (\u0049) | I (\u0130) | i (\u0069) | i (\u0069)
Turkish i - ı (\u0131) | ı (\u0131) | I (\u0049) | ı (\u0131) | ı (\u0131)
# Uppercase letters
Character | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - I (\u0049) | I (\u0049) | I (\u0049) | i (\u0069) | ı (\u0131)
Turkish i - I (\u0130) | I (\u0130) | I (\u0130) | I (\u0130) | i (\u0069)
As you can see:
Culture.CultureInvariant
leaves the Turkish characters as isToUpper
and ToLower
are reversible, that is lowercasing a character after uppercasing it, brings it to the original form, as long as for both operations the same culture was used.According to MSDN, for Char.ToUpper
and Char.ToLower
Turkish and Azeri are the only affected cultures because they are the only ones with single-character casing differences. For strings, there might be more cultures affected.
Source code of a console application used to generate the output:
using System;
using System.Globalization;
using System.Linq;
using System.Text;
namespace TurkishI
{
class Program
{
static void Main(string[] args)
{
var englishI = new UnicodeCharacter('\u0069', "English i");
var turkishI = new UnicodeCharacter('\u0131', "Turkish i");
Console.WriteLine("# Lowercase letters");
Console.WriteLine("Character | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish");
WriteUpperToConsole(englishI);
WriteLowerToConsole(turkishI);
Console.WriteLine("\n# Uppercase letters");
var uppercaseEnglishI = new UnicodeCharacter('\u0049', "English i");
var uppercaseTurkishI = new UnicodeCharacter('\u0130', "Turkish i");
Console.WriteLine("Character | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish");
WriteLowerToConsole(uppercaseEnglishI);
WriteLowerToConsole(uppercaseTurkishI);
Console.ReadKey();
}
static void WriteUpperToConsole(UnicodeCharacter character)
{
Console.WriteLine("{0,-9} - {1,10} | {2,-14} | {3,-12} | {4,-14} | {5,-12}",
character.Description,
character,
character.UpperInvariant,
character.UpperTurkish,
character.LowerInvariant,
character.LowerTurkish
);
}
static void WriteLowerToConsole(UnicodeCharacter character)
{
Console.WriteLine("{0,-9} - {1,10} | {2,-14} | {3,-12} | {4,-14} | {5,-12}",
character.Description,
character,
character.UpperInvariant,
character.UpperTurkish,
character.LowerInvariant,
character.LowerTurkish
);
}
}
class UnicodeCharacter
{
public static readonly CultureInfo TurkishCulture = new CultureInfo("tr-TR");
public char Character { get; }
public string Description { get; }
public UnicodeCharacter(char character) : this(character, string.Empty) { }
public UnicodeCharacter(char character, string description)
{
if (description == null) {
throw new ArgumentNullException(nameof(description));
}
Character = character;
Description = description;
}
public string EscapeSequence => ToUnicodeEscapeSequence(Character);
public UnicodeCharacter LowerInvariant => new UnicodeCharacter(Char.ToLowerInvariant(Character));
public UnicodeCharacter UpperInvariant => new UnicodeCharacter(Char.ToUpperInvariant(Character));
public UnicodeCharacter LowerTurkish => new UnicodeCharacter(Char.ToLower(Character, TurkishCulture));
public UnicodeCharacter UpperTurkish => new UnicodeCharacter(Char.ToUpper(Character, TurkishCulture));
private static string ToUnicodeEscapeSequence(char character)
{
var bytes = Encoding.Unicode.GetBytes(new[] {character});
var prefix = bytes.Length == 4 ? @"\U" : @"\u";
var hex = BitConverter.ToString(bytes.Reverse().ToArray()).Replace("-", string.Empty);
return $"{prefix}{hex}";
}
public override string ToString()
{
return $"{Character} ({EscapeSequence})";
}
}
}
Start with MSDN
http://msdn.microsoft.com/en-us/library/system.string.toupperinvariant.aspx
The ToUpperInvariant method is equivalent to ToUpper(CultureInfo.InvariantCulture)
Just because a capital i is 'I' in English, doesn't always make it so.
ToUpperInvariant
uses the rules from the invariant culture
there is no difference in english. only in turkish culture a difference can be found.
© 2022 - 2024 — McMap. All rights reserved.