Why is ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ the native name of the U.S.?

When I use this code:

var ri = new RegionInfo("us");
var nativeName = ri.NativeName;   // ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ

why is nativeName then the string "ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ" (in Cherokee)?

If I change to new RegionInfo("US") (only difference, capital US), I get instead "United States".

I do know the preferred usage of RegionInfo is to give a specific culture info string such as:

new RegionInfo("en-US")
new RegionInfo("chr-Cher-US")

and so on, and that works. But why is Cherokee preferred over English only if I use lower-case us?


(Seen on Windows 10 (version 1803 "April 2018 Update"), .NET Framework 4.7.2.)


Update: This is not consistent, even on the same machine. For example, I tried opening PowerShell many times, each time pasting [System.Globalization.RegionInfo]'US' into it. For a long period, all instances of PowerShell consistently give the same result. Then, after a while, new instances give the opposite result. Here is a screenshot of two of the windows, one consistently showing one NativeName and the other consistently showing the opposite one. So something non-deterministic must be going on (there is no difference in casing):

[Screenshot: PowerShell windows]

Dermatology answered 13/11, 2018 at 16:31 Comment(12)
Could be a bug. The documentation says "Case is not significant." Of course, it also says, "You should provide the name of a specific culture rather than just a country/region name in the name parameter." - Cremona
Even with US I get ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ in Linqpad. - Perigordian
To be fair, the Cherokee were in the US before it was the US :). - Cremona
From the docs: We recommend that you use the culture name ... Therefore, creating the RegionInfo object with only a country/region name of US is not specific enough to distinguish the appropriate string. - Krueger
@DavidG: Yeah, so is the fact that it's able to determine the appropriate region with uppercase US an accident? The main documentation of RegionInfo makes it clear that uppercase US works correctly. - Interlaken
Looks like .NET just delegates to the OS, so it's a Windows 10 thing. - Cremona
@Interlaken AFAICT the docs are pretty explicit to avoid the two-letter code. The RegionInfo class, the NativeName property and the constructor. Is there another doc that needs editing? - Krueger
@DavidG: Those documents don't suggest anything like the behavior being undefined, unpredictable or unsupported when a two-letter code is provided. They're just advising developers to provide the culture name for best results and handwaving it otherwise. If this behavior is intentional or otherwise not a bug, there must be a reason for it... - Interlaken
@PanagiotisKanavos It seems quite random. On another machine, when in PowerShell I do [System.Globalization.RegionInfo]'us' and [System.Globalization.RegionInfo]'US', it is opposite of what you report, United States in both cases. - Dermatology
I'm going to go with "easter egg." - Imprudent
This question should be migrated to history.stackexchange.com. (Or... wait.. what?) - Hispanic
@JeppeStigNielsen I updated my answer to add info about the caching it uses, which seems to affect the consistency. - Flak

The first thing to note is that the constructor for RegionInfo finds the region by finding a culture used in that region. So it's looking for a language in that country, not just the country.

Reading through that source code, it seems like the difference in upper/lower case is because of how the lookups are done if no culture is specified with the region.

For example, it tries a couple of things first, and then it looks in a static list of regions. But because that lookup uses Dictionary.ContainsKey, it is a case-sensitive search. So if you specify "US", it will find it, but not "us".
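To make the case-sensitivity point concrete, here is a minimal sketch. The dictionaries are hypothetical stand-ins for the framework's internal region table (which is not public), so only the comparer behavior is being illustrated, not the real data.

```csharp
using System;
using System.Collections.Generic;

public static class RegionLookupDemo
{
    // Hypothetical stand-in for the framework's static region table;
    // the real table and its contents are internal to .NET.
    public static readonly Dictionary<string, string> CaseSensitive =
        new Dictionary<string, string> { { "US", "en-US" } };

    // Same data, but built with a comparer that ignores case.
    public static readonly Dictionary<string, string> CaseInsensitive =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { { "US", "en-US" } };

    public static void Main()
    {
        Console.WriteLine(CaseSensitive.ContainsKey("US"));   // True
        Console.WriteLine(CaseSensitive.ContainsKey("us"));   // False
        Console.WriteLine(CaseInsensitive.ContainsKey("us")); // True
    }
}
```

A Dictionary<string, TValue> created without an explicit comparer compares keys ordinally, which is why "us" misses the entry in the first table.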

Later, it searches through all the cultures (from CultureInfo.GetCultures(CultureTypes.SpecificCultures)) for the region you gave, but it does so in a case-insensitive way.

I can't confirm since I can't step through that code, but my guess is that, because it's going through the list in order, it will get to chr-Cher-US before it gets to en-US.
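One way to probe that guess from the outside is to list the specific cultures for a region in the order the runtime enumerates them. This is only an approximation using public APIs; the class and method names are made up for illustration, and the framework's internal search may compare different fields.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CulturesForRegion
{
    // Returns the names of all specific cultures whose two-letter region code
    // matches regionCode (ignoring case), in the runtime's enumeration order.
    public static List<string> Find(string regionCode)
    {
        var matches = new List<string>();
        foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
        {
            try
            {
                var region = new RegionInfo(culture.Name);
                if (string.Equals(region.TwoLetterISORegionName, regionCode,
                                  StringComparison.OrdinalIgnoreCase))
                {
                    matches.Add(culture.Name);
                }
            }
            catch (ArgumentException)
            {
                // Some culture names may not map to a region; skip them.
            }
        }
        return matches;
    }

    public static void Main()
    {
        // Output depends on the OS and runtime version, so none is shown here.
        foreach (string name in Find("us"))
        {
            Console.WriteLine(name);
        }
    }
}
```

If chr-Cher-US is printed before en-US on a given machine, that would be consistent with the guess above, though it does not prove the internal code walks the list the same way.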

Why is it not consistent?

One of the comments said that LinqPad finds Cherokee even when using upper case. I don't know why this is. I was able to replicate that, but I also found that in Visual Studio, it's English when using "US" and Cherokee when using "us", like you describe. I did find that if I turn on "Use experimental Roslyn assemblies" in LinqPad, then it returns English for both "US" and "us". So maybe it has something to do with the exact runtime version targeted, but I can't say for sure.

One thing that affects consistency is caching: the first thing that it will do when it does not get a complete match by culture + region is check a cache of already-found cultures. It lower-cases all the keys in that cache, so this cache is case-insensitive.

You can test this. We know that using "US" vs. "us" will yield different results, but try this in the same program:

var nativeNameus = new RegionInfo("us").NativeName;
var nativeNameUS = new RegionInfo("US").NativeName;

Then swap them and run it again:

var nativeNameUS = new RegionInfo("US").NativeName;
var nativeNameus = new RegionInfo("us").NativeName;

Both results will always be equal because the first culture is cached and used for the next.

It's possible that there is code outside of your code that calls the same methods and ends up caching a culture value, thereby changing the result you get when you do the same.
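The order dependence can be reproduced with a toy model. Everything here is hypothetical (the class, the table contents, and the hard-coded fallback); it only imitates the lookup order described in this answer, not the real framework code.

```csharp
using System;
using System.Collections.Generic;

public static class CachedLookupModel
{
    // Toy model only: names and values are hypothetical.
    private static readonly Dictionary<string, string> Cache =
        new Dictionary<string, string>();

    private static readonly Dictionary<string, string> StaticTable =
        new Dictionary<string, string> { { "US", "en-US" } };

    public static void Reset()
    {
        Cache.Clear();
    }

    public static string Resolve(string region)
    {
        string key = region.ToLowerInvariant();

        // 1. Cache, keyed by lower-cased name, so "US" and "us" share an entry.
        string cached;
        if (Cache.TryGetValue(key, out cached)) return cached;

        // 2. Case-sensitive static table.
        // 3. Otherwise a fixed stand-in for the case-insensitive scan
        //    over all specific cultures.
        string found;
        if (!StaticTable.TryGetValue(region, out found)) found = "chr-Cher-US";

        Cache[key] = found;
        return found;
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("us")); // chr-Cher-US
        Console.WriteLine(Resolve("US")); // chr-Cher-US (served from the cache)

        Reset();
        Console.WriteLine(Resolve("US")); // en-US
        Console.WriteLine(Resolve("us")); // en-US (served from the cache)
    }
}
```

In this model, whichever spelling is resolved first decides the answer for both, which matches the swap experiment above.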

Conclusion

All that said, the docs actually say:

We recommend that you use the culture name—for example, "en-US" for English (United States)—to access the NativeName property.

So it is a bit of a moot point: you asked for a region, not a language. If you need a specific language, ask for that language, not just a region.

If you want to guarantee English, then either:

  1. Do as Microsoft recommends and specify the language with the region: "en-US", or
  2. Use the EnglishName or DisplayName properties (which are English even when the NativeName is Cherokee).
Flak answered 13/11, 2018 at 16:53 Comment(9)
The reason why some people get ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ and some United States is most probably that they target different .NET Framework versions. If you compile OP's code against .NET Framework 3.5 or lower, it will print United States. "chr-Cher-US" was probably added in later versions of .NET Framework, and that's why "en-US" is found first in the dictionary. - Article
This comment in the RegionInfo constructor says it all: "Note: We prefer that a region be created with a full culture name (ie: en-US) because otherwise the native strings won't be right." - Plagiary
@Article It is apparently very device dependent. On my machine I'm reliably getting only "United States" for both us and US on FW3.5, and only Cherokee for both us and US with FW4.0 and up. - Phoneme
@Phoneme Why do you say it's device dependent? Changing the framework version would make it framework version dependent. - Levey
@Amy Because I'm getting one result with e.g. FW4.5 and other people are getting a different result also with FW4.5. I'm not saying it is necessarily device dependent (as in, hardware), but it is evidently not limited to just .NET version. - Phoneme
Sure, but ALL people involved are just using the wrong parameters.... A culture name is always in the format "en-us" or "fr-ca" or "es-mx" etc... Just specifying a country leaves it up to the OS to decide what to spit back, depending on how the list of cultures for the selected country is sorted internally. The fact that it worked "reliably" before was just dumb luck that the result you were looking for was on top. Use culture AND country to avoid issues. If you just want a region-generic version, use the language code alone (for example "en" instead of "en-us"). - Barrows
I was giving this some extra thought (because I'm like that) and remembered I saw some caching going on in the code. So I did some tests and it does indeed affect the consistency of this. I've updated my answer. But my conclusion still stands: if you need a specific language, ask for that language. - Flak
@bastos.sergio's comment is the only one that provides an actual lead. So it looks like the source does acknowledge that the value of NativeName is unsupported when only a region identifier is provided. - Interlaken
So this method should probably throw an exception when only a region is specified instead of returning "random" values. Looks like a bad implementation to me. - Article

I was having the same issue and had to make some changes to keep from getting an "Unknown Region (US)" error string when playing with the languages. This is for a course project, and I had to do some trial and error to get it to work for my needs, but it seems to work out this way. The task requirements restrict me to Visual Studio 2022 and a Forms .NET Framework project using .NET 4.7.2.

// Requires: using System.Globalization; using System.Threading; using Microsoft.Win32;
public string GetLocation()
{
    /* CurrentCulture reflects the language setting, not the region setting.
       With Spanish (United States) it reports US, and with Spanish (Mexico) it
       reports Mexico, even if the region is set to United Arab Emirates, so it
       is only useful here for getting the two-letter language code. */
    string langCode = Thread.CurrentThread.CurrentCulture.TwoLetterISOLanguageName;

    // Read the country/region code from the user's region settings in the Registry.
    object nationName;
    using (RegistryKey regKeyGeoId = Registry.CurrentUser.OpenSubKey(@"Control Panel\International\Geo"))
    {
        nationName = regKeyGeoId?.GetValue("Name");
    }

    // If the key or value is missing, fall back to the current region
    // instead of failing with a NullReferenceException.
    if (nationName == null)
    {
        return "Region: " + RegionInfo.CurrentRegion.DisplayName;
    }

    RegionInfo usersRegion = new RegionInfo(langCode + "-" + nationName);
    string regionCode = usersRegion.TwoLetterISORegionName;

    // Store the region either as DisplayName (returned in the system's language)
    // or NativeName (returned in the language from the region settings).
    // For my purposes, I prefer the native language.
    string regionString = "Region: " + usersRegion.NativeName;

    // Sometimes the region is not recognized for the native language, so this
    // check catches that case and returns the DisplayName instead.
    if (usersRegion.NativeName.Equals("Unknown Region (" + regionCode + ")"))
    {
        regionString = "Region: " + usersRegion.DisplayName;

        // The hard-coded string may not match in other languages, as
        // "Unknown Region" might not be what is returned in Arabic, etc.
        // I will fix that with globalization techniques, storing strings in
        // resource files instead.
        /*
        An example of the error: language set to Spanish (Cuba) and region set
        to United Kingdom. A user might do this because of differences in
        dialect, so this catches that issue.
        */
    }

    return regionString;
}

I think this is the first time I've contributed to a question, so hopefully I've done this right.

Cagle answered 19/4 at 0:45 Comment(2)
Your post constructs a region info from a language code obtained from one source (current culture of the thread) combined with a country code obtained from another source (some place in the Windows Registry). But this does not constitute an answer to my precise question. My original question was, "why was it unpredictable and inconsistent what language you got when you only gave the country code?". I was not asking where I could obtain a language code. - Dermatology
Another comment is, if you want a region info instance that represents the current OS settings, with respect to both language and country, you can try the static RegionInfo.CurrentRegion property. - Dermatology
