How to read 'List separator' from OS in Java?
A

7

8

I am writing a CSV exporter in Java that should respect the user's custom settings, especially the "List separator" to use as a delimiter.

In Windows, one can set this List separator in

Control Panel -> Regional and Language Options -> Regional Options -> Customize

I don't know about the other operating systems, but I'm pretty sure that you can change that on other OSes, too.

What is the best way to get this custom setting from the OS into Java? I am in an Eclipse RCP environment, so I might use RCP-related solutions if there is something available.

Abessive answered 8/5, 2009 at 7:17 Comment(0)
S
6

From comments of this answer:

Reading the OS-specific setting is a need I have to meet.

So what if OSs other than Windows don't have such a setting?

I suggest you read it from the registry on Windows (see "Read/write to Windows Registry using Java"). On other platforms, just use a good default and perhaps, at least on Unix, also support configuring it via a custom, well-documented environment variable (see "How can my java code read OS environment variables?").

My gut feeling is that operating systems do not universally have a (system-wide or user-specific) "List separator" setting. I may be wrong, of course, but I doubt it.
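
The registry-plus-fallback idea could be sketched roughly as follows. This is only a sketch under assumptions: it shells out to the Windows `reg query` command rather than using a registry library, the environment variable name `CSV_LIST_SEPARATOR` is made up for illustration, and the parsing assumes the usual `name  REG_SZ  value` layout of `reg query` output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Locale;
import java.util.Optional;

class ListSeparatorLookup {
    // Hypothetical environment variable name; pick and document your own.
    static final String ENV_VAR = "CSV_LIST_SEPARATOR";
    static final String DEFAULT_SEPARATOR = ",";

    /** Extracts the sList value from `reg query ... /v sList` output, if present. */
    static Optional<String> parseRegQueryOutput(String output) {
        for (String line : output.split("\\R")) {
            int idx = line.indexOf("REG_SZ");
            if (line.trim().startsWith("sList") && idx >= 0) {
                // Note: trimming would lose a separator that is itself whitespace.
                String value = line.substring(idx + "REG_SZ".length()).trim();
                if (!value.isEmpty()) {
                    return Optional.of(value);
                }
            }
        }
        return Optional.empty();
    }

    /** Runs reg.exe and returns its output; only meaningful on Windows. */
    static String queryRegistry() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("reg", "query",
                "HKCU\\Control Panel\\International", "/v", "sList")
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    /** Registry on Windows, then the environment variable, then a fixed default. */
    static String listSeparator() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            try {
                Optional<String> fromRegistry = parseRegQueryOutput(queryRegistry());
                if (fromRegistry.isPresent()) {
                    return fromRegistry.get();
                }
            } catch (IOException e) {
                // reg.exe unavailable or failed: fall through to the other sources
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        String fromEnv = System.getenv(ENV_VAR);
        return (fromEnv != null && !fromEnv.isEmpty()) ? fromEnv : DEFAULT_SEPARATOR;
    }

    public static void main(String[] args) {
        System.out.println("List separator: " + listSeparator());
    }
}
```

Keeping the parsing in its own method means it can be checked against sample text on any platform, without a Windows machine.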

Scrip answered 8/5, 2009 at 9:54 Comment(1)
I agree, if other OSs don't have such a setting, the best bet is to fall back to a default value in these cases. In the Windows case, I'll use the registry reading option. - Abessive
P
6

Without resorting to a platform-specific solution, I think the best approach is to let users specify their preferred list separator within your own application: in a preferences panel, in a dialog box on export, or via an optional command-line argument.
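
For the command-line variant, a minimal sketch might look like this. The option name `--separator=` and the comma default are assumptions made for illustration, not an established convention.

```java
import java.util.Optional;

class SeparatorOption {
    // Hypothetical option name, chosen for illustration only.
    static final String PREFIX = "--separator=";

    /** Returns the separator given on the command line, if any. */
    static Optional<String> fromArgs(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(PREFIX) && arg.length() > PREFIX.length()) {
                return Optional.of(arg.substring(PREFIX.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Fall back to a comma when the user did not state a preference.
        String separator = fromArgs(args).orElse(",");
        System.out.println("Using separator: " + separator);
    }
}
```

Running it with `"--separator=;"` as an argument selects the semicolon; without the option the default applies.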

Pollaiuolo answered 8/5, 2009 at 8:3 Comment(2)
Yeah; I don't think all OSs have such a "List separator" setting. (At least I never heard about it on Linux, for example.) - Scrip
Reading the OS-specific setting is a need I have to meet. Otherwise you are right, that would be a more platform-independent way, definitely. - Abessive
F
6

In addition to providing your own option to the user in your application you could try to guess what the list separator is.

I had a look at some locales in Windows and saw that the list separator is either ";" or ",". I've heard there is another character in some obscure locale, but I have not seen it myself. So if you can make your code handle both ";" and "," as list separators, you will probably cover the majority of cases.

Also, it looks like when "," is used as the decimal separator, "," is never used as the list separator. I guess this is because numbers would otherwise be impossible to distinguish in a list: 1,2,3,4 could be 1.2, 3.4 or 1, 2.3, 4. In these cases ";" is used as the list separator. Unfortunately the reverse is not true: Arabic has "." as the decimal symbol and ";" as the list separator.
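
On the reading side, handling both characters could look like the sketch below. It is a naive heuristic of my own, not something from a library: it counts ";" and "," outside double quotes in the first line and picks the more frequent one. Unquoted numbers with decimal commas can mislead it, so treat the result as a guess.

```java
class SeparatorSniffer {
    /** Counts occurrences of c that are outside double-quoted fields. */
    static int countUnquoted(String line, char c) {
        int count = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                inQuotes = !inQuotes; // a doubled quote toggles twice, so the state is kept
            } else if (ch == c && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    /** Picks ',' only if it occurs more often than ';'; ties go to ';'. */
    static char sniff(String firstLine) {
        return countUnquoted(firstLine, ',') > countUnquoted(firstLine, ';') ? ',' : ';';
    }

    public static void main(String[] args) {
        System.out.println(sniff("name;price;qty"));
        System.out.println(sniff("\"Doe, Jane\",42,7"));
    }
}
```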

So I think the rule that can be reasonably safely followed is:

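// Heuristic: candidate list separators for a given decimal separator
char[] listSeparators = null; // stays unknown for other decimal separators
if (decimalSeparator == ',') {
    listSeparators = new char[] {';'};
} else if (decimalSeparator == '.') {
    listSeparators = new char[] {';', ','};
}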
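
Tied to Java's own locale data, the rule might be sketched as below. The class and method names are made up for illustration, and treating every decimal separator other than "," like "." is my assumption; the rule above only covers those two characters.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class ListSeparatorGuess {
    /** Candidate list separators for a locale (heuristic, not read from the OS). */
    static char[] candidates(Locale locale) {
        char decimal = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        if (decimal == ',') {
            // A decimal comma rules out ',' as the list separator.
            return new char[] {';'};
        }
        // Assumption: any other decimal separator is treated like '.'.
        return new char[] {';', ','};
    }

    public static void main(String[] args) {
        System.out.println(new String(candidates(Locale.GERMANY)));
        System.out.println(new String(candidates(Locale.US)));
    }
}
```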
Fictionist answered 29/11, 2010 at 11:45 Comment(0)
A
2

Out of curiosity, I searched the topic a bit, and indeed Java seems to have no such notion out of the box.

The Locales Demo gives a fairly complete listing of locale settings and there is no list separator there.

I saw a forum question referring to the sun.text.resources package, which is private and deprecated. You won't find many other references to this package; it looks like it lives in jre/lib/ext/localedata.jar, although my recent copy of that file lists mostly Asian locales.

The advice above is sound, or you might research and maintain your own private list per locale. I would perhaps look at IBM's ICU library (written in C, I think), which seems to have a fairly big list of locale settings. According to a remark, ICU itself gets its information from an ISO standard, which should be researched as the primary source of information.

Avigdor answered 8/5, 2009 at 10:16 Comment(0)
C
1

I faced the same problem and also found that the only proper solution is to give the user a way to specify the "delimiter of choice".

However, if that is not possible, as in my case, the closest I could get to cross-platform, cross-locale support is the following:

  • guess the separators you need to use using DecimalFormatSymbols
  • add a line at the beginning of the CSV containing, for example "sep=," (without quotes), this should specify the list separator you've just guessed

This will, at least in most cases (the cases I have tested), force Excel to use that delimiter, even if you "guessed wrong" and will provide at least a little bit more compatibility.
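
A sketch of writing such a file follows. The `sep=` first line is an Excel-specific hint rather than part of any CSV standard, and the class name, method names and quoting rules here are my own illustration.

```java
import java.util.Arrays;
import java.util.List;

class SepHeaderCsv {
    /** Builds CSV text whose first line is the Excel-specific "sep=" hint. */
    static String toCsv(List<List<String>> rows, char separator) {
        StringBuilder sb = new StringBuilder();
        sb.append("sep=").append(separator).append("\r\n");
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append(separator);
                }
                sb.append(quote(row.get(i), separator));
            }
            sb.append("\r\n");
        }
        return sb.toString();
    }

    /** Quotes a field if it contains the separator, a quote, or a line break. */
    static String quote(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        return needsQuotes ? '"' + field.replace("\"", "\"\"") + '"' : field;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("name", "price"),
                Arrays.asList("Widget; large", "1,5"));
        System.out.print(toCsv(rows, ';'));
    }
}
```

Parsers other than Excel may read the `sep=` line as an ordinary record, so it is worth making the hint optional.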

Coulee answered 16/12, 2015 at 15:35 Comment(1)
Apache Commons CSV has an issue with reading a file that starts with sep= instead of a valid record. - Society
M
1

Well I see this as a serious topic!

The thing is that on Windows there is Excel, and many programs have to interact with it. Excel uses the locale's List separator setting to separate cells.

The basic problem is that “comma-separated values” do not work outside US and English locales (including Spanish (USA)), because the list separator cannot be the same character as the decimal separator. So the world outside the US and English locales (including the US Army, which (should) follow NATO conventions such as the decimal comma and ISO measures) uses ; as the list separator to separate cells in spreadsheets, so that it is not confused with decimal commas.

So in general, the Windows locale setting for the list separator is ; except for English or US locales, where it is , .

Excel has its inconveniences with US/English CSV files. Specifically, in the default US and English settings the same character , is also the thousand separator, and Excel goes nuts (cells get mangled when importing CSVs that use it). If you skip the thousand separator (when it is the same as the list separator), Excel does not recognise currency cells and treats them as text. That is something we have to live with in US and English environments; elsewhere in the world it works fine (with ; ).

An interesting (funny) thought: does Java on Windows support something like an NDK with Win32 library access? In Win32 you would call

GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SLIST, pwListSeparator, 4); 

I can’t find a Java way to get the list separator.

My suggestion is to use a hard-coded general rule: set the list separator to ; when the decimal separator is , and otherwise to , unless another character can be found through the locale methods or settings, as in Win32. Remove (don't use) thousand separators if they are the same as the list separator.
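
A rough sketch of that rule for writing one row of numbers is shown below. As a simplification of my own, it switches grouping off entirely instead of only when the grouping character equals the list separator; the class and method names are illustrative.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

class LocaleCsvRow {
    /** ';' when the locale's decimal separator is ',', otherwise ','. */
    static char listSeparator(Locale locale) {
        char decimal = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimal == ',' ? ';' : ',';
    }

    /** Formats the numbers for the locale without grouping and joins them. */
    static String row(Locale locale, double... values) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setGroupingUsed(false); // no thousand separators in the output
        char sep = listSeparator(locale);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(nf.format(values[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row(Locale.GERMANY, 1234.5, 0.25)); // 1234,5;0,25
        System.out.println(row(Locale.US, 1234.5, 0.25));      // 1234.5,0.25
    }
}
```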

Or use tab-separated text files? The only inconvenience is that Windows by default opens a .csv file in Excel but a .txt file in Notepad when it is launched via ShellExecute(0, 0, pwFileName, 0, 0, SW_SHOW);.

Note that the decimal separator and the thousand (grouping) separator are available in Java, and I guess they should be used:

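import java.text.DecimalFormatSymbols;

// Symbols for the default locale
DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance();
char decimalSeparator = symbols.getDecimalSeparator();
char groupingSeparator = symbols.getGroupingSeparator(); // thousand separator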
Mechanician answered 20/11, 2018 at 3:46 Comment(1)
Note that Excel can't handle the Unicode character 'NO-BREAK SPACE' (U+00A0, UTF-8 bytes \xC2\xA0); it must be replaced with a regular space (or nothing) in CSV documents. The amazing thing is that Windows reports \xC2\xA0 as the thousand separator, for instance in Swedish settings, and yet Excel can't take it. Does Excel read all the Windows locale settings? Or is it a behaviour of the Win32 UTF-16/UTF-8 conversion facility MultiByteToWideChar (I have an app that is entirely UTF-8 internally, displayed as UTF-16 in Win32 and in Windows files)? - Decentralization
W
-1

On Windows, it's stored in the registry at:

"HKEY_CURRENT_USER\\Control Panel\\International"

so you can use something like this

private void setDelimiterProperties(String delimiter) {
    Properties p = new Properties();
    String key = "HKEY_CURRENT_USER\\Control Panel\\International\\sList";
    p.setProperty(key, delimiter);
}
Widgeon answered 8/5, 2009 at 7:38 Comment(4)
Um, wasn't the question about getting the "custom setting from the OS into Java"? (Also, just creating a Properties object and setting it in that surely won't write anything to the Windows Registry either.) - Scrip
OK, so then you use getProperty(key) instead of setProperty(key, value). And you're wrong about not writing it to the Windows registry. In fact, that's all that needs to be done to write it to the registry. - Widgeon
About reading from/writing to Windows Registry: #62789 - Scrip
Hmm, if Properties is java.util.Properties, then I find it impossible to believe your setDelimiterProperties() as it now stands will write to the Registry. Maybe you are confusing it with java.util.prefs.Preferences. - Scrip
