How do I detect the user's locale to get the correct csv separator?
Asked Answered
G

6

10

I have a simple data conversion tool and one of the outputs it can produce is a csv file.

This works perfectly here in the UK but when I shipped it out to a German customer I had some issues. Specifally, they use a ',' to represent the decimal point in a floating point number and vice versa. This means that when they open their data file in excel, the result is rather messy to say the least :-)

Substituting the correct character is trivial, but how can I detect whether or not to apply this?

Edit:

So this:

a,b,c
1.1,1.2,1.3
"1.1",1,2,"1,3"
"this,is,multi-
-line",this should be column 2, row 4
a;b;c
"a;b","c"

..looks like this when loaded into excel in the UK:

+----------------+-----+-----+-----+
| a              | b   | c   |     |
+----------------+-----+-----+-----+
| 1.1            | 1.2 | 1.3 |     |
+----------------+-----+-----+-----+
| 1.1            | 1   | 2   | 1,3 |
+----------------+-----+-----+-----+
| this,is,multi- |     |     |     |
| -line          | 2   | 4   |     |
+----------------+-----+-----+-----+
| a;b;c          |     |     |     |
+----------------+-----+-----+-----+
| a;b            | c   |     |     |
+----------------+-----+-----+-----+

..but what happens in Germany?

Greig answered 11/11, 2011 at 10:34 Comment(8)
the csv separator is not locale-dependent: it is a property of the file format only; like mentioned, you should quote or escape any separators appearing in cell valuesAnguine
Apologies; I wasn't sure how to phrase it. So it's the decimal separator I need to handle correctly? My customer suggested that they use semi-colons as separators for .csv files... ?Greig
Have a look at wikipedia for a reference on 'c'sv files: en.wikipedia.org/wiki/Comma-separated_values ...which suggests in Germany they use the semi-colon... ?Greig
Both formats get used in The Netherlands and Germany - the semicolon format does however not strictly follow the RFC and it is recommended to use commas, and double quote any values that need escaping.Teetotal
Thanks - that looks like the most robust / widely accept approach.Greig
Re your edit: It should be identical in Germany / The Netherlands now as to the British one. (i.e. a file can only have one seperator, and once comma has been set the semicolon will be treated as just another character.Teetotal
You're correct; that example appears exactly the same in German excel. Does that mean then that if the first separator was a ; that would pick up ; as separators instead of ,?Greig
Excel uses the list separtors (cell separation) in the national locale settings of Windows. All but US and English uses ; as list separator and not , (used for decimals). In Win32 you call GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SLIST, pwListSeparator, 4); to get it. Interesting is that the US and English thousand separator is the same (,) and it makes it problems to identify currency amount cells. Actually using the thousand separtor in US and English makes the cells screwed.Aforementioned
T
-5

CSV files as the name suggest should be comma-seperated and are not local dependant. However what you could do to avoid this issue is double-quote the relevant decimal numbers within the CSV file as such: "10,20", "1,50", "This is another column". This should avoid the issue entirely for any decent CSV-parser (such as the FileHelpers library) which will read this as 10,20 and 1,50 and not as: 10, 20, 1, and 50.

See CSV:

More sophisticated CSV implementations permit commas and other special characters in a field value. Many implementations use " (double quote) characters around values that contain reserved characters (such as commas, double quotes, or newlines); embedded double quote characters may be represented by a pair of consecutive double quotes

Teetotal answered 11/11, 2011 at 10:36 Comment(4)
Thanks; wrapping in quotes looks like it works here in the UK so I reckon it should allow comma's in float fields over there. Cheers!Greig
"should be comma-separated and are not local dependent..." – You should tell that Microsoft! Excel uses the "List separator" character defined in the regional & language settings.Doublecheck
Just as druciferre said, Microsoft export CSVs using culture config. Hence seperator is not always ';'.Kus
Well that's nice, have you ever seen microsoft excel on other than english environments? Eg. czech locale? (delimited by ';' by default)Monadnock
V
22

Use:

System.Globalization.CultureInfo.CurrentCulture.TextInfo.ListSeparator

Writing CSV: The "List separator" string should be used as the delimiters in CSV (see below on how to change this variable). Changing the value of the "List separator" is also reflected in Excel when saving as CSV.

Reading CSV: Determining the delimiter in CSV is another story and it is a bit more complex. In principle it is possible to use a "," as a CSV delimiter in one system and use a ";" or even a "*" or any ("string") as a delimiter on another system: This article provides some insights on how to detect CSV delimiters where reading cross-systems CSV files:

http://www.codeproject.com/Articles/231582/Auto-detect-CSV-separator.

Also you can perform some tests on your exporter by changing the "List separator" value in Windows as follows (might differ between with each Windows OS):

  • Open Region and Language dialog.
  • Select on the "Format" tab.
  • Click on "Additional Settings"
  • Edit the value of the "List separator"
Vehement answered 23/11, 2015 at 11:33 Comment(0)
S
9

As others have mentioned CSV in general should be comma-separated and fields should be double-quoted. However there is also MS Excel specific behavior that causes a correct CSV file to be imported incorrectly. That is because MS Excel by default uses list separator set in Windows System in 'Regional and language options'. For US/UK locale it is comma but for such languages as German it is semicolon. So for MS Excel the option is to use different separator per locale.

Selfridge answered 3/12, 2012 at 19:59 Comment(0)
B
6

The CurrencyDecimalSeparator property contains the decimal separator for the given culture. This being said the CSV separator is not culture dependent. It is a property of the CSV file which you indicate to the parser. And talking about parsers I sincerely hope that you are not rolling your own CSV parser.

Brave answered 11/11, 2011 at 10:36 Comment(1)
This is a legacy codebase which I've inherited and it's just the output I'm dealing with rather than parsing it as an input. We actually use lumen works for csv parsing where we need it: codeproject.com/KB/database/CsvReader.aspxGreig
M
1

As others recommended already, the format should not be locale sensitive. This is true for storage (in files like CSV or other formats) or communication protocols. You should worry about locale sensitivity for the presentation layer only. Otherwise it means that a file saved by an American user (for instance) cannot be loaded by a German one (and the other way around).

See here for more complete guidelines: http://mihai-nita.net/2005/10/25/data-internationalization/

Mendiola answered 11/11, 2011 at 11:7 Comment(1)
Yes, but that's how excel work, and we need to be excel compatible. US excel files can't be opened by EU users easily and vice versa.Pokorny
S
1

The way I read this question is that the problem is not with the .csv file. The .csv file is probably formatted identically for all users as is repeated in every answer above.

However, the Excel VBA application is locale sensitive.

When Excel is opened by different users in different countries, they are using it with existing regional settings, occasionally different than the .csv file, and therefore interpretes the data within the .csv file in different ways.

So the question really is, how can we program around it so that our code extracts the current regional separator, and then performs the correct interpretation?

One way is to make a small code snippet that writes a decimal number to a test cell, then reads it back to see what decimal is used, but there should be a more neat way to do it...

Sabo answered 23/3, 2020 at 8:8 Comment(1)
This is something of a helpful partial answer. Please avoid the impression that it is instead a question you add. Answering posts here are expected to only answer the question a the top of this page.Hooked
T
-5

CSV files as the name suggest should be comma-seperated and are not local dependant. However what you could do to avoid this issue is double-quote the relevant decimal numbers within the CSV file as such: "10,20", "1,50", "This is another column". This should avoid the issue entirely for any decent CSV-parser (such as the FileHelpers library) which will read this as 10,20 and 1,50 and not as: 10, 20, 1, and 50.

See CSV:

More sophisticated CSV implementations permit commas and other special characters in a field value. Many implementations use " (double quote) characters around values that contain reserved characters (such as commas, double quotes, or newlines); embedded double quote characters may be represented by a pair of consecutive double quotes

Teetotal answered 11/11, 2011 at 10:36 Comment(4)
Thanks; wrapping in quotes looks like it works here in the UK so I reckon it should allow comma's in float fields over there. Cheers!Greig
"should be comma-separated and are not local dependent..." – You should tell that Microsoft! Excel uses the "List separator" character defined in the regional & language settings.Doublecheck
Just as druciferre said, Microsoft export CSVs using culture config. Hence seperator is not always ';'.Kus
Well that's nice, have you ever seen microsoft excel on other than english environments? Eg. czech locale? (delimited by ';' by default)Monadnock

© 2022 - 2025 — McMap. All rights reserved.