Dealing with commas in a CSV file
L

29

533

I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.

Some of the ideas we are looking at are: quoted Identifiers (value "," values ","etc) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.

Lippold answered 20/4, 2009 at 18:52 Comment(5)
the customer is writing it and uploading itLippold
on iOS, essentially you must use github.com/Flinesoft/CSVImporterSuppurate
I have the exact same issue, trying to total a column in a csv file which is comma separated. No problem with an awk command. Unfortunately some cells may contain commas (in an address for example), other cells won't. Looking for a Linux compatible solution but not sure where to start.Anthemion
First, get the customer to follow RFC-4180 and enclose in double quotes the fields that contain commas or newline characters. If you are building your own data processing pipeline, you can look at github.com/dbro/csvquote which can let tools like awk, cut, sort, wc, etc. properly ignore the embedded commas and newlines.Hoogh
Use kit designed to handle CSV data. Don't try to reinvent the wheel on your own. In PowerShell, Import-Csv handles that and likely conforms to RFC 4180 as suggested by @dbro. Export-Csv allows you to choose when QUOTATION MARK characters (often called "double quotes") are used.Exertion
F
237

As others have said, you need to escape values that include quotes. Here’s a little CSV reader in C♯ that supports quoted values, including embedded quotes and carriage returns.

By the way, this is unit-tested code. I’m posting it now because this question seems to come up a lot and others may not want an entire library when simple CSV support will do.

You can use it as follows:

using System;
public class test
{
    public static void Main()
    {
        using ( CsvReader reader = new CsvReader( "data.csv" ) )
        {
            foreach( string[] values in reader.RowEnumerator )
            {
                Console.WriteLine( "Row {0} has {1} values.", reader.RowIndex, values.Length );
            }
        }
        Console.ReadLine();
    }
}

Here are the classes. Note that you can use the Csv.Escape function to write valid CSV as well.

using System.IO;
using System.Text.RegularExpressions;

public sealed class CsvReader : System.IDisposable
{
    public CsvReader( string fileName ) : this( new FileStream( fileName, FileMode.Open, FileAccess.Read ) )
    {
    }

    public CsvReader( Stream stream )
    {
        __reader = new StreamReader( stream );
    }

    public System.Collections.IEnumerable RowEnumerator
    {
        get {
            if ( null == __reader )
                throw new System.ApplicationException( "I can't start reading without CSV input." );

            __rowno = 0;
            string sLine;
            string sNextLine;

            while ( null != ( sLine = __reader.ReadLine() ) )
            {
                while ( rexRunOnLine.IsMatch( sLine ) && null != ( sNextLine = __reader.ReadLine() ) )
                    sLine += "\n" + sNextLine;

                __rowno++;
                string[] values = rexCsvSplitter.Split( sLine );

                for ( int i = 0; i < values.Length; i++ )
                    values[i] = Csv.Unescape( values[i] );

                yield return values;
            }

            __reader.Close();
        }
    }

    public long RowIndex { get { return __rowno; } }

    public void Dispose()
    {
        if ( null != __reader ) __reader.Dispose();
    }

    //============================================


    private long __rowno = 0;
    private TextReader __reader;
    private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
    private static Regex rexRunOnLine = new Regex( @"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$" );
}

public static class Csv
{
    public static string Escape( string s )
    {
        if ( s.Contains( QUOTE ) )
            s = s.Replace( QUOTE, ESCAPED_QUOTE );

        if ( s.IndexOfAny( CHARACTERS_THAT_MUST_BE_QUOTED ) > -1 )
            s = QUOTE + s + QUOTE;

        return s;
    }

    public static string Unescape( string s )
    {
        if ( s.StartsWith( QUOTE ) && s.EndsWith( QUOTE ) )
        {
            s = s.Substring( 1, s.Length - 2 );

            if ( s.Contains( ESCAPED_QUOTE ) )
                s = s.Replace( ESCAPED_QUOTE, QUOTE );
        }

        return s;
    }


    private const string QUOTE = "\"";
    private const string ESCAPED_QUOTE = "\"\"";
    private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
}
Forestforestage answered 20/4, 2009 at 19:14 Comment(8)
You may also need to translate \r\n for windows compliance, depending on your application.Whiskey
@NadaNaeem, care to elaborate?Forestforestage
it is not counting the items in a csv file row correctly, it is not dealing well with the commas in fields and carriage returns and tabsComorin
Your solution works good except in some rare situations, I've provided an answer with detail as to why.Norris
.Net actually already contains a csv reader, namely TextFieldParser in the Microsoft.VisualBasic.FileIO namespace.Holofernes
Can someone explain why the character class [^""] includes two double quotes? A character class/negated character class consumes 1 character. Specifying the same character twice within the brackets is redundant right?Autosuggestion
@MinhTran - "" is used to escape a double-quote within a C# verbatim string.Triploid
This code works perfectly if the CSV file is delimited by "commas" and has some data that has embedded commas in it. But, what if the file you pick to apply this happens to be tab-delimited? Of course, in this case, you probably would just use a "split ( chr(9) )" but what if you don't know that ahead of time?Orose
C
453

There's actually a spec for CSV format, RFC 4180 and how to handle commas:

Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.

http://tools.ietf.org/html/rfc4180

So, to have values foo and bar,baz, you do this:

foo,"bar,baz"
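As a minimal sketch of the quoting rule, here is how Python's standard `csv` module (one possible tool, not something the answer itself prescribes) writes and reads such values. The sample rows are made up for illustration; the second row also contains an embedded double quote, which the writer doubles.

```python
import csv
import io

# Illustrative rows: one field has a comma, one has a double quote.
rows = [["foo", "bar,baz"], ["aaa", 'b"bb', "ccc"]]

# Write: the default dialect quotes only the fields that need it
# and doubles any embedded double quote.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()
print(encoded)

# Read the text back; the comma and the quote survive the round trip.
decoded = list(csv.reader(io.StringIO(encoded)))
print(decoded)
```

The writer ends each record with CRLF by default, which matches the line break used in RFC 4180.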

Another important requirement to consider (also from the spec):

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

"aaa","b""bb","ccc"
Centrosome answered 20/4, 2009 at 19:6 Comment(2)
Not really a spec, but still probably handy. It says... "There is no formal specification in existence, which allows for a wide variety of interpretations of CSV files. This section documents the format that seems to be followed by most implementations."Incompletion
Also, don't forget that, despite its name, CSV values in the row may be separated not only by commas - at least on Windows platforms. It depends on the current regional settings (intl.cpl in command line, "Advanced settings"), in particular, list separator: System.Globalization.CultureInfo.CurrentCulture.TextInfo.ListSeparator.Cruiser
I
81

The CSV format uses commas to separate values. Values that contain carriage returns, linefeeds, commas, or double quotes are surrounded by double quotes, and each literal double quote inside such a value is escaped by an immediately preceding double quote. For example, the 3 values:

test
list, of, items
"go" he said

would be encoded as:

test
"list, of, items"
"""go"" he said"

Any field can be quoted but only fields that contain commas, CR/NL, or quotes must be quoted.

There is no real standard for the CSV format, but almost all applications follow the conventions documented here. The RFC that was mentioned elsewhere is not a standard for CSV, it is an RFC for using CSV within MIME and contains some unconventional and unnecessary limitations that make it useless outside of MIME.

A gotcha that many CSV modules I have seen don't accommodate is the fact that multiple lines can be encoded in a single field which means you can't assume that each line is a separate record, you either need to not allow newlines in your data or be prepared to handle this.
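To illustrate the gotcha, here is a small sketch in Python (chosen only for illustration; the sample text is made up). Splitting on line breaks miscounts the records, while a CSV-aware reader keeps the quoted multi-line field together.

```python
import csv
import io

# Three records; the second has a line break inside a quoted field.
text = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Naive approach: treat every physical line as a record.
naive_records = text.splitlines()

# CSV-aware approach: the reader follows the quotes across the line break.
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive_records))  # 4 physical lines
print(len(records))        # 3 logical records
```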

Intransigence answered 20/4, 2009 at 19:46 Comment(0)
M
42

Put double quotes around strings. That is generally what Excel does.

Ala Eli,

you escape a double quote as two double quotes. E.g. "test1","foo""bar","test2"

Metasomatism answered 20/4, 2009 at 18:53 Comment(3)
basically the same concept as a quoted IdentifiersLippold
you escape a double quote as two double quotes. E.g. "test1","foo""bar","test2"Thigmotropism
Just putting double quotes around the string doesn't work when a " is followed immediately by a commaMohammed
P
12

You can put double quotes around the fields. I don't like this approach, as it adds another special character (the double quote). Just define an escape character (usually backslash) and use it wherever you need to escape something:

data,more data,more data\, even,yet more

You don't have to try to match quotes, and you have fewer exceptions to parse. This simplifies your code, too.
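For comparison, a backslash-escaped file like the one above can be read and written with Python's `csv` module by turning quoting off and naming an escape character. This is a sketch under the assumption that the producer and consumer both agree on this non-standard dialect; spreadsheet programs generally will not.

```python
import csv
import io

line = "data,more data,more data\\, even,yet more\n"

# QUOTE_NONE disables quote handling; escapechar marks the next
# character as literal, so "\," does not split the field.
reader = csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE, escapechar="\\")
fields = next(reader)
print(fields)

# Writing with the same settings puts the backslash back in.
out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(fields)
print(out.getvalue())
```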

Peyter answered 20/4, 2009 at 19:34 Comment(2)
Quick and dirty but doesn't work if you actually have an entry that contains "\,"Thunderbolt
Sarp, that's why a double \\ is an escaped backslash, since that now becomes another special character.Spyglass
G
9

There is a library available through nuget for dealing with pretty much any well formed CSV (.net) - CsvHelper

Example to map to a class:

var csv = new CsvReader( textReader );
var records = csv.GetRecords<MyClass>();

Example to read individual fields:

var csv = new CsvReader( textReader );
while( csv.Read() )
{
    var intField = csv.GetField<int>( 0 );
    var stringField = csv.GetField<string>( 1 );
    var boolField = csv.GetField<bool>( "HeaderName" );
}

Letting the client drive the file format:
, is the standard field delimiter, " is the standard value used to escape fields that contain a delimiter, quote, or line ending.

To use (for example) # for fields and ' for escaping:

var csv = new CsvReader( textReader );
csv.Configuration.Delimiter = "#";
csv.Configuration.Quote = '\'';
// read the file however meets your needs

More Documentation

Gilberte answered 20/9, 2012 at 16:34 Comment(1)
It'd be preferable if you included an example of how to use the CsvHelper library to solve the OP's problem.Electrophone
G
7

In case you're on a *nix-system, have access to sed and there can be one or more unwanted commas only in a specific field of your CSV, you can use the following one-liner in order to enclose them in " as RFC4180 Section 2 proposes:

sed -r 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' inputfile

Depending on which field the unwanted comma(s) may be in you have to alter/extend the capturing groups of the regex (and the substitution).
The example above will enclose the fourth field (out of six) in quotation marks.


In combination with the --in-place-option you can apply these changes directly to the file.

In order to "build" the right regex, there's a simple principle to follow:

  1. For every field in your CSV that comes before the field with the unwanted comma(s) you write one [^,]*, and put them all together in a capturing group.
  2. For the field that contains the unwanted comma(s) you write (.*).
  3. For every field after the field with the unwanted comma(s) you write one ,.* and put them all together in a capturing group.

Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is \1"\2"\3.

([^,]*)(,.*)                     #first field, regex
"\1"\2                           #first field, substitution

(.*,)([^,]*)                     #last field, regex
\1"\2"                           #last field, substitution


([^,]*,)(.*)(,.*,.*,.*)          #second field (out of five fields)
([^,]*,[^,]*,)(.*)(,.*)          #third field (out of four fields)
([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*) #fourth field (out of six fields)

If you want to remove the unwanted comma(s) with sed instead of enclosing them with quotation marks refer to this answer.
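The construction rule above can also be sketched as a small Python helper that builds the same kind of regex from a field position and a field count. The function name `quote_field` and its parameters are hypothetical, and the approach carries the same assumption as the sed one-liner: only one known field may contain extra commas.

```python
import re

def quote_field(line, field_index, field_count):
    """Wrap the field at field_index (0-based) in double quotes.

    Assumes the line should have field_count fields and that only the
    target field may contain extra commas.
    """
    before = field_index                    # fields before the target
    after = field_count - field_index - 1   # fields after the target
    pattern = r"^((?:[^,]*,){%d})(.*)((?:,[^,]*){%d})$" % (before, after)
    return re.sub(pattern, r'\1"\2"\3', line, count=1)

# Fourth field out of six, containing one unwanted comma.
fixed = quote_field("a,b,c,d1,d2,e,f", 3, 6)
print(fixed)
```

A line that has too few commas to match is returned unchanged, so it is worth checking the output before relying on it.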

Genesisgenet answered 24/1, 2014 at 9:39 Comment(0)
N
6

As mentioned in my comment to harpo's answer, his solution is good and works in most cases; however, in some scenarios when commas are directly adjacent to each other, it fails to split on the commas.

This is because the Regex string behaves unexpectedly as a verbatim string. To get it to behave correctly, all " characters in the regex string need to be escaped manually without using the verbatim escape.

Ie. The regex should be this using manual escapes:

",(?=(?:[^\"\"]*\"\"[^\"\"]*\"\")*(?![^\"\"]*\"\"))"

which translates into ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"

When using a verbatim string @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" it behaves as the following, as you can see if you debug the regex:

",(?=(?:[^"]*"[^"]*")*(?![^"]*"))"

So in summary, I recommend harpo's solution, but watch out for this little gotcha!

I've included into the CsvReader a little optional failsafe to notify you if this error occurs (if you have a pre-known number of columns):

if (_expectedDataLength > 0 && values.Length != _expectedDataLength) 
throw new DataLengthException(string.Format("Expected {0} columns when splitting csv, got {1}", _expectedDataLength, values.Length));

This can be injected via the constructor:

public CsvReader(string fileName, int expectedDataLength = 0) : this(new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    _expectedDataLength = expectedDataLength;
}
Norris answered 29/11, 2016 at 21:48 Comment(2)
How would you go about handling the Header row? i am trying to map the csv to C# objects that are all types, but the header row breaks it because its all strings...Octofoil
Isn't [^""] the same as [^"]? Duplication of a character inside a character class specification is redundant, right?Autosuggestion
T
5

Add a reference to the Microsoft.VisualBasic (yes, it says VisualBasic but it works in C# just as well - remember that at the end it is all just IL).

Use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse CSV file Here is the sample code:

 ' Requires: Imports Microsoft.VisualBasic.FileIO
 Dim parser As TextFieldParser = New TextFieldParser("C:\mar0112.csv")
 parser.TextFieldType = FieldType.Delimited
 parser.SetDelimiters(",")

   While Not parser.EndOfData
      'Processing row
      Dim fields() As String = parser.ReadFields
      For Each field As String In fields
         'TODO: Process field

      Next
   End While
   ' Close the parser only after every row has been read
   parser.Close()
Tutti answered 11/4, 2012 at 19:29 Comment(1)
Yes, this is a very useful class in a somewhat unfortunate namespace ;-). To address the original question, however, you should also be setting parser.HasFieldsEnclosedInQuotes = true; and the input file would need to enclose fields that contain commas in quotes as per the CSV spec -- excel does this already.Unbar
S
5

You can use alternative "delimiters" like ";" or "|" but simplest might just be quoting which is supported by most (decent) CSV libraries and most decent spreadsheets.

For more on CSV delimiters and a spec for a standard format for describing delimiters and quoting see this webpage

Sansculotte answered 29/9, 2013 at 12:38 Comment(0)
A
5

In Europe we have had this problem since long before this question. In Europe we all use a comma for the decimal point. See the numbers below:

| American      | Europe        |
| ------------- | ------------- |
| 0.5           | 0,5           |
| 3.14159265359 | 3,14159265359 |
| 17.54         | 17,54         |
| 175,186.15    | 175.186,15    |

So it isn't possible to use the comma separator for CSV files. Because of that reason, the CSV files in Europe are separated by a semicolon (;).

Programs like Microsoft Excel can read files with a semicolon, and it's possible to switch the separator. You could even use a tab (\t) as the separator. See this answer from Super User.
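A short Python sketch of reading such a semicolon-separated file follows. The sample data and the helper `parse_european_number` are made up for illustration, and the helper assumes the values really use a point for thousands and a comma for decimals.

```python
import csv
import io

text = "name;price\nWidget;17,54\nGadget;175.186,15\n"

# Tell the reader that fields are separated by semicolons.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))

def parse_european_number(value):
    # Drop the thousands separator, then turn the decimal comma into a point.
    return float(value.replace(".", "").replace(",", "."))

prices = [parse_european_number(row[1]) for row in rows[1:]]
print(prices)
```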

Arundinaceous answered 10/5, 2017 at 14:3 Comment(0)
H
5

Here's a neat little workaround:

You can use a Greek Lower Numeral Sign instead (U+0375)

It looks like this ͵

Using this method saves you a lot of resources too...

Homoio answered 18/12, 2019 at 17:51 Comment(0)
N
4

If you're interested in a more educational exercise on how to parse files in general (using CSV as an example), you may check out this article by Julian Bucknall. I like the article because it breaks things down into much smaller problems that are much less insurmountable. You first create a grammar, and once you have a good grammar, it's a relatively easy and methodical process to convert the grammar into code.

The article uses C# and has a link at the bottom to download the code.

Narvaez answered 13/12, 2013 at 16:29 Comment(0)
E
4

If you feel like reinventing the wheel, the following may work for you:

public static IEnumerable<string> SplitCSV(string line)
{
    var s = new StringBuilder();
    bool escaped = false, inQuotes = false;
    foreach (char c in line)
    {
        if (c == ',' && !inQuotes)
        {
            yield return s.ToString();
            s.Clear();
        }
        else if (c == '\\' && !escaped)
        {
            escaped = true;
        }
        else if (c == '"' && !escaped)
        {
            inQuotes = !inQuotes;
        }
        else
        {
            escaped = false;
            s.Append(c);
        }
    }
    yield return s.ToString();
}
Escrow answered 1/5, 2014 at 17:27 Comment(0)
G
4

I know it's almost 13 years later, but we came across a similar situation where the client sends us a CSV that has values with commas. There are 2 use cases:

  1. If the client uses Excel on Windows to write the CSV (usually the case in a Windows environment), then quotes are automatically added around any value that contains commas.

The actual text value of the CSV:

3786962,1st Meridian Care Services,John,"Person A,Person B, Person C, Person D",Voyager
  2. If the client is generating the CSV programmatically, then they should adhere to RFC 4180 and enclose the value in "quotes". Example:

Col1, Col2, "a, b, c", Col4

Gymnasiarch answered 2/3, 2022 at 9:46 Comment(0)
C
2

Just use SoftCircuits.CsvParser on NuGet. It will handle all those details for you and efficiently handles very large files. And, if needed, it can even import/export objects by mapping columns to object properties. In addition, my testing showed it averages nearly 4 times faster than the popular CsvHelper.

Como answered 11/2, 2020 at 4:51 Comment(0)
D
1

You can read the csv file like this.

this makes use of splits and takes care of spaces.

ArrayList List = new ArrayList();
static ServerSocket Server;
static Socket socket;
static ArrayList<Object> list = new ArrayList<Object>();


public static void ReadFromXcel() throws FileNotFoundException
{   
    File f = new File("Book.csv");
    Scanner in = new Scanner(f);
    int count  =0;
    String[] date;
    String[] name;
    String[] Temp = new String[10];
    String[] Temp2 = new String[10];
    String[] numbers;
    ArrayList<String[]> List = new ArrayList<String[]>();
    HashMap m = new HashMap();

         in.nextLine();
         date = in.nextLine().split(",");
         name = in.nextLine().split(",");
         numbers = in.nextLine().split(",");
         while(in.hasNext())
         {
             String[] one = in.nextLine().split(",");
             List.add(one);
         }
         int xount = 0;
         //Making sure the lines don't start with a blank
         for(int y = 0; y<= date.length-1; y++)
         {
             if(!date[y].equals(""))
             {   
                 Temp[xount] = date[y];
                 Temp2[xount] = name[y];
                 xount++;
             }
         }

         date = Temp;
         name =Temp2;
         int counter = 0;
         while(counter < List.size())
         {
             String[] list = List.get(counter);
             String sNo = list[0];
             String Surname = list[1];
             String Name = list[2];
             for(int x = 3; x < list.length; x++)
             {           
                 m.put(numbers[x], list[x]);
             }
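             // Student and StudentList are not defined in this snippet; they are
             // assumed to exist elsewhere in the original program.
             Student s = new Student(sNo, Name, Surname, m, false);
             StudentList.add(s);
             System.out.println(s.sNo);
             counter++;
         }
         in.close();
}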
Dastardly answered 21/11, 2013 at 5:50 Comment(0)
R
1

I generally URL-encode the fields which can have any commas or any special chars. And then decode it when it is being used/displayed in any visual medium.

(commas becomes %2C)

Every language should have methods to URL-encode and decode strings.

e.g., in java

URLEncoder.encode(myString,"UTF-8"); //to encode
URLDecoder.decode(myEncodedstring, "UTF-8"); //to decode

I know this is a very general solution and it might not be ideal for situation where user wants to view content of csv file, manually.
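The same idea, sketched in Python rather than Java and with made-up sample fields: once every field is percent-encoded, a plain split on commas is safe because no encoded field can contain a comma or a line break.

```python
from urllib.parse import quote, unquote

fields = ["Acme, Inc.", "42", "line one\nline two"]

# safe="" makes quote() encode every reserved character, including "/".
encoded_line = ",".join(quote(field, safe="") for field in fields)
print(encoded_line)

# Decoding reverses the process field by field.
decoded = [unquote(part) for part in encoded_line.split(",")]
print(decoded)
```

The trade-off mentioned above still applies: the encoded file is harder to read by eye or in a spreadsheet.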

Roselba answered 13/8, 2015 at 17:32 Comment(0)
M
1

I usually do this in my CSV files parsing routines. Assume that 'line' variable is one line within a CSV file and all of the columns' values are enclosed in double quotes. After the below two lines execute, you will get CSV columns in the 'values' collection.

// The below two lines will split the columns as well as trim the DOUBLE QUOTES around values but NOT within them
    string trimmedLine = line.Trim(new char[] { '\"' });
    List<string> values = trimmedLine.Split(new string[] { "\",\"" }, StringSplitOptions.None).ToList();
Mardis answered 10/2, 2016 at 13:20 Comment(2)
Why my code is never displayed in multiple colors on StackOverflow? I indent by four spaces.Mardis
FYI... There is no programming language tag on the question, so the highlighter doesn't know what language to highlight as.Holofernes
M
1

The simplest solution I've found is the one LibreOffice uses:

  1. Replace all literal " by
  2. Put double quotes around your string

You can also use the one that Excel uses:

  1. Replace all literal " by ""
  2. Put double quotes around your string

Notice other people recommended to do only step 2 above, but that doesn't work with lines where a " is followed by a ,, like in a CSV where you want to have a single column with the string hello",world, as the CSV would read:

"hello",world"

Which is interpreted as a row with two columns: hello and world"
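A quick check of both cases with Python's `csv` module (used here only as a convenient, widely available parser) shows the difference between wrapping alone and doubling plus wrapping:

```python
import csv
import io

# Wrapping only: the parser sees two columns.
naive = next(csv.reader(io.StringIO('"hello",world"\n')))
print(naive)

# Doubling the embedded quote and then wrapping, as the writer does.
buffer = io.StringIO()
csv.writer(buffer).writerow(['hello",world'])
line = buffer.getvalue()
print(line)

# The properly escaped line parses back to a single column.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```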

Mohammed answered 18/11, 2017 at 0:2 Comment(1)
By standard rules, any field containing either the split character or the quote is surrounded by quotes, and any quotes inside that are doubled, so there is no issue. Your hello",world field would simply need to be saved as "hello"",world", which can be parsed 100% correctly.Holofernes
F
1
    public static IEnumerable<string> LineSplitter(this string line, char 
         separator, char skip = '"')
    {
        var fieldStart = 0;
        for (var i = 0; i < line.Length; i++)
        {
            if (line[i] == separator)
            {
                yield return line.Substring(fieldStart, i - fieldStart);
                fieldStart = i + 1;
            }
            else if (i == line.Length - 1)
            {
                yield return line.Substring(fieldStart, i - fieldStart + 1);
                fieldStart = i + 1;
            }

            if (line[i] == '"')
                for (i++; i < line.Length && line[i] != skip; i++) { }
        }

        if (line[line.Length - 1] == separator)
        {
            yield return string.Empty;
        }
    }
Fiction answered 6/6, 2018 at 14:38 Comment(0)
C
1

I used the Csvreader library, but it split my data on the commas (,) inside column values.

So if you want to insert CSV file data that contains commas (,) in most of the column values, you can use the function below. Author link => https://gist.github.com/jaywilliams/385876

function csv_to_array($filename='', $delimiter=',')
{
    if(!file_exists($filename) || !is_readable($filename))
        return FALSE;

    $header = NULL;
    $data = array();
    if (($handle = fopen($filename, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle, 1000, $delimiter)) !== FALSE)
        {
            if(!$header)
                $header = $row;
            else
                $data[] = array_combine($header, $row);
        }
        fclose($handle);
    }
    return $data;
}
Choirboy answered 13/11, 2018 at 18:41 Comment(0)
C
1

I used the papaParse library to parse the CSV file into key-value pairs (the keys come from the header, i.e. the first row of the CSV file).

Here is the example that I use:

https://codesandbox.io/embed/llqmrp96pm

It includes a dummy.csv file for the CSV parsing demo.

I've used it within reactJS, though it is easy and simple to replicate in an app written in any language.

Clerissa answered 24/4, 2019 at 3:28 Comment(0)
R
1

An example might help to show how commas can be displayed in a .csv file. Create a simple text file as follows:

Save this text file as a text file with suffix ".csv" and open it with Excel 2000 from Windows 10.

aa,bb,cc,d;d
"In the spreadsheet presentation, the below line should look like the above line except the below shows a displayed comma instead of a semicolon between the d's."
aa,bb,cc,"d,d", This works even in Excel 2000
aa,bb,cc,"d ,d", This works even in Excel 2000
aa,bb,cc,"d , d", This works even in Excel 2000

aa,bb,cc, " d,d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc, " d ,d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc, " d , d", This fails in Excel 2000 due to the space before the 1st quote

aa,bb,cc,"d,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
aa,bb,cc,"d ,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
aa,bb,cc,"d , d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.

Rule: If you want to display a comma in a cell (field) of a .csv file: "Start and end the field with double quotes, but avoid white space before the 1st quote"

Rabelais answered 18/5, 2020 at 1:7 Comment(0)
C
0

As this is about general practices, let's start with some rules of thumb:

  1. Don't use CSV, use XML with a library to read & write the xml file instead.

  2. If you must use CSV. Do it properly and use a free library to parse and store the CSV files.

To justify 1), most CSV parsers aren't encoding aware, so if you aren't dealing with US-ASCII you are asking for trouble. For example, Excel 2002 stores the CSV in the local encoding without any note about the encoding. The CSV standard isn't widely adopted :(. On the other hand, the XML standard is well adopted and it handles encodings pretty well.

To justify 2), there are tons of CSV parsers around for almost every language, so there is no need to reinvent the wheel even if the solution looks pretty simple.

To name a few:

  • for Python use the built-in csv module

  • for Perl check CPAN and Text::CSV

  • for PHP use the built-in fgetcsv/fputcsv functions

  • for Java check the SuperCSV library

Really, there is no need to implement this by hand unless you are going to parse it on an embedded device.

Cade answered 20/4, 2009 at 19:40 Comment(5)
XML isn't always the answer. CSV is the right format for the job when you have a lot of dense, tabular data (i.e. a spreadsheet). Those tags introduce a lot of overhead, and if each and every line has an identical format, there is no need to be explicit about what each and every value represents. XML is great when you have complicated hierarchical data, or records with optional fields. That isn't always the case.Peyter
In theory the "tags" introduce a bit of overhead, but I can't think of any real-life application where it starts to be a problem. Do you have any practical examples? To work on data one should use a database instead of CSV. If we speak about data serialization (backups, data interchange), will it matter if parsing takes a week instead of 5 days?Cade
Basically, any situation in which you have data that is best represented by a table. Say you have data from a dozen different sensors that you sample every so often, and you record the timestamp and the value of each of the sensors at that time. Each record is identical: timestamp, sensor0, sensor1, ... sensor11. XML is great for representing complex, irregular data, but it is a rather heavyweight format that does not fit every single situation. KISSPeyter
I totally agree that XML is not an answer for everything. In particular, it is not well suited as a database replacement or for configuration files. But here the question was about data interchange, which is what XML was designed for.Cade
It depends what the data is, more than what it is used for. I don't know what data is being transferred, so I'm not sure if CSV is the most appropriate, but if it really is regular, tabular data, a tabular format simply makes more sense than a hierarchical format. Quite often people shoehorn data that should be XML into another format, but I think the opposite is just as bad, and it's a pet peeve of mine.Peyter
S
0

First, let's ask ourselves, "Why do we feel the need to handle commas differently for CSV files?"

For me, the answer is, "Because when I export data into a CSV file, the commas in a field disappear and my field gets separated into multiple fields where the commas appear in the original data." (That is because the comma is the CSV field separator character.)

Depending on your situation, semicolons may also be used as CSV field separators.

Given my requirements, I can use a character, e.g., single low-9 quotation mark, that looks like a comma.

So, here's how you can do it in Go:

import (
    "fmt"
    "strings"
)

// Scrub replaces the CSV separator characters "," and ";" with the
// single low-9 quotation mark (U+201A), which looks like a comma.
func Scrub(a interface{}) string {
    s := fmt.Sprint(a)
    s = strings.ReplaceAll(s, ",", "‚")
    s = strings.ReplaceAll(s, ";", "‚")
    return s
}

The second, comma-looking argument in each replacement is the character with decimal code point 8218 (U+201A).

Be aware that if your clients may use ASCII-only text readers, this decimal 8218 character will not look like a comma. If that is your case, I'd recommend surrounding any field that contains a comma (or semicolon) with double quotes per RFC 4180: https://www.rfc-editor.org/rfc/rfc4180
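For comparison, the RFC 4180 quoting alternative can be sketched with Python's built-in csv module. The helper names to_csv_line and from_csv_line and the sample record are hypothetical. Unlike character substitution, quoting leaves the original data unchanged after a round trip, whichever separator is in use.

```python
import csv
import io


def to_csv_line(fields, delimiter=","):
    """Serialize one record, quoting only the fields that need it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(fields)
    return buffer.getvalue().rstrip("\r\n")


def from_csv_line(line, delimiter=","):
    """Parse one record back into its fields."""
    return next(csv.reader([line], delimiter=delimiter))


record = ["Smith, John", "a;b", "plain"]  # hypothetical sample data

comma_line = to_csv_line(record)
semicolon_line = to_csv_line(record, delimiter=";")
print(comma_line)      # "Smith, John",a;b,plain
print(semicolon_line)  # Smith, John;"a;b";plain
```

Only the field that contains the active separator gets quoted, so files without awkward values look exactly as they did before.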

Slinkman answered 22/12, 2014 at 18:36 Comment(0)
S
0

Thank you to the others in this post.

I used the information here to create a JavaScript function that produces CSV-ready output for an array of objects whose property values may contain commas.

It turns

rowsArray = [{obj1prop1: "foo", obj1prop2: "bar,baz"}, {obj2prop1: "qux", obj2prop2: "quux,corge,thud"}]

into

csvRowsArray = [{obj1prop1: "foo", obj1prop2: "\"bar,baz\""}, {...} ] 

To use commas in values in a CSV, the value needs to be wrapped in double quotes. In the JSON representation above, those quotes appear escaped, i.e., \" (backslash, double quote). In the function, the quotes are added by interpolating the value into a template literal that includes them: `"${row[key]}"`.

Here is my function:

const calculateTheCSVExport = (props) => {
  if (props.rows === undefined) return;

  const jsonRowsArray = props.rows;
  const csvRowsArray = [];

  jsonRowsArray.forEach(row => {
    const csvCurrRowObject = {};
    Object.keys(row).forEach(key => {
      // Only string values can contain commas; the type check avoids
      // calling indexOf on numbers, null, etc.
      if (typeof row[key] === 'string' && row[key].indexOf(',') > -1) {
        // Enclose the value in double quotes so the commas export to CSV correctly.
        // See more: https://mcmap.net/q/65565/-dealing-with-commas-in-a-csv-file
        csvCurrRowObject[key] = `"${row[key]}"`;
      } else {
        csvCurrRowObject[key] = row[key];
      }
    });
    csvRowsArray.push(csvCurrRowObject);
  });

  return csvRowsArray;
};
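For readers who can hand the quoting to a library instead, here is a comparable sketch in Python rather than JavaScript, using the built-in csv.DictWriter. The rows and key names are made up, and every row is assumed to share the same keys. The writer also doubles embedded double quotes, which the function above does not attempt.

```python
import csv
import io

# Hypothetical rows sharing the same keys (the names are made up).
rows = [
    {"prop1": "foo", "prop2": "bar,baz"},
    {"prop1": "qux", "prop2": 'say "hi", then leave'},
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["prop1", "prop2"])
writer.writeheader()
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back gives the original dictionaries.
round_trip = list(csv.DictReader(io.StringIO(text, newline="")))
print(round_trip == rows)
```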
Shakitashako answered 15/12, 2022 at 17:36 Comment(0)
B
-2

I think the easiest solution to this problem is to have the customer open the CSV in Excel and then use Ctrl+H to replace every comma with whatever identifier you want. This is very easy for the customer and requires only one change in your code to read the delimiter of your choice.

Bandylegged answered 24/1, 2014 at 19:41 Comment(1)
Who says they have Excel? In fact, who says it's even a human being who is doing the uploading?...Gentilism
L
-4

Use a tab character (\t) to separate the fields.

Lynea answered 17/7, 2016 at 0:6 Comment(3)
-1 Great until someone uses a tab in their value; then you're right back to the problem the person asking the question has got. Swapping one delimiter char for another isn't going to solve the problem.Gentilism
Nonsense. People can't enter tabs in their data input. In most forms, that simply moves the data entry point to the next field.Lynea
"People can't enter tabs in their data input"....are you serious?? A) of course a person could put a tab in an input field B) who says it is a GUI the data comes from? C) who says it's even a human that is entering the data?Gentilism
