escaping tricky string to CSV format
Asked Answered
D

6

44

I have to create a CSV file from webservice output and the CSV file uses quoted strings with comma separator. I cannot change the format...

So if I have a string it becomes a "string"... If the value has quotes already they are replaced with double quotes. For example a str"ing becomes "str""ing"...

However, lately my import has been failing because of the following

  • original input string is: "","word1,word2,..."
  • every single quote is replaced by double resulting in: """",""word1,word2,...""
  • then its prefixed and suffixed with quote before written to CVS file: """"",""word1,word2,..."""

As you can see the final result is this:

""""",""word1,word2,..."""

which breaks my import (is sees it as another field)... I think the issue is appereance of "," in the original input string.

Is there a CVS escape sequence for this scenario?

Update

The reason why above breaks is due to BCP mapping file (BCP utility is used to load CSV file into SQL db) which has terminator defined as "," . So instead of seeing 1 field it sees 2...But I cannot change the mapping file...

Darceydarci answered 16/6, 2011 at 19:29 Comment(6)
Different applications tend to have different CSV specifications. What application are you importing in? It might be worth checking what it outputs for your specific usecase.Thrashing
If "","word1,word2,..." is the value then """"",""word1,word2,...""" is the escaped CSV field. What exactly is breaking?Janenejanenna
@Janenejanenna the process that imports the file uses BCP utility to import it into SQL datbase...the terminator is declared as '","' which interprets above as 2 fields instead of 1Darceydarci
So that utility is broken. That's awesome! ...not. Workaround: Replace all " with something else, import, perform SQL query to replace something else back to ".Janenejanenna
I am not sure BCP works with CSV input. If it purports to, it isn't reading it correctly.Criticism
@Janenejanenna I think that may be my only option...Darceydarci
C
129

I use this code and it has always worked:

/// <summary>
/// Turn a string into a CSV cell output
/// </summary>
/// <param name="str">String to output</param>
/// <returns>The CSV cell formatted string</returns>
public static string StringToCSVCell(string str)
{
    bool mustQuote = (str.Contains(",") || str.Contains("\"") || str.Contains("\r") || str.Contains("\n"));
    if (mustQuote)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("\"");
        foreach (char nextChar in str)
        {
            sb.Append(nextChar);
            if (nextChar == '"')
                sb.Append("\"");
        }
        sb.Append("\"");
        return sb.ToString();
    }

    return str;
}
Criticism answered 16/6, 2011 at 19:45 Comment(7)
I've read CSV from the above code into Excel without issues. There may be a particular CSV reader that is broken if my code gets the same output as yours.Criticism
I get the same output: """"",""word1,word2,...""" from my code and it reads into Excel as "","word1,word2,..." as expected. If your code produces that output, your code is correct already.Criticism
I believe my problem is the import utility terminator (as explained in question update), so I was wondering if there is a way to add additional escape sequence to make it workDarceydarci
sweet! Was looking for the exact same thing.Serving
This throws an exception when str == nullBaroque
@Baroque yes, that's obvious, and done purposely. Null checks are often inappropriate in functions that are called many times. If you adopt this code then how you deal with bad data is up to you.Criticism
C# extension method version: public static string ToCSVCell(this string str)Dissimilation
A
18

Based on Ed Bayiates' answer:

/// <summary>
/// Turn a string into a CSV cell output
/// </summary>
/// <param name="value">String to output</param>
/// <returns>The CSV cell formatted string</returns>
private string ConvertToCsvCell(string value)
{
    var mustQuote = value.Any(x => x == ',' || x == '\"' || x == '\r' || x == '\n');

    if (!mustQuote)
    {
        return value;
    }

    value = value.Replace("\"", "\"\"");

    return string.Format("\"{0}\"", value);
}
Ayurveda answered 9/11, 2016 at 17:46 Comment(6)
Easiest solution with simple replace.Simmonds
But computationally much, much more expensive! There is a good reason Ed's code is using StringBuilder. Performance will be orders of magnitude worse for large CSV files! Easy is relative. If you like spending hours tracing down performance bottlenecks in your code, copy this snippet. If not, use Ed's code.Affable
Use Regex.IsMatch(value, "[,\"\\r\\n]"); for the mustQuote expression. Faster.Dray
Why would this be computationally much more expensive? Both functions work at the Cell level, rather than on the whole file, and iterating across every character in a string and adding to a StringBuilder is surely not more expensive than a single .Replace, as Replace uses a Span internally, as does StringBuilder, so in both cases, we take a string, convert it to a Span of Char then in one we continually add to that Span, in the other the final Span size is pre-calculated. If this were processing an whole file, agreed, a StringBuilder is far better than a looped Replace.Miranda
Proposed one liner: private string ConvertToCsvCell(string value) => Regex.IsMatch(value, "[,\"\\r\\n]") ? $"\"{value.Replace("\"", "\"\"")}\"" : value;Miranda
Alternatively, if you don't like regex but do like pattern matching: private string ConvertToCsvCell(string value) => value.Any(x=> x is ',' or '"' or '\r' or '\n') ? $"\"{value.Replace("\"", "\"\"")}\"" : value;Miranda
T
4

My penny thought:

String[] lines = new String[] { "\"\",\"word\",word,word2,1,34,5,2,\"details\"" };
for (int j = 0; j < lines.Length; j++)
{
    String[] fields=lines[j].Split(',');
    for (int i =0; i<fields.Length; i++)
    {
        if (fields[i].StartsWith("\"") && fields[i].EndsWith("\""))
        {
            char[] tmp = new char[fields[i].Length-2];
            fields[i].CopyTo(1,tmp,0,fields[i].Length-2);
            fields[i] =tmp.ToString();
            fields[i] = "\""+fields[i].Replace("\"","\"\"")+"\"";
        }
        else
            fields[i] = fields[i].Replace("\"","\"\"");

    }
    lines[j]=String.Join(",",fields);

}

Toluca answered 16/6, 2011 at 19:58 Comment(0)
V
3

Based on contribution of "Ed Bayiates" here's an helpful class to buid csv document:

/// <summary>
/// helpful class to build csv document
/// </summary>
public class CsvBuilder
{
    /// <summary>
    /// create the csv builder
    /// </summary>
    public CsvBuilder(char csvSeparator)
    {
        m_csvSeparator = csvSeparator;
    }

    /// <summary>
    /// append a cell
    /// </summary>
    public void appendCell(string strCellValue)
    {
        if (m_nCurrentColumnIndex > 0) m_strBuilder.Append(m_csvSeparator);

        bool mustQuote = (strCellValue.Contains(m_csvSeparator)
                        || strCellValue.Contains('\"') 
                        || strCellValue.Contains('\r') 
                        || strCellValue.Contains('\n'));

        if (mustQuote)
        {
            m_strBuilder.Append('\"');
            foreach (char nextChar in strCellValue)
            {
                m_strBuilder.Append(nextChar);
                if (nextChar == '"') m_strBuilder.Append('\"');
            }
            m_strBuilder.Append('\"');
        }
        else
        {
            m_strBuilder.Append(strCellValue);
        }
        m_nCurrentColumnIndex++;
    }

    /// <summary>
    /// end of line, new line
    /// </summary>
    public void appendNewLine()
    {
        m_strBuilder.Append(Environment.NewLine);
        m_nCurrentColumnIndex = 0;
    }

    /// <summary>
    /// Create the CSV file
    /// </summary>
    /// <param name="path"></param>
    public void save(string path )
    {
        File.WriteAllText(path, ToString());
    }

    public override string ToString()
    {
        return m_strBuilder.ToString();
    }

    private StringBuilder m_strBuilder = new StringBuilder();
    private char m_csvSeparator;
    private int m_nCurrentColumnIndex = 0;

}

How to use it:

void exportAsCsv( string strFileName )
{
    CsvBuilder csvStringBuilder = new CsvBuilder(';');
    csvStringBuilder.appendCell("#Header col 1 : Name");
    csvStringBuilder.appendCell("col 2 : Value");
    csvStringBuilder.appendNewLine();
    foreach (Data data in m_dataSet)
    {
        csvStringBuilder.appendCell(data.getName());
        csvStringBuilder.appendCell(data.getValue());
        csvStringBuilder.appendNewLine();
    }
    csvStringBuilder.save(strFileName);
}
Veal answered 29/7, 2015 at 14:12 Comment(0)
H
1

the first step in parsing this is removing the extra added " 's around your string. Once you do this, you should be able to deal with the embedded " as well as the ,'s.

Hakon answered 16/6, 2011 at 19:45 Comment(1)
The original string has to appear in db as is - I wish I could just remove empty quotes from it...Darceydarci
D
0

After much deliberation, it was decided that import utility format was needed to be fixed. The escaping of the string was correct (as users indicated) but the format file that import utility used was incorrect and was causing it to break import.

Thanks all and special thanks to @dbt (up vote)

Darceydarci answered 23/6, 2011 at 17:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.