Best practices for serializing objects to a custom string format for use in an output file
Asked Answered
P

8

36

I was just about to implement an override of ToString() on a particular business class in order to produce an Excel-friendly format to write to an output file, which will be picked up later and processed. Here's what the data is supposed to look like:

5555555 "LASTN SR, FIRSTN"  5555555555  13956 STREET RD     TOWNSVILLE  MI  48890   25.88   01-003-06-0934

It's no big deal for me to just make a format string and override ToString(), but that will change the behavior of ToString() for any objects I decide to serialize this way, making the implementation of ToString() all ragged across the library.

Now, I've been reading up on IFormatProvider, and a class implementing it sounds like a good idea, but I'm still a little confused about where all this logic should reside and how to build the formatter class.

What do you guys do when you need to make a CSV, tab-delimited or some other non-XML arbitrary string out of an object?

Patroclus answered 24/7, 2009 at 19:49 Comment(0)
P
76

Here is a generic fashion for creating CSV from a list of objects, using reflection:

public static string ToCsv<T>(string separator, IEnumerable<T> objectlist)
{
    Type t = typeof(T);
    FieldInfo[] fields = t.GetFields();

    string header = String.Join(separator, fields.Select(f => f.Name).ToArray());

    StringBuilder csvdata = new StringBuilder();
    csvdata.AppendLine(header);

    foreach (var o in objectlist) 
        csvdata.AppendLine(ToCsvFields(separator, fields, o));

    return csvdata.ToString();
}

public static string ToCsvFields(string separator, FieldInfo[] fields, object o)
{
    StringBuilder linie = new StringBuilder();

    foreach (var f in fields)
    {
        if (linie.Length > 0)
            linie.Append(separator);

        var x = f.GetValue(o);

        if (x != null)
            linie.Append(x.ToString());
    }

    return linie.ToString();
}

Many variations can be made, such as writing out directly to a file in ToCsv(), or replacing the StringBuilder with an IEnumerable and yield statements.

Playground answered 4/1, 2010 at 7:39 Comment(6)
While the objectList is huge, your solution will be slow and might have out-of-memory issue.Oberon
Instead of outputting to a string builder, you could let it output to some kind of an OutputStream instead, much like what XmlSerializer.Serialize(...) does. That would solve most out-of-memory issues.Nellnella
Or just change it to return an IEnumerable<string> and yield return the individual lines... then (unless you store the result in a list) you can enumerate it, e.g. to write out the result, with little memory overhead. Have added additional answer with these changes to your basic idea.Sisely
I am not sure that this check if (x != null) is appropriate. If you have an object with fields { a, b, c } and different instances have either a, b, or c being null, you will not produce a valid CSV from which you can recover the instances. A better way would be to place nulls as NULL or a blank entry.Sigh
WARNING: Don't use this code, as it outputs a broken CSV file! Fields in a CSV file are separated with a separator (usually comma) and thus the field content must be quoted if it contains the separator or a newline. This code doesn't do that. For details: tools.ietf.org/html/rfc4180Formic
this didnt work for me I am getting file max and min Values into converted csv fileNurserymaid
S
39

Here is a simplified version of Per Hejndorf's CSV idea (without the memory overhead as it yields each line in turn). Due to popular demand it also supports both fields and simple properties by use of Concat.

Update 18 May 2017

This example was never intended to be a complete solution, just advancing the original idea posted by Per Hejndorf. To generate valid CSV you need to replace any text delimiter characters, within the text, with a sequence of 2 delimiter characters. e.g. a simple .Replace("\"", "\"\"").

Update 12 Feb 2016

After using my own code again in a project today, I realised I should not have taken anything for granted when I started from the example of @Per Hejndorf. It makes more sense to assume a default delimiter of "," (comma) and make the delimiter the second, optional, parameter. My own library version also provides a 3rd header parameter that controls whether a header row should be returned as sometimes you only want the data.

e.g.

public static IEnumerable<string> ToCsv<T>(IEnumerable<T> objectlist, string separator = ",", bool header = true)
{
    FieldInfo[] fields = typeof(T).GetFields();
    PropertyInfo[] properties = typeof(T).GetProperties();
    if (header)
    {
        yield return String.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p=>p.Name)).ToArray());
    }
    foreach (var o in objectlist)
    {
        yield return string.Join(separator, fields.Select(f=>(f.GetValue(o) ?? "").ToString())
            .Concat(properties.Select(p=>(p.GetValue(o,null) ?? "").ToString())).ToArray());
    }
}

so you then use it like this for comma delimited:

foreach (var line in ToCsv(objects))
{
    Console.WriteLine(line);
}

or like this for another delimiter (e.g. TAB):

foreach (var line in ToCsv(objects, "\t"))
{
    Console.WriteLine(line);
}

Practical examples

write list to a comma-delimited CSV file

using (TextWriter tw = File.CreateText("C:\testoutput.csv"))
{
    foreach (var line in ToCsv(objects))
    {
        tw.WriteLine(line);
    }
}

or write it tab-delimited

using (TextWriter tw = File.CreateText("C:\testoutput.txt"))
{
    foreach (var line in ToCsv(objects, "\t"))
    {
        tw.WriteLine(line);
    }
}

If you have complex fields/properties you will need to filter them out of the select clauses.


Previous versions and details below:

Here is a simplified version of Per Hejndorf's CSV idea (without the memory overhead as it yields each line in turn) and has only 4 lines of code :)

public static IEnumerable<string> ToCsv<T>(string separator, IEnumerable<T> objectlist)
{
    FieldInfo[] fields = typeof(T).GetFields();
    yield return String.Join(separator, fields.Select(f => f.Name).ToArray());
    foreach (var o in objectlist)
    {
        yield return string.Join(separator, fields.Select(f=>(f.GetValue(o) ?? "").ToString()).ToArray());
    }
}

You can iterate it like this:

foreach (var line in ToCsv(",", objects))
{
    Console.WriteLine(line);
}

where objects is a strongly typed list of objects.

This variation includes both public fields and simple public properties:

public static IEnumerable<string> ToCsv<T>(string separator, IEnumerable<T> objectlist)
{
    FieldInfo[] fields = typeof(T).GetFields();
    PropertyInfo[] properties = typeof(T).GetProperties();
    yield return String.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p=>p.Name)).ToArray());
    foreach (var o in objectlist)
    {
        yield return string.Join(separator, fields.Select(f=>(f.GetValue(o) ?? "").ToString())
            .Concat(properties.Select(p=>(p.GetValue(o,null) ?? "").ToString())).ToArray());
    }
}
Sisely answered 7/6, 2012 at 9:57 Comment(18)
I have a problem with this when two properties have the same value such as "FullName" and "DisplayName", I's struggling to pin point the bug, can anyone else see it?Stickney
@Mark Jones: Please list your problem as a new question, with example etc. I cannot work out what you mean from your comment. Put a link to the question here if you want me to take a look. ThanksSisely
You also need to add appropriate escaping in for your separator of choice. Actual CSV, i.e. with a comma, should quote each field and escape backslashes, commas and double quotes with backslashes.Slosh
Oops, you don't need to escape the commas, as it's part of a quoted string.Slosh
@David: CSV character escaping consists of replacing the quoting delimiter (typically double-quote ") with a pair of the same delimiter. Backslash escaping is not part of CSV. I will update the answer to include this.Sisely
Thanks for this elegant solution. For anyone else using this, you might like to replace Union with Concat so that fields do not get combined when they are the sameLeu
Nice one, indeed005. That's the bug that MarK Jones reported and had me scratching my head.Protractor
This is an excellent answer. @TrueBlueAussie, you should update your code to use Concat instead of Union to make it super easy to cut/paste and work for all scenarios. You can see from the comments what the issue is. If you have two properties with same type (like bool) that both equal same value, you will miss one of those values and the columns won't align. I just used this code in my project, hit the bug, used the fix, and it works now. I attempted to make this edit myself, but was rejected by the reviewers.Gurevich
@KoryGill: Updated Answer: Is that what you intended?Sisely
@TrueBlueAussie, the code looks good now, and should cut/paste nicely for everyone.Gurevich
@ajbeaven: Thanks for the edit. I hand-edited that update, when I should have cut & paste :)Sisely
WARNING: Don't use this code, as it outputs a broken CSV file! Fields in a CSV file are separated with a separator (usually comma) and thus the field content must be quoted if it contains the separator or a newline. This code doesn't do that. For details: tools.ietf.org/html/rfc4180Formic
@Askaga: Best not go around crying "the sky is falling" when the fix for that issue is adding a trivial Replace call :) This example was designed to remain readable, not 100% complete..Sisely
this code works with nested Properties of Complex types? (example: Person.Address.AddressString)Bechtel
@user3373603: CSV is a flat structure, not hierarchical. Project your data into a flat model and serialise that instead. Otherwise you will want to build something far more complex to auto-map CSV input (which is well beyond the scope of this question/answer).Sisely
little confusing but I implemented this,y yes this solved Out of memory exception but My csv file storing only File Max and Min values not inputs JSON files content.Nurserymaid
@Rinku: Sounds like several questions in your comment. Perhaps you could post a question to get some clarity on use of this little helper?Sisely
This code works perfectly for me. However, my class has a DisplayName attribute above the class property. How do I change the code to get the Headers read the DisplayName attribute instead of the property name. [DisplayName("Library Name")] public string LibraryName{ get;set;}Congdon
W
8

As rule of thumb I advocate only overriding toString as a tool for debugging, if it's for business logic it should be an explicit method on the class/interface.

For simple serialization like this I'd suggest having a separate class that knows about your CSV output library and your business objects that does the serialization rather than pushing the serialization into the business objects themselves.

This way you end up with a class per output format that produces a view of your model.

For more complex serialization where you're trying to write out an object graph for persistence I'd consider putting it in the business classes - but only if it makes for cleaner code.

Womanhater answered 24/7, 2009 at 20:5 Comment(1)
I agree, I only override ToString() when I'm debugging. I would also create a separate serialization class but you can also add a property called "CSVString" that could build the string on the fly...Judyjudye
I
2

The problem with the solutions I found so far is that they don't let you export a subset of properties, but only the entire object. Most of the time, when we need to export data in CSV, we need to "tailor" its format in a precise way, so I created this simple extension method that allows me to do that by passing an array of parameters of type Func<T, string> to specify the mapping.

public static string ToCsv<T>(this IEnumerable<T> list, params Func<T, string>[] properties)
{
    var columns = properties.Select(func => list.Select(func).ToList()).ToList();

    var stringBuilder = new StringBuilder();

    var rowsCount = columns.First().Count;

    for (var i = 0; i < rowsCount; i++)
    {
        var rowCells = columns.Select(column => column[i]);

        stringBuilder.AppendLine(string.Join(",", rowCells));
    }

    return stringBuilder.ToString();
}

Usage:

philosophers.ToCsv(x => x.LastName, x => x.FirstName)

Generates:

Hayek,Friedrich
Rothbard,Murray
Brent,David
Indistinct answered 10/1, 2018 at 17:1 Comment(0)
S
1

I had an issue the HiTech Magic's variation were two properties with the same value, only one would get populated. This seems to have fixed it:

        public static IEnumerable<string> ToCsv<T>(string separator, IEnumerable<T> objectlist)
    {
        FieldInfo[] fields = typeof(T).GetFields();
        PropertyInfo[] properties = typeof(T).GetProperties();
        yield return String.Join(separator, fields.Select(f => f.Name).Union(properties.Select(p => p.Name)).ToArray());
        foreach (var o in objectlist)
        {
            yield return string.Join(separator, (properties.Select(p => (p.GetValue(o, null) ?? "").ToString())).ToArray());
        }
    }
Stickney answered 21/6, 2013 at 14:4 Comment(2)
This will print out the headers for fields and properties and the values only for the properties, i.e. it will only work if there are no fields.Slosh
This answer has removed functionality from the original and will not work if there are also fields present. The solution is to use Concat instead of Union (have updated my answer with this).Sisely
C
1

Gone Coding's answer was very helpful. I made some changes to it in order to handle text gremlins that would hose the output.

 /******************************************************/
    public static IEnumerable<string> ToCsv<T>(IEnumerable<T> objectlist, string separator = ",", bool header = true)
    {
       FieldInfo[] fields = typeof(T).GetFields();
       PropertyInfo[] properties = typeof(T).GetProperties();
       string str1;
       string str2;

       if(header)
       {
          str1 = String.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p => p.Name)).ToArray());
          str1 = str1 + Environment.NewLine;
          yield return str1;
       }
       foreach(var o in objectlist)
       {
          //regex is to remove any misplaced returns or tabs that would
          //really mess up a csv conversion.
          str2 = string.Join(separator, fields.Select(f => (Regex.Replace(Convert.ToString(f.GetValue(o)), @"\t|\n|\r", "") ?? "").Trim())
             .Concat(properties.Select(p => (Regex.Replace(Convert.ToString(p.GetValue(o, null)), @"\t|\n|\r", "") ?? "").Trim())).ToArray());

          str2 = str2 + Environment.NewLine;
          yield return str2;
       }
    }
Collinear answered 31/8, 2016 at 19:57 Comment(0)
M
1

ServiceStack.Text is a popular NuGet package that supports CSV serialization. Then this will be all the code you need:

CsvSerializer.SerializeToCsv(foo)

If you don't want the headers, use this code first:

CsvConfig<Foo>.OmitHeaders = true;
Martita answered 27/10, 2019 at 0:30 Comment(0)
R
0

Gone Coding's answer is great! I made a couple changes to Gone Coding's answer to qualify fields with double quotes and for the properties iterator to ignore any properties without index parameters i.e. properties in a class that enable property name getters and setters:

FieldInfo[] fields = typeof(T).GetFields();
PropertyInfo[] properties = typeof(T).GetProperties().Where(x => x.GetIndexParameters().Length == 0).ToArray();
yield return string.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p => p.Name)).ToArray());
foreach (var o in objectlist)
{
    yield return string.Join(separator, fields.Select(f => "\"" + (( f.GetValue(o) ?? "").ToString()) + "\"")
                    .Concat(properties.Select(p => ("\"" + (p.GetValue(o, null) ?? "").ToString()) + "\"")).ToArray());
}
Rosamondrosamund answered 27/8, 2020 at 18:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.