Efficient way to write a lot of lines to a text file
Asked Answered
J

3

11

I started off doing something as follows:

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    foreach (MyClassA myClassA in myClassAs)
    {
        textWriter.WriteLine(myIO.GetCharArray(myClassA));

        if (myClassA.MyClassBs != null)
            myClassA.MyClassBs.ToList()
                .ForEach(myClassB =>
                    textWriter.WriteLine(myIO.GetCharArray((myClassB)));

        if (myClassA.MyClassCs != null)
            myClassA.MyClassCs.ToList()
                .ForEach(myClassC =>
                    textWriter.WriteLine(myIO.GetCharArray(myClassC)));
    }
}

This seemed pretty slow (~35 seconds for 35,000 lines).

Then I tried to follow the example here to create a buffer, with the following code, but it didn't gain me anything. I was still seeing times around 35 seconds. Is there an error in how I implemented the buffer?

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    char[] newLineChars = Environment.NewLine.ToCharArray();
    //Chunk through 10 lines at a time.
    int bufferSize = 500 * (RECORD_SIZE + newLineChars.Count());
    char[] buffer = new char[bufferSize];
    int recordLineSize = RECORD_SIZE + newLineChars.Count();
    int bufferIndex = 0;

    foreach (MyClassA myClassA in myClassAs)
    {
        IEnumerable<IMyClass> myClasses =
            new List<IMyClass> { myClassA }
                .Union(myClassA.MyClassBs)
                .Union(myClassA.MyClassCs);

        foreach (IMyClass myClass in myClasses)
        {
            Array.Copy(myIO.GetCharArray(myClass).Concat(newLineChars).ToArray(),
                0, buffer, bufferIndex, recordLineSize);

            bufferIndex += recordLineSize;

            if (bufferIndex >= bufferSize)
            {
                textWriter.Write(buffer);

                bufferIndex = 0;
            }
        }
    }

    if (bufferIndex > 0)
        textWriter.Write(buffer);
}

Is there a better way to accomplish this?

Javed answered 26/6, 2013 at 16:16 Comment(13)
Have you tried using StringBuilder and build the string then just write once to the file.Lehr
any specific reason to use buffers?Rhizomorphous
How certain are you that it's really the IO that's the problem here? What happens if you just write to a StreamWriter wrapped around a MemoryStream, for example?Smollett
Left justify the code for readability. What is MyClassBs?Farica
@Blam - MyClassB (as well as MyClassA and MyClassC) are typical POCOs. MyClassA contains a list of MyClassBs and a list of MyClassCs. Does that answer your question?Javed
@Rhizomorphous - it is based on the assumption that there is an expense attached to writing a line that is mitigated by writing more than one line at a time. Not sure if it's an accurate assumption, but I got the idea from the article I linked.Javed
@Lehr - I haven't tried that. I can give it a shot, though.Javed
How do you know 35 seconds is "pretty slow"? Test the IO first to get some bounds? before tinkering with working c# code...Placid
@Jon Skeet - I tried using a MemoryStream like you suggested, and it took about the same amount of time, so I think the issue must be with the GetCharArray() method. I'm going to look into it some more. Thanks.Javed
Did not ask what MyClassA is. Asked what MyClassBs is. List? OverservableCollection? Enumerable? Might be able to eliminate the ToList() call.Farica
@Blam - MyClassBs is an IEnumerable<MyClassB>.Javed
@Lehr The downside of building all the lines in memory is that you have to have all the lines in memory. E.g. i wrote a data export that writes out a few billion lines. It would be a waste to cache that massive string in memory.Spacetime
@IanBoyd I agree if you are producing a billion lines it would be inefficient but if you are doing a few then it would be fine. Without know the details its hard to give the correct solution.Lehr
C
1

I threw together a simple snippet that I think is a bit cleaner; but, then again, I'm not quite sure what you are trying to accomplish. Also, I don't have any of your classes available, so I can't really do any kind of tests.

This sample does basically the same thing you have; except that it uses some generic methods, and it does all the writing in one spot.

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
    List<char[]> outputLines = new List<char[]>();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Add(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        outputLines.AddRange(myClassA.MyClassBs.Select(myClassB => myIO.GetCharArray(myClassB)));

    if (myClassA.MyClassCs != null)
        outputLines.AddRange(myClassA.MyClassCs.Select(myClassC => myIO.GetCharArray(myClassC)));
}

var lines = outputLines.Select(line => string.Concat<char>(line));
if (append)
    File.AppendAllLines(filePath, lines);
else
    File.WriteAllLines(filePath, lines);

Here's the StringBuilder version:

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
StringBuilder outputLines = new StringBuilder();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Append(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        myClassA.MyClassBs.ForEach(myClassB=>outputLines.Append(myClassB));

    if (myClassA.MyClassCs != null)
        myClassA.MyClassCs.ForEach(myClassC => outputLines.Append(myClassC));
}

if (append)
    File.AppendAllText(filePath, outputLines.ToString());
else
    File.WriteAllText(filePath, outputLines.ToString());
Collector answered 26/6, 2013 at 18:52 Comment(2)
I tried that and saw an improvement of 2 - 3 seconds, so it's better than what I had. Thanks for sharing.Javed
As some others have said, you might try using a StringBuilder and appending the test, as opposed to using the list, then writing the StringBuilder's ToString() as a single write. I don't know how much/if that would improve it at all.Collector
U
13

I strongly suspect that the majority of your time is not spent in the I/O. There's no way that it should take 35 seconds to write 35,000 lines, unless those lines are really long.

Most likely, the majority of time is spent in the GetCharArray method, whatever that does.

A few suggestions:

If you really think I/O is the problem, increase the stream's buffer size. Call the StreamWriter constructor that lets you specify a buffer size. For example,

using (TextWriter textWriter = new StreamWriter(filePath, append, Encoding.Utf8, 65536))

That'll perform better than the default 4K buffer size. Going higher than 64K for the buffer size is not generally useful, and can actually decrease performance.

Don't pre-buffer lines or append to a StringBuilder. That might give you small performance increases, but at a huge cost in complexity. The small performance boost isn't worth the maintenance nightmare.

Take advantage of foreach. You have this code:

if (myClassA.MyClassBs != null)
    myClassA.MyClassBs.ToList()
        .ForEach(myClassB =>
            textWriter.WriteLine(myIO.GetCharArray((myClassB)));

That has to create a concrete list from whatever MyClassBs collection is, and then enumerate it. Why not just enumerate the thing directly:

if (myClassA.MyClassBs != null)
{
    foreach (var myClassB in myClassA.MyClassBs)
    {
        textWriter.WriteLine(myIO.GetCharArray((myClassB)));
    }
}

That will save you the memory required by the ToList, and the time it takes to enumerate the collection when creating the list.

All that said, it's almost certain that your GetCharArray method is the thing that's taking all the time. If you really want to speed up your program, look there. Trying to optimize writing to the StreamWriter is a waste of time. You're not going to get significant performance increases there.

Urogenous answered 26/6, 2013 at 22:2 Comment(1)
You should probably avoid using .ForEach() anyway.Sparkie
C
1

I threw together a simple snippet that I think is a bit cleaner; but, then again, I'm not quite sure what you are trying to accomplish. Also, I don't have any of your classes available, so I can't really do any kind of tests.

This sample does basically the same thing you have; except that it uses some generic methods, and it does all the writing in one spot.

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
    List<char[]> outputLines = new List<char[]>();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Add(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        outputLines.AddRange(myClassA.MyClassBs.Select(myClassB => myIO.GetCharArray(myClassB)));

    if (myClassA.MyClassCs != null)
        outputLines.AddRange(myClassA.MyClassCs.Select(myClassC => myIO.GetCharArray(myClassC)));
}

var lines = outputLines.Select(line => string.Concat<char>(line));
if (append)
    File.AppendAllLines(filePath, lines);
else
    File.WriteAllLines(filePath, lines);

Here's the StringBuilder version:

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
StringBuilder outputLines = new StringBuilder();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Append(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        myClassA.MyClassBs.ForEach(myClassB=>outputLines.Append(myClassB));

    if (myClassA.MyClassCs != null)
        myClassA.MyClassCs.ForEach(myClassC => outputLines.Append(myClassC));
}

if (append)
    File.AppendAllText(filePath, outputLines.ToString());
else
    File.WriteAllText(filePath, outputLines.ToString());
Collector answered 26/6, 2013 at 18:52 Comment(2)
I tried that and saw an improvement of 2 - 3 seconds, so it's better than what I had. Thanks for sharing.Javed
As some others have said, you might try using a StringBuilder and appending the test, as opposed to using the list, then writing the StringBuilder's ToString() as a single write. I don't know how much/if that would improve it at all.Collector
S
0

Use buffered stream for writing

e.g. for buffered writing to Console use

TextWriter w = new StreamWriter(new BufferedStream(Console.OpenStandardOutput()));
    w.WriteLine("Your text here");

similarly for buffered writing to file use

TextWriter w = new StreamWriter(new BufferedStream(new FileStream("myFilePath.txt", FileMode.Create)));
w.WriteLine("Your text here");
Serenity answered 3/6, 2018 at 11:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.