How to export large SQL Server table into a CSV file using the FileHelpers library?

I'm looking to export a large SQL Server table into a CSV file using C# and the FileHelpers library. I could consider C# and bcp as well, but I thought FileHelpers would be more flexible than bcp. Speed is not a special requirement. An OutOfMemoryException is thrown on the storage.ExtractRecords() call when the code below is run (some less essential code has been omitted):

    SqlServerStorage storage = new SqlServerStorage(typeof(Order));
    storage.ServerName = "SqlServer";
    storage.DatabaseName = "SqlDataBase";
    storage.SelectSql = "select * from Orders";
    storage.FillRecordCallback = new FillRecordHandler(FillRecordOrder);
    Order[] output = null;
    output = storage.ExtractRecords() as Order[];

When the code below is run, a 'Timeout expired' exception is thrown on link.ExtractToFile():

    SqlServerStorage storage = new SqlServerStorage(typeof(Order));
    string sqlConnectionString = "Server=SqlServer;Database=SqlDataBase;Trusted_Connection=True";
    storage.ConnectionString = sqlConnectionString;
    storage.SelectSql = "select * from Orders";
    storage.FillRecordCallback = new FillRecordHandler(FillRecordOrder);
    FileDataLink link = new FileDataLink(storage);
    link.FileHelperEngine.HeaderText = headerLine;
    link.ExtractToFile("file.csv");

The SQL query takes longer than the default 30 seconds to run, hence the timeout exception. Unfortunately, I can't find anything in the FileHelpers docs about setting the SQL command timeout to a higher value.

I could loop a SQL SELECT over small batches until the whole table is exported, but that would be too complicated. Is there a straightforward way to use FileHelpers for exporting large DB tables?

Olla answered 18/10, 2013 at 17:44 Comment(1)
Did you try it without FileHelpers? – Demos

FileHelpers has an async engine which is better suited for handling large files. Unfortunately, the FileDataLink class does not use it, so there's no easy way to use it with SqlStorage.
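
To make that concrete, here is a rough sketch of wiring the async engine up by hand: read rows with a plain SqlDataReader and write each record as it is read, so the whole table never has to fit in memory. The Order class and headerLine come from the question, and MapOrder is a hypothetical helper (you could reuse the logic from the FillRecordOrder callback); check the FileHelperAsyncEngine API of your FileHelpers version before relying on this.

// Sketch only -- streams rows from ADO.NET into the FileHelpers async engine
// instead of materializing an Order[] array in memory.
// Requires: using System.Data.SqlClient; using FileHelpers;
using (var connection = new SqlConnection("Server=SqlServer;Database=SqlDataBase;Trusted_Connection=True"))
using (var command = new SqlCommand("select * from Orders", connection))
{
    command.CommandTimeout = 600;       // avoid the default 30-second timeout
    connection.Open();

    var engine = new FileHelperAsyncEngine(typeof(Order));
    engine.HeaderText = headerLine;     // same header text as in the question
    engine.BeginWriteFile("file.csv");
    try
    {
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Map the current row to an Order record, one row at a time.
                Order order = MapOrder(reader);   // hypothetical mapping helper
                engine.WriteNext(order);
            }
        }
    }
    finally
    {
        engine.Close();
    }
}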

It's not very easy to modify the SQL timeout either. The easiest way would be to copy the code for SqlServerStorage to create your own alternative storage provider and provide replacements for ExecuteAndClose() and ExecuteAndLeaveOpen() which set the timeout on the IDbCommand. (SqlServerStorage is a sealed class, so you cannot just subclass it).
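
For illustration, the change itself is only a line or two. The sketch below shows the kind of helper you might call from your copied ExecuteAndClose()/ExecuteAndLeaveOpen() replacements just before the command is executed; the method name and the value are assumptions, while CommandTimeout is the standard ADO.NET property (default 30 seconds).

// Hypothetical fragment for the copied storage provider:
// raise the timeout on the IDbCommand before executing it.
private static void ApplyLongTimeout(IDbCommand command)
{
    command.CommandTimeout = 600;   // seconds; pick whatever your query needs
}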

You might want to check out ReactiveETL which uses the FileHelpers async engine for handling files along with a rewrite of Ayende's RhinoETL using ReactiveExtensions to handle large datasets.

Mardellmarden answered 22/10, 2013 at 13:37 Comment(1)
This is what I was afraid of. The FileHelpers team should consider adding a Timeout property to the SqlServerStorage class. It would be great for FileHelpers to cover more use cases, IMO. – Olla

Rei Sivan's answer is on the right track, as it will scale well with large files, because it avoids reading the entire table into memory. However, the code can be cleaned up.

shamp00's solution requires external libraries.

Here is a simpler table-to-CSV-file exporter that will scale well to large files, and does not require any external libraries:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

public class TableDumper
{
    public void DumpTableToFile(SqlConnection connection, string tableName, string destinationFile)
    {
        using (var command = new SqlCommand("select * from " + tableName, connection))
        using (var reader = command.ExecuteReader())
        using (var outFile = File.CreateText(destinationFile))
        {
            // Write the header row with the column names.
            string[] columnNames = GetColumnNames(reader).ToArray();
            int numFields = columnNames.Length;
            outFile.WriteLine(string.Join(",", columnNames));
            if (reader.HasRows)
            {
                // Stream the rows one at a time; each field is wrapped in quotes and
                // embedded quotes are doubled, so commas inside values stay intact.
                while (reader.Read())
                {
                    string[] columnValues =
                        Enumerable.Range(0, numFields)
                                  .Select(i => reader.GetValue(i).ToString())
                                  .Select(field => string.Concat("\"", field.Replace("\"", "\"\""), "\""))
                                  .ToArray();
                    outFile.WriteLine(string.Join(",", columnValues));
                }
            }
        }
    }

    private IEnumerable<string> GetColumnNames(IDataReader reader)
    {
        // GetSchemaTable exposes one row of metadata per result-set column.
        foreach (DataRow row in reader.GetSchemaTable().Rows)
        {
            yield return (string)row["ColumnName"];
        }
    }
}

I wrote this code, and declare it CC0 (public domain).

Aubade answered 2/1, 2014 at 18:1 Comment(1)
This code doesn't work... it doesn't handle commas in the dataset. – Afterthought

I combined the two code samples above; this is the code I use. I use VS 2010.

      //These are all the libraries I used

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using UsbLibrary;
using System.Data.SqlClient;
using System.Configuration;
using System.Globalization;




        //Copy this into a button click handler
        SqlConnection _connection = new SqlConnection();
        SqlDataAdapter _dataAdapter = new SqlDataAdapter();
        SqlCommand _command = new SqlCommand();
        DataTable _dataTable = new DataTable();

        //dbk is my database name that you can change it to your database name
        _connection.ConnectionString = "Data Source=.;Initial Catalog=dbk;Integrated Security=True";
        _connection.Open();

        SaveFileDialog saveFileDialogCSV = new SaveFileDialog();
        saveFileDialogCSV.InitialDirectory = Application.ExecutablePath.ToString();

        saveFileDialogCSV.Filter = "CSV files (*.csv)|*.csv|All files (*.*)|*.*";
        saveFileDialogCSV.FilterIndex = 1;
        saveFileDialogCSV.RestoreDirectory = true;

        string path_csv = "";
        if (saveFileDialogCSV.ShowDialog() == DialogResult.OK)
        {
            // Runs the export operation only if a valid filename was chosen.
            path_csv = saveFileDialogCSV.FileName.ToString();
            DumpTableToFile(_connection, "tbl_trmc", path_csv);
        }
        }
        //end of code in the button click handler


    public void DumpTableToFile(SqlConnection connection, string tableName, string destinationFile)
    {
        using (var command = new SqlCommand("select * from " + tableName, connection))
        using (var reader = command.ExecuteReader())
        using (var outFile = System.IO.File.CreateText(destinationFile))
        {
            string[] columnNames = GetColumnNames(reader).ToArray();
            int numFields = columnNames.Length;
            outFile.WriteLine(string.Join(",", columnNames));
            if (reader.HasRows)
            {
                while (reader.Read())
                {
                    string[] columnValues =
                        Enumerable.Range(0, numFields)
                                  .Select(i => reader.GetValue(i).ToString())
                                  .Select(field => string.Concat("\"", field.Replace("\"", "\"\""), "\""))
                                  .ToArray();
                    outFile.WriteLine(string.Join(",", columnValues));
                }
            }
        }
    }
    private IEnumerable<string> GetColumnNames(IDataReader reader)
    {
        foreach (DataRow row in reader.GetSchemaTable().Rows)
        {
            yield return (string)row["ColumnName"];
        }
    }
Irrecusable answered 23/4, 2014 at 7:1 Comment(0)

try this one:

private void exportToCSV()
{
    //Asks for the filename with a SaveFileDialog control.

    SaveFileDialog saveFileDialogCSV = new SaveFileDialog();
    saveFileDialogCSV.InitialDirectory = Application.ExecutablePath.ToString();

    saveFileDialogCSV.Filter = "CSV files (*.csv)|*.csv|All files (*.*)|*.*";
    saveFileDialogCSV.FilterIndex = 1;
    saveFileDialogCSV.RestoreDirectory = true;

    if (saveFileDialogCSV.ShowDialog() == DialogResult.OK)
    {
        // Runs the export operation if the given filename is valid.
        exportToCSVfile(saveFileDialogCSV.FileName.ToString());
    }
}


/**
 * Exports data to the CSV file.
 */
private void exportToCSVfile(string fileOut)
{
    // Connects to the database, and makes the select command.
    string sqlQuery = "select * from dbo." + this.lbxTables.SelectedItem.ToString();
    SqlCommand command = new SqlCommand(sqlQuery, objConnDB_Auto);

    // Creates a SqlDataReader instance to read data from the table.
    SqlDataReader dr = command.ExecuteReader();

    // Retrieves the schema of the table.
    DataTable dtSchema = dr.GetSchemaTable();

    // Creates the CSV file as a stream, using the given encoding.
    StreamWriter sw = new StreamWriter(fileOut, false, this.encodingCSV);

    string strRow; // represents a full row

    // Writes the column headers if the user previously asked that.
    if (this.chkFirstRowColumnNames.Checked)
    {
        sw.WriteLine(columnNames(dtSchema, this.separator));
    }

    // Reads the rows one by one from the SqlDataReader
    // transfers them to a string with the given separator character and
    // writes it to the file.
    while (dr.Read())
    {
        strRow = "";
        for (int i = 0; i < dr.FieldCount; i++)
        {
            switch (Convert.ToString(dr.GetFieldType(i)))
            {
                case "System.Int16":
                    strRow += Convert.ToString(dr.GetInt16(i));
                    break;

                case "System.Int32" :
                    strRow += Convert.ToString(dr.GetInt32(i));
                    break;

                case "System.Int64":
                    strRow += Convert.ToString(dr.GetInt64(i));
                    break;

                case "System.Decimal":
                    strRow += Convert.ToString(dr.GetDecimal(i));
                    break;

                case "System.Double":
                    strRow += Convert.ToString(dr.GetDouble(i));
                    break;

                case "System.Float":
                    strRow += Convert.ToString(dr.GetFloat(i));
                    break;

                case "System.Guid":
                    strRow += Convert.ToString(dr.GetGuid(i));
                    break;

                case "System.String":
                    strRow += dr.GetString(i);
                    break;

                case "System.Boolean":
                    strRow += Convert.ToString(dr.GetBoolean(i));
                    break;

                case "System.DateTime":
                    strRow += Convert.ToString(dr.GetDateTime(i));
                    break;
            }

            if (i < dr.FieldCount - 1)
            {
                strRow += this.separator;
            }
        }
        sw.WriteLine(strRow);
    }


    // Closes the text stream and the database connection.
    sw.Close();
    dr.Close();

    // Notifies the user.
    MessageBox.Show("ready");
}
Sassan answered 27/10, 2013 at 8:7 Comment(1)
I eventually used Entity Framework; your sample code can work as well, but we're a little off topic here. – Olla

Very appreciative of Jay Sullivan's answer -- it was very helpful for me.

Building on that, I observed that in his solution the string formatting of varbinary and datetime data types was not good -- varbinary fields would come out literally as "System.Byte[]" or something like that, while datetime fields would be formatted as MM/dd/yyyy hh:mm:ss tt, which is not desirable for me.

Below is my hacked-together solution, which converts values to strings differently based on data type. It uses nested ternary operators, but it works!

Hope it is helpful for someone.

public static void DumpTableToFile(SqlConnection connection, Dictionary<string, string> cArgs)
{
    string query = "SELECT ";
    string z = "";
    if (cArgs.TryGetValue("top_count", out z))
    {
        query += string.Format("TOP {0} ", z);
    }
    query += string.Format("* FROM {0} (NOLOCK) ", cArgs["table"]);
    string lower_bound = "", upper_bound = "", column_name = "";
    if (cArgs.TryGetValue("lower_bound", out lower_bound) && cArgs.TryGetValue("column_name", out column_name))
    {
        query += string.Format("WHERE {0} >= {1} ", column_name, lower_bound);
        if (cArgs.TryGetValue("upper_bound", out upper_bound))
        {
            query += string.Format("AND {0} < {1} ", column_name, upper_bound);
        }
    }
    Console.WriteLine(query);
    Console.WriteLine("");
    using (var command = new SqlCommand(query, connection))
    using (var reader = command.ExecuteReader())
    using (var outFile = File.CreateText(cArgs["out_file"]))
    {
        string[] columnNames = GetColumnNames(reader).ToArray();
        int numFields = columnNames.Length;
        Console.WriteLine(string.Join(",", columnNames));
        Console.WriteLine("");
        if (reader.HasRows)
        {
            Type datetime_type = Type.GetType("System.DateTime");
            Type byte_arr_type = Type.GetType("System.Byte[]");
            string format = "yyyy-MM-dd HH:mm:ss.fff";
            int ii = 0;
            while (reader.Read())
            {
                ii += 1;
                string[] columnValues =
                    Enumerable.Range(0, numFields)
                        .Select(i =>
                            reader.GetValue(i).GetType() == datetime_type
                                ? ((DateTime)reader.GetValue(i)).ToString(format)
                                : (reader.GetValue(i).GetType() == byte_arr_type
                                    ? String.Concat(Array.ConvertAll((byte[])reader.GetValue(i), x => x.ToString("X2")))
                                    : reader.GetValue(i).ToString()))
                        ///.Select(field => string.Concat("\"", field.Replace("\"", "\"\""), "\""))
                        .Select(field => field.Replace("\t", " "))
                        .ToArray();
                outFile.WriteLine(string.Join("\t", columnValues));
                if (ii % 100000 == 0)
                {
                    Console.WriteLine("row {0}", ii);
                }
            }
        }
    }
}
public static IEnumerable<string> GetColumnNames(IDataReader reader)
{
    foreach (DataRow row in reader.GetSchemaTable().Rows)
    {
        yield return (string)row["ColumnName"];
    }
}
Criticism answered 6/1, 2016 at 8:48 Comment(0)