Fastest way to export blobs from a table into individual files
What is the fastest way to export files (blobs) stored in a SQL Server table into a file on the hard drive? I have over 2.5 TB of files (90 kb avg) stored as varbinary and I need to extract each one to a local hard drive as quickly as possible. BCP seems to work but it will take over 45 days with the speed I'm seeing, and I'm worried that my script will fail at some point because Management Studio will run out of memory.

Avarice answered 25/4, 2012 at 23:42 Comment(3)
This is unlikely to be a BCP performance problem. What do your disk utilizations look like during this?Stiles
Average disk queue length is below 0.1 and average response time is below 5 ms during my testing, which seems good to me. Anyway, it seems like reasonable performance from BCP; I was just hoping there might be a faster way.Avarice
Might want to read the following question on DBA.SE: Optimising BCP performance for BLOB data. It treats the case of bcp and blob.Shantishantung
I tried using a CLR function and it was more than twice as fast as BCP. Here's my code.

Original Method:

SET @bcpCommand = 'bcp "SELECT blobcolumn FROM blobtable WHERE ID = ' + CAST(@FileID AS VARCHAR(20)) + '" queryout "' + @FileName + '" -T -c'
EXEC master..xp_cmdshell @bcpCommand

CLR Method:

declare @file varbinary(max) = (select blobcolumn from blobtable WHERE ID = @fileid)
declare @filepath nvarchar(4000) = N'c:\temp\' + @FileName
SELECT Master.dbo.WriteToFile(@file, @filepath, 0)
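
The answer doesn't show how `@FileID` and `@FileName` are supplied; per the comments below, the author looped over the table. A minimal cursor sketch of such a driving loop (the `FileName` column is an assumption; only `blobtable`, `blobcolumn`, and `ID` appear in the snippets above):

```
-- Hypothetical driving loop; the FileName column is assumed
DECLARE @FileID INT, @FileName NVARCHAR(260);
DECLARE file_cursor CURSOR FAST_FORWARD FOR
    SELECT ID, [FileName] FROM blobtable;
OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @FileID, @FileName;
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @file VARBINARY(MAX) =
        (SELECT blobcolumn FROM blobtable WHERE ID = @FileID);
    DECLARE @filepath NVARCHAR(4000) = N'c:\temp\' + @FileName;
    SELECT Master.dbo.WriteToFile(@file, @filepath, 0);
    FETCH NEXT FROM file_cursor INTO @FileID, @FileName;
END
CLOSE file_cursor;
DEALLOCATE file_cursor;
```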

C# Code for the CLR function

using System;
using System.Data;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

namespace BlobExport
{
    public class Functions
    {
      [SqlFunction]
      public static SqlString WriteToFile(SqlBytes binary, SqlString path, SqlBoolean append)
      {        
        try
        {
          if (!binary.IsNull && !path.IsNull && !append.IsNull)
          {
            // Create the target directory if it does not exist yet.
            var dir = Path.GetDirectoryName(path.Value);
            if (!Directory.Exists(dir))
              Directory.CreateDirectory(dir);
            // FileMode.Create truncates an existing file; OpenOrCreate would
            // leave stale trailing bytes when overwriting a longer file.
            using (var fs = new FileStream(path.Value, append.Value ? FileMode.Append : FileMode.Create))
            {
                // Write the buffer in one call instead of byte-by-byte.
                byte[] byteArr = binary.Value;
                fs.Write(byteArr, 0, byteArr.Length);
            }
            return "SUCCESS";
          }
          else
            return "NULL INPUT";
        }
        catch (Exception ex)
        {          
          return ex.Message;
        }
      }
    }
}
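
The answer doesn't show how the assembly is deployed. A sketch of registering it (the DLL path is an assumption; `EXTERNAL_ACCESS` is required because the function writes to the file system, and the answer calls `Master.dbo.WriteToFile`, so it was created in `master`):

```
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- DLL path below is an assumption
CREATE ASSEMBLY BlobExport
FROM 'C:\temp\BlobExport.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO
CREATE FUNCTION dbo.WriteToFile(@binary VARBINARY(MAX), @path NVARCHAR(4000), @append BIT)
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME BlobExport.[BlobExport.Functions].WriteToFile;
```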
Avarice answered 27/4, 2012 at 17:54 Comment(8)
I doubled performance again by using SET NOCOUNT ON and sending the result text to a table rather than the Messages window in SSMS.Avarice
Would be nice if T-SQL simply had MySQL's SELECT field INTO DUMPFILE command.Chaille
It does. It's called BULK INSERT.Shantishantung
Your original method looked easier for my use, so I tried it and my .jpeg files were corrupt. Changing the -c switch to -n (character to native) fixed that. After figuring that out, I have a use for the CLR version after all, though, so thanks!Alcott
@Marian: BULK INSERT imports a data file into a database table or view. It can't export data to files.Overdevelop
FYI: The file 'corruption' occurs because the -n format prepends the file with 4 hex words which are used to show the length of the blob in the database. If you strip those off the front in something like a hex editor, the file loads fine. Now if I could just find the parameter which would strip those off automatically...Caffeine
@influent, about your original method, where from you get @FileID?Driveway
I think I looped through the table. Whatever I did had no bearing on performance.Avarice
I came here looking for a way to export a blob to a file with the least effort, and a CLR function is not what I'd call least effort. Here is a lazier approach using OLE Automation:

declare @init int
declare @file varbinary(max) = CONVERT(varbinary(max), N'your blob here')
declare @filepath nvarchar(4000) = N'c:\temp\your file name here.txt'

EXEC sp_OACreate 'ADODB.Stream', @init OUTPUT; -- Create an ADODB.Stream instance
EXEC sp_OASetProperty @init, 'Type', 1; -- 1 = adTypeBinary
EXEC sp_OAMethod @init, 'Open';
EXEC sp_OAMethod @init, 'Write', NULL, @file;
EXEC sp_OAMethod @init, 'SaveToFile', NULL, @filepath, 2; -- 2 = adSaveCreateOverWrite
EXEC sp_OAMethod @init, 'Close';
EXEC sp_OADestroy @init; -- Release the COM object
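
Note that the sp_OA* procedures report failure through their return value rather than raising an error, so a silent failure is easy to miss. A hedged sketch of checking one of the calls:

```
DECLARE @hr INT;
EXEC @hr = sp_OAMethod @init, 'SaveToFile', NULL, @filepath, 2;
IF @hr <> 0
    EXEC sp_OAGetErrorInfo @init; -- returns the COM error source and description
```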

You may need to enable OLE Automation procedures on the server first (and turn them off again when you're done):

sp_configure 'show advanced options', 1;  
GO  
RECONFIGURE;  
GO  
sp_configure 'Ole Automation Procedures', 1;  
GO  
RECONFIGURE;  
GO
Particularize answered 6/7, 2017 at 20:41 Comment(3)
I just used this method to generate 979 jpegs from a varbinary column of ID badge photos into a network drive. It took approx 16 seconds to generate the images (12mb in total) and 5 minutes to write the cursor and slightly modify the code provided. The other answer's methods would have taken far longer, but may be more efficient. This approach was perfect for my needs.Valentinevalentino
I was using BCP in my script and it took almost 4 times more time than this method. +1Dorotea
Still worked well for me today. I also have a small enough batch there's no need for something faster. ~12,000 small jpeg's (50kb or less) in less than 5 minutes. SQL Server 14, Express. ------ Do note you may have to set file permissions on target folder. It failed without notice until I gave it a folder with Everyone => Write.Exudation
What worked for me, combining the posts above, is:

1. Enable OLE Automation procedures, if not already enabled:

sp_configure 'show advanced options', 1;  
GO  
RECONFIGURE;  
GO  
sp_configure 'Ole Automation Procedures', 1;  
GO  
RECONFIGURE;  
GO

2. Create a folder where the generated files will be stored:

C:\GREGTESTING

3. Create the Document table that will be used for file generation and store the blobs in Doc_Content:

CREATE TABLE [dbo].[Document](
    [Doc_Num] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
    [Extension] [varchar](50) NULL,
    [FileName] [varchar](200) NULL,
    [Doc_Content] [varbinary](max) NULL   
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] 

INSERT [dbo].[Document] ([Extension] ,[FileName] , [Doc_Content] )
    SELECT 'pdf', 'SHTP Notional hire - January 2019.pdf', 0x....(varbinary blob)

Important note!

Don't forget to populate the Doc_Content column with the varbinary of the file you want to generate!
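
If you need a test row, `OPENROWSET ... SINGLE_BLOB` can read a file from disk straight into a varbinary column (the source path below is an assumption):

```
INSERT [dbo].[Document] ([Extension], [FileName], [Doc_Content])
SELECT 'pdf', 'SHTP Notional hire - January 2019.pdf', BulkColumn
FROM OPENROWSET(BULK N'C:\GREGTESTING\source.pdf', SINGLE_BLOB) AS f;
```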

4. Run the script below:

DECLARE @outPutPath varchar(50) = 'C:\GREGTESTING'
, @i bigint
, @init int
, @data varbinary(max) 
, @fPath varchar(max)  
, @folderPath  varchar(max)

--Get Data into temp Table variable so that we can iterate over it 
DECLARE @Doctable TABLE (id int identity(1,1), [Doc_Num]  varchar(100) , [FileName]  varchar(100), [Doc_Content] varBinary(max) )



INSERT INTO @Doctable([Doc_Num] , [FileName],[Doc_Content])
Select [Doc_Num] , [FileName],[Doc_Content] FROM  [dbo].[Document]



SELECT @i = COUNT(1) FROM @Doctable   

WHILE @i >= 1   

BEGIN    

SELECT 
    @data = [Doc_Content],
    @fPath = @outPutPath + '\' + [Doc_Num] +'_' +[FileName],
    @folderPath = @outPutPath + '\'+ [Doc_Num]
FROM @Doctable WHERE id = @i

EXEC sp_OACreate 'ADODB.Stream', @init OUTPUT; -- Create an ADODB.Stream instance
EXEC sp_OASetProperty @init, 'Type', 1; -- 1 = adTypeBinary
EXEC sp_OAMethod @init, 'Open';
EXEC sp_OAMethod @init, 'Write', NULL, @data;
EXEC sp_OAMethod @init, 'SaveToFile', NULL, @fPath, 2; -- 2 = adSaveCreateOverWrite
EXEC sp_OAMethod @init, 'Close';
EXEC sp_OADestroy @init; -- Release the COM object
print 'Document Generated at - '+  @fPath   

--Reset the variables for next use
SELECT @data = NULL  
, @init = NULL
, @fPath = NULL  
, @folderPath = NULL
SET @i -= 1
END   

5. The exported files appear in C:\GREGTESTING.

Glomerulus answered 23/10, 2020 at 7:40 Comment(1)
Thanks gmavridakis, but I want to export the file as .docx. The file is created properly but I cannot open it. Do you have any idea? It would be helpful for me.Cornelia
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
using System.Data.OleDb;
using System.Data.SqlClient;
using System.Data.SqlTypes;
#endregion

namespace ST_d4b314af29e4467f9d040056aad2fa70
{
    /// <summary>
    /// ScriptMain is the entry point class of the script.  Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        /// <summary>
        /// This method is called when this script task executes in the control flow.
        /// Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
        /// To open Help, press F1.
        /// </summary>
        public void Main()
        {
            FileStream fs;
            BinaryWriter bw;
            int bufferSize = 8192; // read in 8 KB chunks rather than the original 100 bytes
            byte[] outbyte = new byte[bufferSize];
            long retval;
            long startIndex = 0;
            String Path = "";
            string filename = "";
            SqlBytes Binary;
            
            SqlConnection cn = new SqlConnection("Data Source=DESKTOP-3D2QK6P;Initial Catalog=Test;Integrated Security=SSPI;");
            SqlCommand logoCMD = new SqlCommand("SELECT Folder_name,FileName,Doc_Content FROM dbo.document", cn);
            cn.Open();
            SqlDataReader myReader = logoCMD.ExecuteReader(CommandBehavior.Default);
           

            while (myReader.Read())
            {
                Path = myReader.GetString(0);
                filename = myReader.GetString(1);
                Binary = myReader.GetSqlBytes(2);
                
                if (!Binary.IsNull && Path != "" && filename != "")
                {
                    // The Path column already holds the target folder; create it if missing.
                    // (The original called Path.GetDirectoryName on it, which creates the
                    // parent folder instead.)
                    if (!Directory.Exists(Path))
                    {
                        Directory.CreateDirectory(Path);
                    }

                    fs = new FileStream(Path + "\\" + filename + ".pdf", FileMode.Create, FileAccess.Write);
                    bw = new BinaryWriter(fs);
                    startIndex = 0;
                    retval = myReader.GetBytes(2, startIndex, outbyte, 0, bufferSize);
                    while (retval == bufferSize)
                    {
                        bw.Write(outbyte);
                        bw.Flush();

                        // Reposition the start index to the end of the last buffer and refill it.
                        startIndex += bufferSize;
                        retval = myReader.GetBytes(2, startIndex, outbyte, 0, bufferSize);
                    }

                    // Write the remaining buffer. (The original wrote retval - 1 bytes,
                    // silently dropping the last byte of every file.)
                    bw.Write(outbyte, 0, (int)retval);
                    bw.Flush();

                    // Close the output file.
                    bw.Close();
                    fs.Close();
                }
                

                
            }
            // Close the reader and the connection.
            myReader.Close();
            cn.Close();

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        

        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

    }
}
Aileenailene answered 11/12, 2022 at 5:12 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Rubadub
I was looking for a method to export a few thousand images to E:\StudentPhotosNew, each named StudentRef.jpg, that could run each night, export only the photos not already exported, and avoid using a cursor. Based on the examples above I created the script below; for me it runs in 22 seconds the first time and is quicker after that because existing files are skipped.

In this example I am exporting student photos, but if you amend the #Files temp table query to suit your table structure the files could be anything, as I kept the rest of the code generic.

SET XACT_ABORT, NOCOUNT ON;

DECLARE @Message NVARCHAR(MAX);
SET @Message = N'Starting Operation at ' + FORMAT ( SYSDATETIME(), 'dd/MM/yyyy HH:mm:ss' );
RAISERROR ( @Message, 0, 1 ) WITH NOWAIT;

DECLARE @AcademicYear VARCHAR(5) = '23/24'
DECLARE @FolderPath VARCHAR(250) = 'E:\StudentPhotosNew'


DECLARE @FileID INT
DECLARE @FileData VARBINARY(MAX)
DECLARE @FilePath VARCHAR(MAX)


DECLARE @CheckExistingFile INT
DECLARE @ExistingFile INT
DECLARE @FileObject INT

DECLARE @NewFilesExported INT = 0
DECLARE @ExistingFilesAlreadyExported INT = 0

--Temporary table to hold list of files to export
DROP TABLE IF EXISTS #Files
SELECT
    FileID =
        ROW_NUMBER () OVER (
            ORDER BY
                SD.RefNo
        ),
    FileName = RTRIM ( SD.RefNo ) + '.jpg',
    FileData = PHO.Photo,
    FilePath = 
        @FolderPath
        + CASE
            WHEN RIGHT ( @FolderPath, 1 ) = '\' THEN ''
            ELSE '\'
        END
        + RTRIM ( SD.RefNo ) + '.jpg',
    IsExported = CAST ( 0 AS BIT )
    INTO #Files
FROM StudentDetail SD
INNER JOIN StudentPhoto PHO
    ON PHO.StudentID = SD.StudentID
WHERE
    SD.AcademicYearID = @AcademicYear


--Check for records still to export and export 1 by 1
WHILE (
    SELECT
        RecordsStillToExport = COUNT ( F.FileID )
    FROM #Files F
    WHERE
        F.IsExported = 0
) > 0
BEGIN
    SELECT
        @FileID = F.FileID,
        @FileData = F.FileData,
        @FilePath = F.FilePath
    FROM #Files F
    WHERE
        F.FileID = ( SELECT MIN ( F.FileID ) FROM #Files F WHERE F.IsExported = 0 )


    --Check if the file already exists - GetFile succeeds (returns 0) when it does
    EXEC @CheckExistingFile = sp_OACreate 'Scripting.FileSystemObject', @FileObject OUTPUT
    EXEC @CheckExistingFile = sp_OAMethod @FileObject, 'GetFile', @ExistingFile OUT, @FilePath
    EXEC sp_OADestroy @FileObject; -- Destroy Object
    
    --Export file if not already exported
    IF @CheckExistingFile <> 0
    BEGIN
        EXEC sp_OACreate 'ADODB.Stream', @FileObject OUTPUT; -- Create Object
        EXEC sp_OASetProperty @FileObject, 'Type', 1;
        EXEC sp_OAMethod @FileObject, 'Open';
        EXEC sp_OAMethod @FileObject, 'Write', NULL, @FileData;
        EXEC sp_OAMethod @FileObject, 'SaveToFile', NULL, @FilePath, 2;
        EXEC sp_OAMethod @FileObject, 'Close';
        EXEC sp_OADestroy @FileObject; -- Destroy Object

        SET @NewFilesExported += 1
    END
    ELSE
    BEGIN
        SET @ExistingFilesAlreadyExported += 1
    END

    UPDATE F
    SET
        F.IsExported = 1
    FROM #Files F
    WHERE
        F.FileID = @FileID
END

SET @Message = CAST ( @NewFilesExported AS NVARCHAR(50) ) + N' New Files Exported' + CHAR(13) + CHAR(10) + CAST ( @ExistingFilesAlreadyExported AS NVARCHAR(50) ) + N' Existing Files Already Exported';
RAISERROR ( @Message, 0, 1 ) WITH NOWAIT;

SET @Message = N'Completing Operation at ' + FORMAT ( SYSDATETIME(), 'dd/MM/yyyy HH:mm:ss' );
RAISERROR ( @Message, 0, 1 ) WITH NOWAIT;
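
As an aside, the per-file FileSystemObject round trip above could be replaced with the undocumented but widely used `xp_fileexist` procedure; a sketch (the file name is a made-up example):

```
DECLARE @exists INT;
EXEC master.dbo.xp_fileexist 'E:\StudentPhotosNew\12345.jpg', @exists OUTPUT;
-- @exists is 1 when the file is present, 0 otherwise
```

Being undocumented, it could change between SQL Server versions, so the FileSystemObject check remains the safer choice.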
Reproach answered 18/4, 2024 at 16:25 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.