You asked for a VBA solution, and I detect a preference for not creating new Access objects; you may well have good reasons for that, but the 'pure' VBA solution is a lot of work.
A solution that implements encapsulating text fields in quotes is the bare minimum for a competent answer. After that, you need to address the three big issues:
- Optimising away VBA's clunky string-handling;
- The Byte Order Marker, which VBA embeds in every string it saves to
file, ensuring that some of the most common consumers of a csv file
cannot read it properly;
- ...And there's rarely any middle ground between writing the file
line-by-line, forever, and writing it in one big chunk that'll throw
an out-of-memory error on larger recordsets.
Beginners in VBA may find the string-optimisations difficult to understand: the biggest performance gain available in native VBA is to avoid string allocation and concatenation ( here's why: http://www.aivosto.com/vbtips/stringopt2.html#huge ) - so I use join, split, and replace instead of myString = MyString & MoreString
The trailing loop, with the RecordSet.GetRows()
call at the very end, will raise eyebrows among coders with strong opinions about structured programming: but there are constraints on how you can order the code so that the 'chunks' are concatenated into the file without any missed bytes, out-of-register shifts in the byte order, or blank lines.
So here goes:
Public Function RecordsetToCSV(ByRef rst As ADODB.Recordset, _
ByRef OutputFile As String, _
Optional ByRef FieldList As Variant, _
Optional ByVal CoerceText As Boolean = True, _
Optional ByVal CleanupText As Boolean = True _
) As Long
' Output a recordset to a csv file and returns the row count.
' If the output file is locked, or specified in an inaccessible location, the
' 'ByRef' OutputFile parameter becomes a file in the user's local temp folder
' You can supply your own field list. This isn't a substituted file header of
' aliased field names: it is a subset of the field names, which ADO will read
' selectively from the recordset. Each item in the list matches a named field
' CoerceText=TRUE will encapsulate all items, numeric or not, in quote marks.
' CleanupText=TRUE strips quotes and linefeeds from the data: FALSE is faster
' You should only set them FALSE if you're confident that the data is 'clean'
' with no quote marks, commas or line breaks in any unencapsulated text field
' This code handles unicode, and outputs a file that can be read by Microsoft
' ODBC and OLEDB database drivers by removing the Byte Order Marker.
On Error Resume Next
' Coding note: we're not doing any string-handling in VBA.Strings: allocating
' deallocating and (especially!) concatenating are SLOW. We are using the VBA
' Join and Split functions ONLY. Feel free to optimise further by declaring a
' faster set of string functions from the Kernel if you want to.
'
' Other optimisations: type pun. Byte Arrays are interchangeable with strings
' Some of our loops through these arrays have a 'step' of 2. This optimises a
' search-and-replace for ANSI chars in an array of 2-byte unicodes. Note that
' it's only used to remove known ANSI 'Latin' characters with a 'low' byte of
' zero: any other use of the two-byte 'step' will fail on non-Latin unicodes.
' ** THIS CODE IS IN THE PUBLIC DOMAIN **
' Nigel Heffernan Excellerando.Blogspot.com
Const FETCH_ROWS As Long = 4096
Dim COMMA As String * 1
Dim BLANK As String * 4
Dim EOROW As String * 2
COMMA = ChrW$(44)
BLANK = ChrW$(13) & ChrW$(10) & ChrW$(13) & ChrW$(10)
EOROW = ChrW$(13) & ChrW$(10)
Dim FetchArray As Variant
Dim i As Long ' i for rows in the output file, records in the recordset
Dim j As Long ' j for columns in the output file, fields in the recordset
Dim k As Long ' k for all other loops: bytes in individual data items
Dim i_Offset As Long
Dim i_LBound As Long
Dim i_UBound As Long
Dim j_LBound As Long
Dim j_UBound As Long
Dim k_lBound As Long
Dim k_uBound As Long
Dim hndFile As Long
Dim varField As Variant
Dim iRowCount As Long
Dim arrBytes() As Byte
Dim arrTemp1() As String
Dim arrTemp2() As String
Dim arrTemp3(0 To 2) As String
Dim boolNumeric As Boolean
Dim strHeader As String
Dim arrHeader() As Byte
Dim strFile As String
Dim strPath As String
Dim strExtn As String
strFile = FileName(OutputFile)
strPath = FilePath(OutputFile)
strExtn = FileExtension(strFile)
If rst Is Nothing Then Exit Function
If rst.State <> 1 Then Exit Function
If strExtn = "" Then
strExtn = ".csv"
End If
With FSO
If strFile = "" Then
strFile = .GetTempName
strFile = Left(strFile, Len(strFile) - Len(".tmp"))
strFile = strFile & strExtn
End If
If strPath = "" Then
strPath = TempSQLFolder
End If
If Right(strPath, 1) <> "\" Then
strPath = strPath & "\"
End If
strExtn = FileExtension(strFile)
If strExtn = "" Then
strExtn = ".csv"
strFile = strFile & strExtn
End If
OutputFile = strPath & strFile
End With
If FileName(OutputFile) <> "" Then
If Len(VBA.FileSystem.Dir(OutputFile, vbNormal)) <> 0 Then
Err.Clear
VBA.FileSystem.Kill OutputFile ' do it now, and reduce wait for deletion
If Err.Number = 70 Then ' permission denied: change the output file name
OutputFile = FileStripExtension(OutputFile) & "_" & FileStripExtension(FSO.GetTempName) & FileExtension(OutputFile)
End If
End If
End If
' ChrW$() gives a 2-byte 'Wide' char. This coerces all subsequent operations to UTF16
arrTemp3(0) = ChrW$(34) ' Encapsulating quote
arrTemp3(1) = vbNullString ' The field value will go here
arrTemp3(2) = ChrW$(34) ' Encapsulating quote
If rst.EOF And rst.BOF Then
FetchArray = Empty
ElseIf rst.EOF Then
rst.MoveFirst
End If
' An empty recordset must still write a header row of field names: we put this in the
' output buffer and write it to the file before we start looping through the records.
ReDim FetchArray(0 To rst.Fields.Count, 0 To 0)
i_LBound = 0
i_UBound = 0
If IsMissing(FieldList) Then
For j = LBound(FetchArray, 1) To UBound(FetchArray, 1) - 1 Step 1
FetchArray(j, i_UBound) = rst.Fields(j).Name
Next j
Else
j = 0
For Each varField In FieldList
j_UBound = j_UBound + 1
Next varField
ReDim arrTemp2(j_LBound To j_UBound)
For Each varField In FieldList
FetchArray(j, i_UBound) = CStr(varField)
j = j + 1
Next varField
End If
ReDim arrTemp1(i_LBound To i_UBound) ' arrTemp1 is the rowset we write to file
ReDim arrTemp2(j_LBound To j_UBound) ' arrTemp2 represents a single record
Do Until IsEmpty(FetchArray)
i_LBound = LBound(FetchArray, 2)
i_UBound = UBound(FetchArray, 2)
j_LBound = LBound(FetchArray, 1)
j_UBound = UBound(FetchArray, 1)
If UBound(arrTemp1) <> i_UBound + 1 Then
ReDim arrTemp1(i_LBound To i_UBound + 1)
arrTemp1(i_UBound + 1) = vbNullString ' The 'Join' operation will insert a trailing row
End If ' delimiter here (Not required by the last chunk)
If UBound(arrTemp2) <> j_UBound Then
ReDim arrTemp2(j_LBound To j_UBound)
End If
' Data body. This is heavily optimised to avoid VBA String functions with allocations
For i = i_LBound To i_UBound Step 1
' If this is confusing... Were you expecting FetchArray(i,j)? i for row, j for column?
' FetchArray comes from RecordSet.GetRows(), which returns a TRANSPOSED array: i and j
' are still the field and record ordinals, row(i) and column(j) in the output file.
For j = j_LBound To j_UBound
If IsNull(FetchArray(j, i)) Then
arrTemp2(j) = ""
Else
arrTemp2(j) = FetchArray(j, i) ' confused? see he note above
End If
If CleanupText Or (i_UBound = 0) Then ' (i_UBound=0): always clean up field names
arrBytes = arrTemp2(j) ' Integer arithmetic is faster than string-handling for
' this: all VBA string operations require an allocation
For k = LBound(arrBytes) To UBound(arrBytes) Step 2
Select Case arrBytes(k)
Case 10, 13, 9, 160
If arrBytes(k + 1) = 0 Then
arrBytes(k) = 32 ' replaces CR, LF, Tab, and non-breaking
End If ' spaces with the standard ANSI space
Case 44
If Not CoerceText Then
If arrBytes(k + 1) = 0 Then
arrBytes(k) = 32 ' replace comma with the ANSI space
End If
End If
Case 34
If arrBytes(k + 1) = 0 Then
arrBytes(k) = 39 ' replaces double-quote with single quote
End If
End Select
Next k
arrTemp2(j) = arrTemp2(j)
End If ' cleanup
If CoerceText Then ' encapsulate all fields in quotes, numeric or not
arrTemp3(1) = arrTemp2(j)
arrTemp2(j) = Join$(arrTemp3, vbNullString)
ElseIf (i = 0) And (i = i_UBound) Then ' always encapsulate field names
arrTemp3(1) = arrTemp2(j)
arrTemp2(j) = Join$(arrTemp3, vbNullString)
Else ' selective encapsulation, leaving numeric fields unencapsulated:
' we *could* do this by reading the ADODB field types: but that's
' slower, and you may be 'caught out' by provider-specific types.
arrBytes = arrTemp2(j)
boolNumeric = True
For k = LBound(arrBytes) To UBound(arrBytes) Step 2
If arrBytes(k) < 43 Or arrBytes(k) > 57 Then
If arrBytes(k) <> 69 Then
boolNumeric = False
Exit For
Else
If k > UBound(arrBytes) - 5 Then
boolNumeric = False
Exit For
ElseIf arrBytes(k + 2) = 45 Then
' detect "1.234E-05"
ElseIf arrBytes(k + 2) = 43 Then
' detect "1.234E+05"
Else
boolNumeric = False
Exit For
End If
End If
End If
Next k
If boolNumeric Then
For k = 1 + LBound(arrBytes) To UBound(arrBytes) Step 2
If arrBytes(k) <> 0 Then
boolNumeric = False
Exit For
End If
Next k
End If
arrBytes = vbNullString
If Not boolNumeric Then ' text field, encapsulate it
arrTemp3(1) = arrTemp2(j)
arrTemp2(j) = Join(arrTemp3, vbNullString)
End If
End If ' CoerceText
Next j
arrTemp1(i) = Join(arrTemp2, COMMA)
Next i
iRowCount = iRowCount + i - 2
' **** WHY WE 'PUT' A BYTE ARRAY INSTEAD OF A VBA STRING VARIABLE **** ****
'
' Put #hndFile, , StrConv(Join(arrTemp1, EOROW), vbUnicode)
' Put #hndFile, , Join(arrTemp1, EOROW)
'
' If you pass unicode, Wide or UTF-16 string variables to PUT, it prepends a
' Unicode Byte Order Mark to the data which, when written to your file, will
' render the field names illegible to Microsoft's JET ODBC and ACE-OLEDB SQL
' drivers (which can actually read unicode field names, if the helpful label
' isn't in the way). The primeval 'PUT' statement writes a Byte array as-is.
'
' **** **** **** **** **** **** **** **** **** **** **** **** **** **** ****
arrBytes = Join$(arrTemp1, vbCrLf)
If hndFile = 0 Then
i_Offset = 1
If Len(Dir(OutputFile)) > 0 Then
VBA.FileSystem.Kill OutputFile
End If
WaitForFileDeletion OutputFile
hndFile = FreeFile
Open OutputFile For Binary Access Write As #hndFile
End If
Put #hndFile, i_Offset, arrBytes
i_Offset = i_Offset + 1 + UBound(arrBytes)
Erase arrBytes
If rst.EOF Then
Erase FetchArray
FetchArray = Empty
Else
If IsMissing(FieldList) Then
FetchArray = rst.GetRows(FETCH_ROWS)
Else
FetchArray = rst.GetRows(FETCH_ROWS, , FieldList)
End If
End If
Loop ' until isempty(FetchArray)
If iRowCount < 1 Then '
iRowCount = 0 ' Row Count excludes the header
End If
RecordsetToCSV = iRowCount
ExitSub:
On Error Resume Next
If hndFile <> 0 Then
Close #hndFile
End If
Erase arrBytes
Erase arrTemp1
Erase arrTemp2
Exit Function
ErrSub:
Resume ExitSub
End Function
Public Function FilePath(Path As String) As String
' Strip the filename from a path, leaving only the path to the folder
' The last char of this path will be the backslash
' This does not check for the existence or accessibility of the file:
' all we're doing here is string-handling
Dim strPath As String
Dim arrPath() As String
Const BACKSLASH As String * 1 = "\"
strPath = Trim(Path)
If strPath = "" Then Exit Function
If Right$(strPath, 1) = BACKSLASH Then Exit Function
arrPath = Split(strPath, BACKSLASH)
If UBound(arrPath) = 0 Then ' does not contain "\"
FilePath = ""
Else
arrPath(UBound(arrPath)) = vbNullString
FilePath = Join$(arrPath, BACKSLASH)
End If
Erase arrPath
End Function
Public Function FileName(Path As String) As String
' Strip the folder and path from a file's path string, leaving only the file name
' This does not check for the existence or accessibility of the file:
' all we're doing here is string-handling
Dim strPath As String
Dim arrPath() As String
Const BACKSLASH As String * 1 = "\"
strPath = Trim(Path)
If strPath = "" Then Exit Function
If Right$(strPath, 1) = BACKSLASH Then Exit Function
arrPath = Split(strPath, BACKSLASH)
If UBound(arrPath) = 0 Then ' does not contain "\"
FileName = Path
Else
FileName = arrPath(UBound(arrPath))
End If
Erase arrPath
End Function
Public Function FileExtension(Path As String) As String
' Return the extension of the file
' This is just string-handling: no file or path validation is attempted
' The file extension is deemed to be whatever comes after the final '.'
' The extension is returned with the dot, eg: ".txt" not "txt"
' If no extension is detected, FileExtension returns an empty string
Dim strFile As String
Dim arrFile() As String
Const DOT_EXT As String * 1 = "."
strFile = FileName(Path)
strFile = Trim(strFile)
If strFile = "" Then Exit Function
If Right$(strFile, 1) = DOT_EXT Then Exit Function
arrFile = Split(strFile, DOT_EXT)
If UBound(arrFile) = 0 Then ' does not contain "\"
FileExtension = vbNullString
Else
FileExtension = arrFile(UBound(arrFile))
FileExtension = Trim(FileExtension)
If Len(FileExtension) > 0 Then
FileExtension = DOT_EXT & FileExtension
End If
End If
Erase arrFile
End Function
Public Function FileStripExtension(Path As String) As String
' Return the filename, with the extension removed
' This is just string-handling: no file validation is attempted
' The file extension is deemed to be whatever comes after the final '.'
' Both the dot and the extension are removed
Dim strFile As String
Dim arrFile() As String
Const DOT_EXT As String * 1 = "."
strFile = FileName(Path)
If strFile = "" Then Exit Function
If Right$(strFile, 1) = DOT_EXT Then Exit Function
strFile = Trim(strFile)
arrFile = Split(strFile, DOT_EXT)
If UBound(arrFile) = 0 Then ' does not contain "\"
FileStripExtension = vbNullString
Else
ReDim Preserve arrFile(LBound(arrFile) To UBound(arrFile) - 1)
FileStripExtension = Join$(arrFile, DOT_EXT)
End If
Erase arrFile
End Function
You'll also need the three path-and-file-name utility functions, if you don't have your own versions already:
- FileName()
- FilePath()
- FileStripExtension()
There's room for improvement in the string-encapsulation logic: the correct approach is to look up the recordset's field types and apply quote marks accordingly, and it may well turn out to be faster than my clunky byte-counting.
However, my approach is all about the file consumers and what they expect to see; and that doesn't always line up with what they ought to accept.
If you succeed in coding a faster and more robust version do, please, let me know: if I'm asked to, I may well code up encapsulation by field type myself.