CSV.foreach Not Reading First Column in CSV File
Asked Answered
M

2

13

Learning Ruby for the first time to automate cleaning up some CSV files.

I've managed to piece together the script below from other SO questions but for some reason the script does not read the first column of the original CSV file. If I add a dummy first column everything works perfectly. What am I missing?

require 'csv'

COLUMNS = ['SFID','Date','Num','Transaction Type']

CSV.open("invoicesfixed.csv", "wb",
  :write_headers=> true,
  :headers => ["Account__c","Invoice_Date__c","Invoice_Number__c","Transaction_Type__c"]) do |csv|

  CSV.foreach('invoices.csv', :headers=>true, :converters => :all) do |row|


    #convert date format to be compatible with Salesforce
    row['Date'] = Date.strptime(row['Date'], '%m/%d/%y').strftime('%Y-%m-%d')
    csv << COLUMNS.map { |col| row[col] }

  end
end

This input file:

Transaction Type,Date,Num,SFID
Invoice,7/1/19,151466,SFID1
Invoice,7/1/19,151466,SFID2
Invoice,7/1/19,151466,SFID3
Invoice,7/1/19,151466,SFID4
Invoice,7/1/19,151466,SFID5
Invoice,7/1/19,151466,SFID6
Invoice,7/1/19,151153,SFID7
Sales Receipt,7/1/19,149487,SFID8
Sales Receipt,7/1/19,149487,SFID9
Sales Receipt,7/1/19,149758,SFID10
Sales Receipt,7/1/19,149758,SFID11

Yields this output:

Account__c,Invoice_Date__c,Invoice_Number__c,Transaction_Type__c
SFID1,2019-07-01,151466,
SFID2,2019-07-01,151466,
SFID3,2019-07-01,151466,
SFID4,2019-07-01,151466,
SFID5,2019-07-01,151466,
SFID6,2019-07-01,151466,
SFID7,2019-07-01,151153,
SFID8,2019-07-01,149487,
SFID9,2019-07-01,149487,
SFID10,2019-07-01,149758,
SFID11,2019-07-01,149758,

However, this input:

Dummy,Transaction Type,Date,Num,SFID
,Invoice,7/1/19,151466,SFID1
,Invoice,7/1/19,151466,SFID2
,Invoice,7/1/19,151466,SFID3
,Invoice,7/1/19,151466,SFID4
,Invoice,7/1/19,151466,SFID5
,Invoice,7/1/19,151466,SFID6
,Invoice,7/1/19,151153,SFID7
,Sales Receipt,7/1/19,149487,SFID8
,Sales Receipt,7/1/19,149487,SFID9
,Sales Receipt,7/1/19,149758,SFID10
,Sales Receipt,7/1/19,149758,SFID11

Yields the correct output of:

Account__c,Invoice_Date__c,Invoice_Number__c,Transaction_Type__c
SFID1,2019-07-01,151466,Invoice
SFID2,2019-07-01,151466,Invoice
SFID3,2019-07-01,151466,Invoice
SFID4,2019-07-01,151466,Invoice
SFID5,2019-07-01,151466,Invoice
SFID6,2019-07-01,151466,Invoice
SFID7,2019-07-01,151153,Invoice
SFID8,2019-07-01,149487,Sales Receipt
SFID9,2019-07-01,149487,Sales Receipt
SFID10,2019-07-01,149758,Sales Receipt
SFID11,2019-07-01,149758,Sales Receipt

Any ideas why this might be happening?

Mudskipper answered 8/8, 2019 at 18:26 Comment(5)
Worked on my machine.Franglais
@SergioTulentsev well dangit, ha. Thanks for checking!Mudskipper
I ran this locally, too, and like @SergioTulentsev, the first input file generated the expected output. "Invoice" and "Sales Receipt" were not truncated off the rowsCarnage
@MasonStewart Thank you for verifying as well! I'll try to figure this out on my end.Mudskipper
did you find the solution? I am running into same issue with my code. But your example works on my machine :/Vaughan
V
30

I had a similar problem, though running your example worked. I realized that problem (at least for me) was that I was creating CSV file using "Save As UTF-8 CSV" from Excel.

This adds BOM to the beginning of the file - before the first column header name and consequently row['firstColumnName'] was returning nil.

Saving file as CSV fixed the issue for me.

Vaughan answered 18/8, 2019 at 0:7 Comment(4)
Thanks @Milan, that sounds like it will be the correct answer. I'll have some new invoices to process shortly and will test this theory.Mudskipper
OMG, I love you :) This was driving me up the wall. I thought I was going insane!Certified
This absolutely solved my issue, many thanks! Good to know about BOM too.Fresno
I saved as UTF-8 with Excel for Mac it was doing this. I had to save the same file as a CSV with Numbers in order to fix it.Terrill
T
0
CSV.foreach(file.path, headers: true, col_sep: ';', header_converters: ->(h) { h.gsub(/\W/, '') }) do |row| 
    changes << row.to_h.slice('school_key_number', *UPDATABLE_FIELDS) 
end

Remove trash symbols from headers using REGEX and header_converters

Train answered 18/7 at 11:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.