Removing whitespaces in a CSV file

Asked 21/1, 2013 at 15:31 Answered 7/4, 2013 at 8:44

I have a string with extra whitespace:

First,Last,Email  ,Mobile Phone ,Company,Title  ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type

I want to parse this line and remove the extra whitespace.

My code looks like:

namespace :db do
  task :populate_contacts_csv => :environment do
    require 'csv'

    csv_text = File.read('file_upload_example.csv')
    csv = CSV.parse(csv_text, :headers => true)
    csv.each do |row|
      puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
    end
  end
end

Semifinalist answered 21/1, 2013 at 15:31 Comment(0)

You can strip your hash first:

csv.each do |unstripped_row|
  row = {}
  # Empty fields come back as nil, so guard before calling strip on a value.
  unstripped_row.each { |k, v| row[k.strip] = v ? v.strip : nil }
  puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end

Edited to strip hash keys too

Questionary answered 21/1, 2013 at 16:6 Comment(1)

This won't work. The header for "Email" isn't "Email" (it has trailing whitespace), so the code won't find a value. – Cobaltic 21/1, 2013 at 17:26

@prices = CSV.parse(IO.read('prices.csv'), :headers => true,
  :header_converters => lambda { |f| f.strip },
  :converters => lambda { |f| f ? f.strip : nil })

The nil test is in the field converter but not the header converter, on the assumption that headers are never nil while data fields might be, and nil doesn't have a strip method. I'm really surprised that, AFAIK, :strip is not a predefined converter!
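
For anyone who wants to try this without a prices.csv on disk, here is a minimal sketch of the same idea run against an inline string. The sample data and the strip_field name are made up for illustration; the empty Email field is there to show why the field converter keeps the nil check.

```ruby
require 'csv'

# Hypothetical sample data: padded headers and values, plus an empty Email field.
data = "First,Last,Email  \n bob , smith ,\n"

# An empty unquoted field is read as nil, so guard before stripping.
strip_field = lambda { |f| f ? f.strip : nil }

rows = CSV.parse(data,
                 :headers => true,
                 :header_converters => lambda { |h| h.strip },
                 :converters => strip_field)

row = rows[0]
p row['First']  # prints "bob"
p row['Email']  # prints nil
```

Because the header converter runs before the rows are built, row['Email'] finds the column even though the raw header was "Email  ".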

Jocelyn answered 7/4, 2013 at 8:44 Comment(3)

Instead of using syntax lambda {|f| f.strip}, you can use the -> syntax like: ->(f) {f.strip} – Pestilence 24/11, 2014 at 19:26

This one was the solution for my problem, also works on CSV.new, not just CSV.parse. – Soggy 28/8, 2018 at 13:10

also from ruby 2.3, ->(f) { f&.strip } – Windproof 26/2, 2020 at 11:48

CSV supports "converters" for the headers and fields, which let you get inside the data before it's passed to your each loop.

Writing a sample CSV file:

csv = "First,Last,Email  ,Mobile Phone ,Company,Title  ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
first,last,email  ,mobile phone ,company,title  ,street,city,state,zip,country, birthday,gender ,contact type
"
File.write('file_upload_example.csv', csv)

Here's how I'd do it:

require 'csv'
csv = CSV.open('file_upload_example.csv', :headers => true)
[:convert, :header_convert].each { |c| csv.send(c) { |f| f.strip } }

csv.each do |row|
  puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end

Which outputs:

First Name: first
Last Name: last
Email: email

The converters simply strip leading and trailing whitespace from each header and each field as they're read from the file.

Also, as a programming design choice, don't read your file into memory using:

csv_text = File.read('file_upload_example.csv')

Then parse it:

csv = CSV.parse(csv_text, :headers => true)

Then loop over it:

csv.each do |row|

Ruby's IO system supports "enumerating" over a file, line by line. Once my code calls CSV.open, the file is readable, and each reads it one row at a time. The entire file doesn't need to be in memory at once. Loading it all isn't scalable (though on newer machines it's becoming a lot more reasonable), and if you test, you'll find that reading a file using each is extremely fast, probably as fast as reading it, parsing it, and then iterating over the parsed result.
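
As a rough sketch of the row-at-a-time approach, CSV.foreach opens the file, yields each row, and closes the file when the block finishes. The temp file, its contents, and the names array are assumptions added only to keep the example self-contained; the nil guard in the field converter is borrowed from the other answer.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file, written to a temp path so the sketch runs anywhere.
file = Tempfile.new(['contacts', '.csv'])
file.write("First,Last,Email  \nfirst,last,email  \n")
file.close

names = []
# CSV.foreach yields one parsed row at a time instead of loading the whole file.
CSV.foreach(file.path,
            :headers => true,
            :header_converters => lambda { |h| h.strip },
            :converters => lambda { |f| f ? f.strip : nil }) do |row|
  names << [row['First'], row['Last'], row['Email']]
end

file.unlink
p names  # prints [["first", "last", "email"]]
```

Collecting into names is only for demonstration; in a real task you would do the work inside the block so no more than one row is held at a time.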

Cobaltic answered 21/1, 2013 at 17:14 Comment(0)
