How do I convert a date into a time when parsing an .xls doc using Rails?
A

4

17

I'm using Rails 5. I want to parse an .xls doc (not to be confused with an .xlsx doc) using the code below:

  book = Roo::Spreadsheet.open(file_location)
  sheet = book.sheet(0)
  text = sheet.to_csv
  csv = CSV.parse(text)

  arr_of_arrs = csv
  text_content = ""
  arr_of_arrs.each do |arr|
    arr.map!{|v| v && v.to_f < 1 && v.to_f > 0 ? TimeFormattingHelper.time_as_str(v.to_f * 24 * 3600 * 1000) : v}
    text_content = "#{text_content}\n#{arr.join("\t")}"
  end

Here is the method I reference above

  def time_as_str(time_in_ms)
    regex = /^(0*:?)*0*/
    Time.at(time_in_ms.to_f/1000).utc.strftime("%H:%M:%S.%1N").sub!(regex, '')
  end

One area where I'm having trouble is that a cell that appears in my .xls doc as

24:08:00

is processed as

1904-01-02T00:08:00+00:00

with the code above. How do I parse the value I see on the screen? That is, how do I convert the date value into a time value?

As an example from another Excel doc, the cell that appears as

24:02:00

is getting parsed by my code above as

1899-12-31T00:02:00+00:00
Archaeozoic answered 3/7, 2017 at 20:40 Comment(8)
Can you also put a sample of few rows from your Excel sheet that you are trying to parse?Edvard
Sure, you mean how the cell appears visually or how the "arr" variable gets set?Archaeozoic
Yeah, what is the value in the xls cell and how exactly do you want to see it?Edvard
When I open the document in Microsoft Excel or OpenOffice, the cell visually appears as "24:08:00" (this also seems to be the value when I put the cursor over the cell and its value appears in that editable box at the top). That is how I would like to process it, but instead when I try and process it with my code above, it is getting processed as "1904-01-02T00:08:00+00:00". Let me know if that answers your question.Archaeozoic
"24:08:00" is a date? Which year is it?Orelee
No "24:08:00" is a duration, 24 hours and 8 minutes. For some reason, though, when I parse the Excel file using the code I posted, the result is "1904-01-02T00:08:00+00:00".Archaeozoic
@Orelee is 24:08 intended as 00:08 or can you write values like 35:12:00 as in "35 hours and 12 minutes" (duration)?Bookkeeping
How are you getting Roo to parse .xls? It does not handle this format. Forcing it to use the xlsx parser on xls data would be a mistake. Could you please post a sample data file somewhere? I think the proposals to "unparse" the erroneous dates to get durations are the wrong approach. But to figure out what's going on, we need enough info to reproduce what's actually happening.Cudbear
S
3

It seems your .xls uses the 1904 date system, and Roo cannot distinguish a duration from a DateTime, so you'll need to subtract the base date 1904-01-01 from the cell value. Weirdly enough, for the 1900 date system you need to subtract the base date 1899-12-30, due to a bug in Lotus 1-2-3 that Microsoft replicated in Excel for compatibility.
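
To see why the base date matters, here is a minimal sketch in plain Ruby with no Roo involved. It assumes the duration 24:08:00 is stored as a serial number of 1 + 8/1440 days; the helper name serial_to_datetime and the two constants are hypothetical and only serve the illustration. The same serial lands on a different calendar date under each base, which is consistent with the two kinds of output reported in the question.

```ruby
require 'date'

# Hypothetical helper: turn a spreadsheet serial number (in days) into a
# DateTime by adding it to the base date of the workbook's date system.
def serial_to_datetime(serial_days, base_date)
  base_date + serial_days
end

BASE_1904 = DateTime.new(1904, 1, 1)   # base for the 1904 date system
BASE_1900 = DateTime.new(1899, 12, 30) # effective base for the 1900 date system

# 24 hours and 8 minutes expressed in days, kept exact with a Rational
serial = Rational(24 * 60 + 8, 24 * 60)

puts serial_to_datetime(serial, BASE_1904).iso8601 # prints 1904-01-02T00:08:00+00:00
puts serial_to_datetime(serial, BASE_1900).iso8601 # prints 1899-12-31T00:08:00+00:00
```

Subtracting the matching base date from the parsed value reverses this step and recovers the duration.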

Here is a method that converts the DateTime read from the spreadsheet into the duration according to the base date:

def duration_as_str(datetime, base_date)
  total_seconds = DateTime.parse(datetime).to_i - base_date.to_i
  hours = total_seconds / (60 * 60)
  minutes = (total_seconds / 60) % 60
  seconds = total_seconds % 60
  "%d:%02d:%02d" % [hours, minutes, seconds]
end

Let's test it:

irb(main):019:0> duration_as_str("1904-01-02T00:08:00+00:00", DateTime.new(1904, 1, 1))
=> "24:08:00"
irb(main):020:0> duration_as_str("1899-12-31T00:02:00+00:00", DateTime.new(1899, 12, 30))
=> "24:02:00"
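
Note that DateTime#to_i comes from ActiveSupport, so the method above assumes a Rails console. Outside Rails, the same subtraction might be sketched in plain Ruby as follows; the name duration_from_base is hypothetical, and the sketch assumes the input string is something DateTime.parse accepts.

```ruby
require 'date'

# Hypothetical plain-Ruby variant of duration_as_str.
def duration_from_base(datetime_str, base_date)
  # DateTime - DateTime yields a Rational number of days
  days = DateTime.parse(datetime_str) - base_date
  total_seconds = (days * 24 * 3600).round
  hours, remainder = total_seconds.divmod(3600)
  minutes, seconds = remainder.divmod(60)
  format('%d:%02d:%02d', hours, minutes, seconds)
end

puts duration_from_base('1904-01-02T00:08:00+00:00', DateTime.new(1904, 1, 1))   # prints 24:08:00
puts duration_from_base('1899-12-31T00:02:00+00:00', DateTime.new(1899, 12, 30)) # prints 24:02:00
```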

You can use book.workbook.date_base.year to determine the spreadsheet's date system, and then just add another map inside your each loop:

book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
csv = CSV.parse(text)

base_date = book.workbook.date_base.year == 1904 ? DateTime.new(1904, 1, 1) : DateTime.new(1899, 12, 30)
arr_of_arrs = csv
text_content = ""
arr_of_arrs.each do |arr|
  arr.map!{|v| v && v.to_f < 1 && v.to_f > 0 ? TimeFormattingHelper.time_as_str(v.to_f * 24 * 3600 * 1000) : v}
  arr.map!{|v| v =~ /^(1904|1899)-/ ? duration_as_str(v, base_date) : v}
  text_content = "#{text_content}\n#{arr.join("\t")}"
end
Speedboat answered 3/8, 2017 at 7:49 Comment(4)
Thanks. There's still a couple of dots I'm not connecting. Where in my code do I apply your function and how do I figure out when to apply it?Archaeozoic
@Archaeozoic See my updated answer. I also changed duration_as_str slightly, because after you convert it to CSV the cell value becomes a String and is not a DateTime anymore.Speedboat
Thanks. So in another Excel doc, the value "24:02:00" gets returned by my code as "1899-12-31T00:02:00+00:00". Is there a way to take what you have and make it more generic?Archaeozoic
@Archaeozoic I've generalized my solution to be compatible with both 1904 and 1900 date systems supported by Excel.Speedboat
E
1

You could use something like the below and write a custom parser for that string.

duration = 0

"24:08:01".split(":").each_with_index do |value, i|
  if i == 0
    duration += value.to_i.hours
  elsif i == 1
    duration += value.to_i.minutes
  else
    duration += value.to_i.seconds
  end
end

duration.value # => 86881 (duration in seconds)

This parser will assume a format of hours:minutes:seconds and return an instance of ActiveSupport::Duration. Then, duration.value will give you the number of seconds.
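
If ActiveSupport is not loaded, the same idea can be sketched in plain Ruby. The method name duration_in_seconds is hypothetical, and the sketch assumes the string always has exactly three colon-separated integer fields.

```ruby
# Hypothetical helper: parse "H:MM:SS" into a total number of seconds.
# Assumes exactly three colon-separated integer fields; hours may exceed 23.
def duration_in_seconds(str)
  # Integer(part, 10) forces base 10 so that "08" is read as 8
  hours, minutes, seconds = str.split(':').map { |part| Integer(part, 10) }
  hours * 3600 + minutes * 60 + seconds
end

puts duration_in_seconds('24:08:01') # prints 86881
```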

Erbe answered 2/8, 2017 at 22:6 Comment(3)
"24:08:01" is the value that appears in the XLS cell but I'm never able to get that value in my Ruby code. "CSV.parse(text)" is returning stuff like "1904-01-02T00:08:00+00:00" for values greater than 24 hours. I don't want that, I want to get back "24:08:01", so my question is how do I do that?Archaeozoic
Hmmm, can you try formatting the csv in excel to have plaintext values?Erbe
I'd love to, but the only way I know how is with the code I have listed above -- "book = Roo::Spreadsheet ... csv = CSV.parse(text)". If there's another way to get the XLS file into CSV format and keep the data exactly as I see it in Excel, that would solve my problem.Archaeozoic
O
1

You need to read the internal value of the cell instead of the formatted value. The formatted value is what gets written to the CSV when you use to_csv.

To read the internal value, you would have to use either the sheet object's excelx_value method or the row object's cell_value method.

These methods return the value as a float (in days). Here is an example using cell_value by iterating over rows, assuming no header and that the first column holds the value to be converted.

Using Roo 2.7.1 (similar methods exist in older versions)

# Format a number of seconds as HH:MM:SS; hours may exceed 23.
# Defined before the loop so it exists when the loop calls it.
def time_as_str(t)
  minutes, seconds = t.divmod(60)
  hours, minutes = minutes.divmod(60)
  "%02d:%02d:%02d" % [hours, minutes, seconds]
end

book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)

formatted_times = []
time_column_index = 0

sheet.each_row_streaming do |row|
  time_in_days = row[time_column_index].cell_value
  # Round to whole seconds so float error cannot drop a second
  formatted_times << time_as_str((time_in_days.to_f * 24 * 3600).round)
end

# eg: time_in_days = 1.0169444444444444
# formatted_time = "24:24:24"
Oriental answered 9/8, 2017 at 4:32 Comment(0)
A
0

First, I will try rephrasing what you want to accomplish.

  1. You want to “parse the value you see on the screen”, but I am not sure whether that is 24:08:00 or 1904-01-02T00:08:00+00:00. I assume it is the first.

  2. You want to convert the date value into a time value. I am not sure whether you actually want the output var to be a Time, a Date, a DateTime, or simply a String. I assume it is ok for you to have it simply as a String, but this is a minor issue.

With this, I assume that what you in general see as HH:MM:SS in Excel, you want to get as “HH:MM:SS” in Rails, regardless of HH being > 23. As an example, 24:08:00 in Excel would turn into “24:08:00” in Rails.

The two seemingly discordant cases you report most likely stem from the two .xls files having different date systems.

To get the desired result you have two options:

  1. Use to_csv, whose result is affected by the date system of the Excel file. In this case, you have to subtract the base_date, as done by Helder Pereira.

  2. Directly get the numeric value from Excel, which is not affected by the date system. In this case, code is simpler, since you only need one conversion (function days2str below).

Code is (modulo minor adjustments)

def days2str(days)
    # Work in whole seconds so float error cannot produce values like 07:60
    total_seconds = (days.to_f * 24 * 3600).round
    hours_int, remainder = total_seconds.divmod(3600)
    minutes_int, seconds_int = remainder.divmod(60)
    format("%d:%02d:%02d", hours_int, minutes_int, seconds_int)
end

def is_date(v)
    # Define the checking function
end

require 'spreadsheet'
text_content = ""
Spreadsheet.open('MyTestSheet.xls') do |book|
    book.worksheet('Sheet1').each do |row|
        break if row[0].nil?
        puts row.join(',')
        row.map!{|v| is_date(v) ? days2str(v) : v }
        text_content = "#{text_content}\n#{row.join("\t")}"
    end
end
Albinus answered 9/8, 2017 at 21:43 Comment(0)
