cfSpreadsheet 2-digit years
Asked Answered
G

1

6

In ColdFusion 11, I'm using cfSpreadsheet to convert a .xls file into a query object. Here is a screenshot of my demo spreadsheet:

screenshot of spreadsheet

I use this code to see the query object as soon as it is created:

<cfspreadsheet action="read" src="demo_spreadsheet.xls" 
       excludeHeaderRow="true" 
       headerrow="1" query="demo_query"/>
<cfdump var="#demo_query#">

... and I get these results:

png showing dump of query object

Notice that all of my 4-digit years from the spreadsheet are now 2-digit years? When I go to output the data in the query object using this code:

<ul>
<cfoutput query="demo_query">
    <li>#name# - #dateformat(start_date, 'medium')#</li>
</cfoutput>
</ul>

... I get the following output (ok, I'm new here, so I can't post more than two screen shots so you'll have to trust me on this copy/paste):

  • Alpha - Jan 1, 2007
  • Bravo - Feb 2, 1972
  • Charlie - Mar 3, 2017
  • Delta - Apr 4, 1984
  • Echo - Dec 31, 2029
  • Foxtrot - Jan 1, 1930
  • Golf - Jan 1, 1930

The year 1907 is now 2007, 1917 is now 2017, 1929 is now 2029, and 2030 is now 1930. It appears that the year of any date before Jan 1, 1930 is read as 20xx, and after Dec 31, 2029 is read as 19xx.

Am I missing something? I thought we figured this kind of thing out with Y2K. Is there a simple setting somewhere that I have incorrect? I've Googled the heck out of this issue and I can't find anything about it.

Any advice would be most welcome.

Getaway answered 19/1, 2016 at 16:30 Comment(3)
On version 9, I got different results. For strings in m/d/yyyy format, nothing was changed. Strings in yyyy-mm-dd format were changed as per your question. It might have to do with the regional settings on the machine running ColdFusion, and/or the machine that produced the spreadsheet.Calabro
In the source spreadsheet, are the dates actually entered with four digits or are the cells just formatting the data?Hertha
Miguel-F - Yes, I typed all four digits of each year. The default regional settings appear to be the culprit.Getaway
K
5

Most likely your spreadsheet cells are using the built in regional format *m/d/yy, meaning the displayed value (or in this case "read") may vary depending on the environment or client used.

Date and time formats that begin with an asterisk (*) respond to changes in regional date and time settings that are specified in Control Panel. Formats without an asterisk are not affected by Control Panel settings.

That seems to be what is happening with cfspreadsheet. Not sure why Excel displays a four digit year, rather than two digit, for format *m/d/yy. However, CF/POI are returning the correct result according to the Excel specifications. Notice if you switch the cell format to the non-regional, four digit year ie m/d/yyyy the output is what you expected:

Comparison of regional and non-regional formats Non-regional Date Format

Update: As to why your CF code displays different years than you expected, it is due to how ambiguous date strings are handled by CF. It is important to note, the query returned by CFSpreadsheet contains strings, not date objects. When you pass those strings into DateFormat, CF must first interpret the strings and convert them into date objects before it can apply the date mask. According to CF's rules, two digit years are interpreted as follows:

A string containing a date/time value formatted according to U.S. locale conventions. Can represent a date/time in the range 100 AD–9999 AD. Years 0-29 are interpreted as 2000-2029; years 30-99 are interpreted as 1930-1999.

Honestly, CFSpreadsheet is designed to provide a simple way to read and write spreadsheets without a lot of bells and whistles. AFAIK, it does not support changing the way the cell values are interpreted. If you want to force a four digit year, you must change the spreadsheet to use a non-regional date format, either manually or programmatically (ie read the spreadsheet with CF, and apply a new cell format). That is probably the simplest option.

If you want more flexibility in terms of code, you can also use spreadsheet functions instead of cfspreadsheet. Though in this specific case, I think they too lack the necessary features. So you might look into using the underlying POI library and a bit of java code. This thread demonstrates how to obtain all kinds of details about the spreadsheet cells and values. It could easily be modified to build your own query or an array of structures containing the values, formats, etcetera:

Array of structures containing cell data

Code:

<cfscript>
// get the sheet you want to read
cfSheet = SpreadSheetRead("c:/temp/demo_spreadsheet.xls"); 
workbook = cfSheet.getWorkBook();
sheetIndex = workbook.getActiveSheetIndex();
sheet = workbook.getSheetAt( sheetIndex );

// utility used to distinguish between dates and numbers
dateUtil = createObject("java", "org.apache.poi.ss.usermodel.DateUtil");

// process the rows and columns
rows = sheet.rowIterator();
while (rows.hasNext()) {
    currentRow = rows.next();
    data = {}; 

    cells = currentRow.cellIterator();
    while (cells.hasNext()) { 
        currentCell = cells.next();

        col = {};
        col.value  = "";
        col.type   = "";
        col.column = currentCell.getColumnIndex()+ 1;
        col.row    = currentCell.getRowIndex()+ 1;
        col.format = currentCell.getCellStyle().getDataFormatString();



        if (currentCell.getCellType() EQ currentCell.CELL_TYPE_STRING) {
               col.value = currentCell.getRichStringCellValue().getString();
            col.type = "string";
        }
        else if (currentCell.getCellType() EQ currentCell.CELL_TYPE_NUMERIC) {
            if (DateUtil.isCellDateFormatted(currentCell)) {
                 col.value = currentCell.getDateCellValue();
                 col.type = "date";
            } 
            else {
                 col.value = currentCell.getNumericCellValue();
                 col.type = "number";
            }
        }
        else if (currentCell.getCellType() EQ currentCell.CELL_TYPE_BOOLEAN) {
            col.value = currentCell.getBooleanCellValue();
            col.type = "boolean";
        }
        // ... handle other types CELL_TYPE_BLANK, CELL_TYPE_ERROR, CELL_TYPE_FORMULA

        data["COL"& col.column] = col;
    }

    // this row is finished. display all values
    WriteDump(data);
}
</cfscript>
Keratoid answered 19/1, 2016 at 18:54 Comment(8)
Thanks for the info. I had not set any cell formatting on my spreadsheet before uploading, and I typed all four digits in when I entered the date. It appears that the 2-digit year is the default format in Excel. When I changed the cell format in the spreadsheet as you described, I did get the expected result in the query object. It is so odd that Excel displays all four digits to the user while only passing two. Is that an Excel bug? Since I can not reasonably expect my users to manually change the format of their dates columns, I'm going to have to develop the work-around you suggest.Getaway
Yep, that is the default, probably for backward compatibility. A bug you say? I am sure MS would call it a post 2K "feature" ;-)Keratoid
Excel displays all four digits to the user while only passing two Hmm... actually if you mean why is your cfoutput different, then the reason is different than I explained above. cfspreadsheet returns strings, not date objects. When you pass those strings into date_format() CF must first convert them into date objects, according to its own rules. So CF is what is causing 1/1/30 to be interpreted as 1930 instead of 2030, not Excel. Updating answer.Keratoid
I've got a scripted solution like the one Leigh described in place now. I do find it odd that I can save the .xls file, close it, and open it later and still see my 4-digit years. Those values must be in the spreadsheet someplace, right? It seems to me like the cfspreadsheet tag ought to be able to get them out. I mean if the script can access all four digits, then they ought to be able to design the tag to get them as well.Getaway
Oh, and I submitted this to Adobe Bugbase. bugbase.adobe.com/index.cfm?event=bug&id=4108544Getaway
After rereading your last comment, I think I misunderstood you. I do not think this is a bug, just the usual issues when working with ambiguous date strings. See my updated answer.Keratoid
Eh. It might not be a bug, but it is certainly something that should be changed/fixed. If there is a four digit year that can be extracted, then it should be.Getaway
Well, honestly I doubt they will change it. CFSpreadsheet/POI simply report the displayed value in the spreadsheet according to the Excel specifications. Since those specs say it is a two digit year, the POI result is correct IMO, and MS Excel is wrong. That said, I agree using a two digit year as the default is .. silly. As if date strings are not ambiguous enough already.Keratoid

© 2022 - 2024 — McMap. All rights reserved.