How to automatically import data from uploaded CSV or XLS file into Google Sheets
I have a legacy database system (not web accessible) on a server which generates CSV or XLS reports into a Google Drive folder. Currently, I am manually opening those files in the Drive web interface and converting them to Google Sheets.

I would rather this be automatic so that I can create jobs that append, transform, and graph the data in other sheets.

Is it possible to output a native .gsheet file? Or is there a way to convert CSV or XLS to .gsheet programmatically after saving it to Google Drive, either in Google Apps Script or via a Windows-based script/utility?

Risinger answered 10/11, 2014 at 22:55 Comment(2)
The google-spreadsheet-api can import data into an existing Google spreadsheet. I suspect the Drive API has a way to import as a new spreadsheet file; I think I saw some code for the import on SO.Leela
Curious as to how your legacy system generates the report files directly to Google Drive. Does it use the Drive API or some other mechanism? If the former, then you can change the code to auto-convert to Sheets on the fly rather than post-processing CSV files.Unmask

You can programmatically import data from a csv file in your Drive into an existing Google Sheet using Google Apps Script, replacing/appending data as needed.

Below is some sample code. It assumes that: a) you have a designated folder in your Drive where the CSV file is saved/uploaded; b) the CSV file is named "report.csv" and the data in it is comma-delimited; and c) the CSV data is imported into a designated spreadsheet. See the comments in the code for further details.

function importData() {
  var fSource = DriveApp.getFolderById(reports_folder_id); // reports_folder_id = id of folder where csv reports are saved
  var fi = fSource.getFilesByName('report.csv'); // latest report file
  var ss = SpreadsheetApp.openById(data_sheet_id); // data_sheet_id = id of spreadsheet that holds the data to be updated with new report data

  if ( fi.hasNext() ) { // proceed if "report.csv" file exists in the reports folder
    var file = fi.next();
    var csv = file.getBlob().getDataAsString();
    var csvData = CSVToArray(csv); // see below for CSVToArray function
    var newsheet = ss.insertSheet('NEWDATA'); // create a 'NEWDATA' sheet to store imported data
    // loop through csv data array and insert (append) as rows into 'NEWDATA' sheet
    for ( var i = 0, lenCsv = csvData.length; i < lenCsv; i++ ) {
      newsheet.getRange(i + 1, 1, 1, csvData[i].length).setValues([csvData[i]]);
    }
    /*
    ** report data is now in 'NEWDATA' sheet in the spreadsheet - process it as needed,
    ** then delete 'NEWDATA' sheet using ss.deleteSheet(newsheet)
    */
    // rename the report.csv file so it is not processed on the next scheduled run
    file.setName("report-" + (new Date().toString()) + ".csv");
  }
}


// http://www.bennadel.com/blog/1504-Ask-Ben-Parsing-CSV-Strings-With-Javascript-Exec-Regular-Expression-Command.htm
// This will parse a delimited string into an array of
// arrays. The default delimiter is the comma, but this
// can be overridden in the second argument.

function CSVToArray( strData, strDelimiter ) {
  // Check to see if the delimiter is defined. If not,
  // then default to COMMA.
  strDelimiter = (strDelimiter || ",");

  // Create a regular expression to parse the CSV values.
  var objPattern = new RegExp(
    (
      // Delimiters.
      "(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +

      // Quoted fields.
      "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +

      // Standard fields.
      "([^\"\\" + strDelimiter + "\\r\\n]*))"
    ),
    "gi"
  );

  // Create an array to hold our data. Give the array
  // a default empty first row.
  var arrData = [[]];

  // Create an array to hold our individual pattern
  // matching groups.
  var arrMatches = null;

  // Keep looping over the regular expression matches
  // until we can no longer find a match.
  while (arrMatches = objPattern.exec( strData )){

    // Get the delimiter that was found.
    var strMatchedDelimiter = arrMatches[ 1 ];

    // Check to see if the given delimiter has a length
    // (is not the start of string) and if it matches
    // the field delimiter. If it does not, then we know
    // that this delimiter is a row delimiter.
    if (
      strMatchedDelimiter.length &&
      (strMatchedDelimiter != strDelimiter)
    ){

      // Since we have reached a new row of data,
      // add an empty row to our data array.
      arrData.push( [] );

    }

    // Now that we have our delimiter out of the way,
    // let's check to see which kind of value we
    // captured (quoted or unquoted).
    if (arrMatches[ 2 ]){

      // We found a quoted value. When we capture
      // this value, unescape any double quotes.
      var strMatchedValue = arrMatches[ 2 ].replace(
        new RegExp( "\"\"", "g" ),
        "\""
      );

    } else {

      // We found a non-quoted value.
      var strMatchedValue = arrMatches[ 3 ];

    }

    // Now that we have our value string, let's add
    // it to the data array.
    arrData[ arrData.length - 1 ].push( strMatchedValue );
  }

  // Return the parsed data.
  return( arrData );
}

You can then create a time-driven trigger in your script project to run the importData() function on a regular basis (e.g. every night at 1 AM), so all you have to do is put a new report.csv file into the designated Drive folder and it will be processed automatically on the next scheduled run.

If you absolutely MUST work with Excel files instead of CSV, then you can use the code below. For it to work you must enable the Drive API under Advanced Google Services in your script and in the Developers Console (see How to Enable Advanced Services for details).

/**
 * Convert Excel file to Sheets
 * @param {Blob} excelFile The Excel file blob data; Required
 * @param {String} filename File name on uploading drive; Required
 * @param {Array} arrParents Array of folder ids to put converted file in; Optional, will default to Drive root folder
 * @return {Spreadsheet} Converted Google Spreadsheet instance
 **/
function convertExcel2Sheets(excelFile, filename, arrParents) {

  var parents = arrParents || []; // check if optional arrParents argument was provided, default to empty array if not
  if ( !Array.isArray(parents) ) parents = []; // make sure parents is an array, reset to empty array if not

  // Parameters for Drive API Simple Upload request (see https://developers.google.com/drive/web/manage-uploads#simple)
  var uploadParams = {
    method:'post',
    contentType: 'application/vnd.ms-excel', // works for both .xls and .xlsx files
    contentLength: excelFile.getBytes().length,
    headers: {'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()},
    payload: excelFile.getBytes()
  };

  // Upload file to Drive root folder and convert to Sheets
  var uploadResponse = UrlFetchApp.fetch('https://www.googleapis.com/upload/drive/v2/files/?uploadType=media&convert=true', uploadParams);

  // Parse upload&convert response data (need this to be able to get id of converted sheet)
  var fileDataResponse = JSON.parse(uploadResponse.getContentText());

  // Create payload (body) data for updating converted file's name and parent folder(s)
  var payloadData = {
    title: filename, 
    parents: []
  };
  if ( parents.length ) { // Add provided parent folder(s) id(s) to payloadData, if any
    for ( var i=0; i<parents.length; i++ ) {
      try {
        var folder = DriveApp.getFolderById(parents[i]); // check that this folder id exists in drive and user can write to it
        payloadData.parents.push({id: parents[i]});
      }
      catch(e){} // fail silently if no such folder id exists in Drive
    }
  }
  // Parameters for Drive API File Update request (see https://developers.google.com/drive/v2/reference/files/update)
  var updateParams = {
    method:'put',
    headers: {'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()},
    contentType: 'application/json',
    payload: JSON.stringify(payloadData)
  };

  // Update metadata (filename and parent folder(s)) of converted sheet
  UrlFetchApp.fetch('https://www.googleapis.com/drive/v2/files/'+fileDataResponse.id, updateParams);

  return SpreadsheetApp.openById(fileDataResponse.id);
}

/**
 * Sample use of convertExcel2Sheets() for testing
 **/
function testConvertExcel2Sheets() {
  var xlsId = "0B9**************OFE"; // ID of Excel file to convert
  var xlsFile = DriveApp.getFileById(xlsId); // File instance of Excel file
  var xlsBlob = xlsFile.getBlob(); // Blob source of Excel file for conversion
  var xlsFilename = xlsFile.getName(); // File name to give to converted file; defaults to same as source file
  var destFolders = []; // array of IDs of Drive folders to put converted file in; empty array = root folder
  var ss = convertExcel2Sheets(xlsBlob, xlsFilename, destFolders);
  Logger.log(ss.getId());
}

The above code is also available as a gist here.

Robena answered 11/11, 2014 at 5:32 Comment(11)
This is perfect, thank you. Since some of the reports contain commas and, sadly, the legacy program is unable to use another delimiter, is there any way to import Excel spreadsheets this way?Risinger
Actually, I have another follow-up question if you don't mind. For the purposes of small business reporting (not much data), where's the best place to keep the data? In a set of Google Sheets, or does it make sense to use their AppEngine databases?Risinger
If your legacy program outputs CSV properly, enclosing any values with commas in them in quotation marks, then it's fine; the script will work. If you must use Excel files, then the code here for converting an Excel file to a Google Spreadsheet will do the trick. In my tests it worked with both xls and xlsx files. To use that code you must enable the Drive API in Advanced Google Services in your project AND in the Developers Console. The developer key the code requires is the Server key you can create in the Dev Console.Robena
As for where to store data: spreadsheets are fine if the amount of data is small and the slow speed of read/write is not an issue. Using CacheService you can speed things up somewhat. For my apps that have more than a few hundred rows of data, I have switched to using Parse.com after Google deprecated ScriptDb. There's a ParseDb library for GAS (Library key: MxhsVzdWH6ZQMWWeAA9tObPxhMjh3Sh48). And using the GAS JDBC Service you can connect to any public-facing db.Robena
I have modified the convert Excel to Sheets sample code I linked to in previous comment to make it simpler (not use oAuth2, as it's unnecessary when you run the script as owner). You can find my updated convertExcel2Sheets function here. I have also added it to my answer.Robena
@Robena is there a way to avoid the manual operation of putting report.csv into the right folder ? I mean making the script load data from an api for exampleEagleeyed
@clemlaflemme It is definitely possible. For instance, you could use the UrlFetchApp class to get data from an online source or a web-accessible API. If your data is not web-accessible (for example, I used to work at a school that used the SIMS school management system, which only ran on Windows with no web access, and you could only get data out of it by running pre-defined reports from within the application or a separate report-generating executable), you can export locally into a Google Drive sync folder to automatically save it to a specific location in your Drive. Anything is possible!Robena
thanks, I tried that but I guess I don't know how to authenticate with UrlFetchApp. When browsing in Chrome, I have to log in first to access the fileEagleeyed
@clemlaflemme Yes, that could be tricky - it all depends on what kind of authentication process is used. It is usually session-cookie based nowadays, and that is a pain to reproduce using UrlFetchApp. If your data service does not have a proper API, it could be as easy as passing username and password values in the body of a POST request, or as hard as multiple requests passing various headers and cookies. Sorry I can't help any more without actual access to the data.Robena
hi, i'm trying to import data from a csv file to a spreadsheet, but the csv file is too big, so the script ended up taking more than 5 minutes. Is there a way to do it in parts?...Borchers
The code to import CSV to sheets works great! However does it cause any lock on the CSV file? Because the code that creates this CSV file from a local database is giving an error.Amaral

You can get Google Drive to automatically convert csv files to Google Sheets by appending

?convert=true

to the end of the api url you are calling.

EDIT: Here is the documentation on available parameters: https://developers.google.com/drive/v2/reference/files/insert

Also, while searching for the above link, I found this question has already been answered here:

Upload CSV to Google Drive Spreadsheet using Drive v2 API
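To illustrate the idea, here is a minimal Python sketch of what such a request URL would look like. The helper name and header handling are my own illustration, not from this answer; only the ?convert=true query parameter comes from the answer:

```python
# Sketch only: build the Drive v2 simple-upload request described above.
# The convert=true parameter is what asks Drive to convert the uploaded
# CSV into a Google Sheets file.
DRIVE_V2_UPLOAD = 'https://www.googleapis.com/upload/drive/v2/files'

def build_upload_request(convert=True):
    """Return (url, headers) for a simple media upload of a CSV file."""
    params = 'uploadType=media'
    if convert:
        params += '&convert=true'  # ask Drive to convert the CSV to a Sheet
    headers = {'Content-Type': 'text/csv'}
    return DRIVE_V2_UPLOAD + '?' + params, headers
```

You would then POST the CSV bytes to this URL with an OAuth bearer token. As noted in the comments, in v3 of the API the convert flag is gone; you instead set the uploaded file's target MIME type to application/vnd.google-apps.spreadsheet.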

Loquitur answered 13/11, 2014 at 14:34 Comment(5)
The reporting program does not allow for HTTP calls. All it can do is generate the CSV or XLS and place it in a folder (which happens to be a google drive folder).Risinger
I wonder... Is it possible to call this from an appscript, so that it will just convert and file I put into a certain folder?Risinger
The ?convert=true parameter only applies to the file uploaded in that same http request. Check the settings of the Google Drive application you use, see if there are settings that handle conversion. Behind the scenes, that program will be using http requests to sync your files.Loquitur
For v3 of the API convert no longer works. Instead specify the MIME type of the upload as text/csv and the MIME type of the file you want as application/vnd.google-apps.spreadsheet.Benfield
@Benfield thank you, it works. You should make it a separated answer for future references.Tabulator

(Mar 2017) The accepted answer is not the best solution. It relies on manually parsing the CSV with Apps Script, and that code may not be resilient, requiring maintenance. If your legacy system autogenerates CSV files, it's best they go into another folder for temporary processing (importing [uploading to Google Drive & converting] to Google Sheets files).

My thought is to let the Drive API do all the heavy lifting. The Google Drive API team released v3 at the end of 2015, and in that release insert() was renamed create() to better reflect the file operation. There's also no more convert flag -- you just specify MIME types... imagine that!

The documentation has also been improved: there's now a special guide devoted to uploads (simple, multipart, and resumable) that comes with sample code in Java, Python, PHP, C#/.NET, Ruby, JavaScript/Node.js, and iOS/Obj-C that imports CSV files into Google Sheets format as desired.

Below is one alternate Python solution for short files ("simple upload") where you don't need the apiclient.http.MediaFileUpload class. This snippet assumes your auth code works where your service endpoint is DRIVE with a minimum auth scope of https://www.googleapis.com/auth/drive.file.

# filenames & MIMEtypes
DST_FILENAME = 'inventory'
SRC_FILENAME = DST_FILENAME + '.csv'
SHT_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
CSV_MIMETYPE = 'text/csv'

# Import CSV file to Google Drive as a Google Sheets file
METADATA = {'name': DST_FILENAME, 'mimeType': SHT_MIMETYPE}
rsp = DRIVE.files().create(body=METADATA, media_body=SRC_FILENAME).execute()
if rsp:
    print('Imported %r to %r (as %s)' % (SRC_FILENAME, DST_FILENAME, rsp['mimeType']))

Better yet, rather than uploading to My Drive, you'd upload to one (or more) specific folder(s), meaning you'd add the parent folder ID(s) to METADATA. (Also see the code sample on this page.) Finally, there's no native .gsheet "file" -- that file just has a link to the online Sheet, so what's above is what you want to do.
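As a sketch of that parent-folder variant (the folder ID below is a made-up placeholder; note that v3 takes a flat list of folder IDs, whereas v2 wanted a list of {'id': ...} objects):

```python
FOLDER_ID = 'hypothetical-folder-id'  # placeholder: use a real Drive folder ID

# Same METADATA as in the snippet above, plus the target parent folder(s)
METADATA = {
    'name': 'inventory',
    'mimeType': 'application/vnd.google-apps.spreadsheet',
    'parents': [FOLDER_ID],  # v3 style: flat list of IDs (v2 used [{'id': ...}])
}
```

Passing this METADATA to DRIVE.files().create() puts the converted Sheet directly into that folder instead of My Drive.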

If not using Python, you can use the snippet above as pseudocode to port to your system language. Regardless, there's much less code to maintain because there's no CSV parsing. The only thing remaining is to blow away the CSV file temp folder your legacy system wrote to.
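That cleanup step might look like the sketch below. The function name and folder layout are my own assumptions; the point is simply to delete already-imported CSVs so they aren't re-processed:

```python
import glob
import os

def cleanup_csv_folder(folder):
    """Delete imported CSV files from the legacy system's temp folder
    so they aren't re-processed on the next scheduled run."""
    removed = []
    for path in glob.glob(os.path.join(folder, '*.csv')):
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed
```

Run this only after the Drive upload succeeds, so a failed import leaves the report in place for a retry.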

Unmask answered 16/3, 2017 at 6:4 Comment(0)

In case anyone is searching: I created a utility for automated import of xlsx files into Google Sheets: xls2sheets. You can run it automatically by setting up a cron job for ./cmd/sheets-refresh; the readme describes it all. Hope it's of use.

Resurrectionism answered 9/6, 2019 at 3:24 Comment(0)

Though there is already an accepted answer, another option is to use integration tools. There are many these days, and you can choose what suits your taste and needs. I’m familiar with Skyvia, so it’s the one I will use in the following sample. What’s needed to make this work:

  • An account in Skyvia. It’s free to register and you can start immediately.
  • Create two connections: one for the CSV and another for Google Sheets, both located in the same Google Drive. You will sign in to your Google account and Google will provide the access token to Skyvia.
  • Create an Import integration using the two connections created earlier.
  • Schedule the Import once it’s working.

Here’s the information needed for the CSV connection (screenshot: Google Drive connection):

Just an access token from Google. You need to provide the name of the CSV later in the Import integration. Here’s the connection information for Google Sheets (screenshot: Google Sheets connection):

Again, it’s the access token from Google and the specific Google Sheets spreadsheet. You will indicate the specific sheet later in the Import integration. Then, create the Import integration and specify the Source and Target (screenshot: CSV-to-Sheets integration).

Then, create a task in the Import integration to specify the CSV file, the specific sheet in the spreadsheet, and the column mappings (screenshot: Source definition).

I used a CSV file with city information in it as a sample (screenshot: preview of the city data).

The target Google Sheet has the same columns. In your case, the number of columns may not be the same, but that can be handled in the column mappings later (screenshot: Target definition in the Task Editor).

You indicate the specific sheet and the operation to perform here. Then, map the columns. Some columns with the same name and type will map automatically (screenshot: Task Editor, step 2).

Save the Task and the Import integration and click Run in the upper right corner. Then, wait until it’s done. Check the Monitor in Skyvia to see whether the import succeeded. Below is the result in mine (screenshot: imported city data).

Then, after a successful test run, you can create the schedule (screenshot: schedule settings).

So, it meets the requirements:

  • Opening the CSV file
  • Copying the data into Google Sheets
  • Scheduling the Import

If you have a scenario where you have file naming conventions for the CSV and Sheets, you can use the Skyvia Data Flow or Control Flow. You can generate filenames in a variable and set it as your CSV filename. This is also applicable if you need to do something else before you can open the Google Sheet, like deleting any existing rows. I hope this helps those looking for more options.

Capsicum answered 8/4 at 9:57 Comment(0)
