Using Papa Parse for big CSV files
I am trying to load a file with about 100k lines, and so far the browser has been crashing (locally). I searched online and saw that Papa Parse seems to handle large files. With it, loading the file into the textarea now takes about 3-4 minutes. Once the file is loaded, I want to run some more jQuery to do counts and similar processing, so the whole process takes a while. Is there a way to make the CSV load faster? Am I using the library correctly?

<div id="tabs">
<ul>
  <li><a href="#tabs-4">Generate a Report</a></li>
</ul>
<div id="tabs-4">
  <h2>Generating a CSV report</h2>
  <h4>Input Data:</h4>      
  <input id="myFile" type="file" name="files" value="Load File" />
  <button onclick="loadFileAsText()">Load Selected File</button>
  <form action="./" method="post">
  <textarea id="input3" style="height:150px;"></textarea>

  <input id="run3" type="button" value="Run" />
  <input id="runSplit" type="button" value="Run Split" />
  <input id="downloadLink" type="button" value="Download" />
  </form>
</div>
</div>

$(function () {
    $("#tabs").tabs();
});

var data = $('#input3').val();

function handleFileSelect(evt) {
    var file = evt.target.files[0];

Papa.parse(file, {
    header: true,
    dynamicTyping: true,
    complete: function (results) {
        data = results;
    }
});
}

$(document).ready(function () {

    $('#myFile').change(function(handleFileSelect){

    });
});


function loadFileAsText() {
    var fileToLoad = document.getElementById("myFile").files[0];

    var fileReader = new FileReader();
    fileReader.onload = function (fileLoadedEvent) {
        var textFromFileLoaded = fileLoadedEvent.target.result;
        document.getElementById("input3").value = textFromFileLoaded;
    };
    fileReader.readAsText(fileToLoad, "UTF-8");
}
Arbitrator answered 29/6, 2016 at 12:46 Comment(1)
For faster loading, turn off header and dynamicTyping, and be sure to use streaming! Right now you're loading all the data into memory, so you're lucky it's not crashing. - Mohammadmohammed
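A minimal sketch of what the comment above suggests, assuming the only goal is to count rows. `countRows` is a hypothetical helper name, not part of Papa Parse; the parser object is passed in as an argument so the function can also be exercised outside the browser. In the page it would simply be the global `Papa`.

```javascript
// Hypothetical helper: stream a CSV through Papa Parse and count its rows
// without keeping the parsed data in memory.
// `papa` is the Papa Parse object (the global `Papa` in the page).
function countRows(papa, input, onDone) {
  var rowCount = 0;

  papa.parse(input, {
    header: false,        // skip building one object per row
    dynamicTyping: false, // skip number/boolean conversion
    skipEmptyLines: true, // do not count blank lines
    step: function (results) {
      // Called as each row is parsed; results.data holds the row just read
      rowCount++;
    },
    complete: function () {
      onDone(rowCount);
    }
  });
}

// Usage in the page (assumes the #myFile input from the question):
// countRows(Papa, document.getElementById('myFile').files[0], function (n) {
//   console.log(n + ' rows');
// });
```

Because `step` is supplied, the rows are handed over one at a time instead of being collected into one large `results.data` array at the end.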
A
12

You are probably using it correctly; it is just that parsing all 100k lines takes some time!

This is probably a good use case for Web Workers.

NOTE: Per @tomBryer's answer below, Papa Parse now has support for Web Workers out of the box. This may be a better approach than rolling your own worker.

If you've never used them before, this site gives a decent rundown, but the key part is:

Web Workers mimics multithreading, allowing intensive scripts to be run in the background so they do not block other scripts from running. Ideal for keeping your UI responsive while also performing processor-intensive functions.

Browser coverage is pretty decent as well, with IE9 and below being the only semi-modern browsers that don't support it.

Mozilla has a good video that shows how web workers can speed up frame rate on a page as well.

I'll try to get a working example with web workers for you, but also note that this won't speed up the script, it'll just make it process asynchronously so your page stays responsive.

EDIT:

(NOTE: if you want to parse the CSV within the worker, you'll probably need to import the Papa Parse script within worker.js using the importScripts function (which is globally defined within the worker thread). See the MDN page for more info on that.)
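As a rough sketch of that variant (not the example that follows), a worker.js that both reads and parses the file might look like this. The script path `papaparse.js` and the helper name `parseInWorker` are assumptions for illustration; the message shape `{file: ...}` matches what the page posts in the example below.

```javascript
// worker.js (sketch): read AND parse the file inside the worker.

// Hypothetical helper: parse `file` with the given Papa Parse object and
// hand the result to `post`.
function parseInWorker(papa, file, post) {
  papa.parse(file, {
    header: true,
    dynamicTyping: true,
    complete: function (results) {
      // Only send back what the page needs
      post({ rows: results.data, errors: results.errors });
    }
  });
}

// importScripts only exists inside a worker, so guard the wiring.
if (typeof importScripts === 'function') {
  // Assumed path: adjust to wherever papaparse.js is served from
  importScripts('papaparse.js');

  self.addEventListener('message', function (e) {
    parseInWorker(self.Papa, e.data.file, function (msg) {
      self.postMessage(msg);
    });
  }, false);
}
```

Posting a large array of parsed rows back to the page has a copying cost of its own, so doing the counting inside the worker and returning only the totals may be the better trade-off.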

Here is my working example:

csv.html

<!doctype html>
<html>
<head>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0/jquery.min.js"></script>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js"></script>
</head>

<body>
  <input id="myFile" type="file" name="files" value="Load File" />
  <br>
  <button class="load-file">Load and Parse Selected CSV File</button>
  <div id="report"></div>

<script>
// initialize our parsed_csv to be used wherever we want
var parsed_csv;
var start_time, end_time;

// document.ready
$(function() {

  $('.load-file').on('click', function(e) {
    start_time = performance.now();
    $('#report').text('Processing...');

    console.log('initialize worker');

    var worker = new Worker('worker.js');
    worker.addEventListener('message', function(ev) {
      console.log('received raw CSV, now parsing...');

      // Parse our CSV raw text
      Papa.parse(ev.data, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
          // Save result in a globally accessible var
          parsed_csv = results;
          console.log('parsed CSV!');
          console.log(parsed_csv);

          $('#report').text(parsed_csv.data.length + ' rows processed');
          end_time = performance.now();
          console.log('Took ' + (end_time - start_time) + " milliseconds to load and process the CSV file.");
        }
      });

      // Terminate our worker
      worker.terminate();
    }, false);

    // Submit our file to load
    var file_to_load = document.getElementById("myFile").files[0];

    console.log('call our worker');
    worker.postMessage({file: file_to_load});
  });

});
</script>
</body>

</html>

worker.js

self.addEventListener('message', function(e) {
    console.log('worker is running');

    var file = e.data.file;
    var reader = new FileReader();

    reader.onload = function (fileLoadedEvent) {
        console.log('file loaded, posting back from worker');

        var textFromFileLoaded = fileLoadedEvent.target.result;

        // Post our text file back from the worker
        self.postMessage(textFromFileLoaded);
    };

    // Actually load the text file
    reader.readAsText(file, "UTF-8");
}, false);

[GIF of the working example: processing takes less than a second, all running locally]

Aimo answered 29/6, 2016 at 13:13 Comment(4)
Great, I look forward to seeing it. - Arbitrator
@Keith, I posted an example. I can load and process the file (100,000 rows, 5 columns, ~2.4 MB file) in under a second. Anything else you want to do by looping through the data can also be done in a worker if you feel it is still taking a while. This was run in Chrome Canary v53. - Aimo
Awesome answer @Aimo, thanks for putting the work into this one! - Crosspatch
This exact example helped me when the worker that comes with Papa Parse was giving me a "window is not defined" error that was more trouble than it was worth to resolve. So I wrote a custom worker and used this exact one as a guide. Thanks! However, I had to make modifications because I was using Papa Parse and the web worker example above in React, and was getting all sorts of errors. If you're working in React and using this answer to write a web worker, I'd recommend looking at this tutorial as well: fullstackreact.com/articles/… - Glaring
E
8

As of v5, Papa Parse has Web Worker support built in.

A simple example of invoking the worker within Papa Parse is below:

Papa.parse(bigFile, {
    worker: true,
    step: function(results) {
        console.log("Row:", results.data);
    }
});

There is no need to re-implement anything if you already have your own worker running Papa Parse, but for future projects some may find it easier to use Papa Parse's built-in solution.
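Since the question also asks about doing counts after loading, here is a hedged sketch of tallying values while the rows stream in, rather than looping over the full result afterwards. `tallyRow` and `countByColumn` are hypothetical names, and the sketch assumes the v5 behaviour where `results.data` in `step` is a single row (an object keyed by column name when `header` is on).

```javascript
// Hypothetical helper: add one parsed row to a running tally of the
// values seen in column `key`.
function tallyRow(counts, row, key) {
  var value = row[key];
  counts[value] = (counts[value] || 0) + 1;
  return counts;
}

// Hypothetical wrapper: parse `bigFile` in Papa Parse's own worker and
// report the tally once the whole file has been read.
// `papa` is the Papa Parse object (the global `Papa` in the page).
function countByColumn(papa, bigFile, key, onDone) {
  // No prototype, so values like "constructor" are counted safely
  var counts = Object.create(null);

  papa.parse(bigFile, {
    worker: true, // parse off the main thread
    header: true, // rows arrive as objects keyed by column name
    step: function (results) {
      tallyRow(counts, results.data, key);
    },
    complete: function () {
      onDone(counts);
    }
  });
}

// Usage in the page ("status" is a made-up column name):
// countByColumn(Papa, bigFile, 'status', function (counts) {
//   console.log(counts);
// });
```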

Ernaernald answered 1/6, 2019 at 16:21 Comment(0)
