Get just header from remote csv file using papa parse

I need to extract just the header from a remote csv file.

My current method is as follows:

Papa Parse can stream the data and hand me each row individually through the step callback, which is great, and I can terminate the stream with parser.abort() so that it goes no further than the first row. This looks as follows:

Papa.parse(csv_file_and_path, {
    header: true,
    worker: true,
    download: true,
    step: function(row, parser) {
        // DO MY STUFF HERE (with header: true, row.meta.fields holds the column names)
        parser.abort();
    }
});

This works fine, but because the file is remote, it has to be downloaded in order to be read. Even though the code hands control back to the browser once the first row has been parsed, the download continues long after I have the information I need. For large files it can go on for a long time.

Is there a more efficient way of doing this? Can I prevent papa parse from downloading the whole file?

I have tried using

Papa.parse(csv_file, {
    header: true,
    download: true,
    preview: 1,
    complete: function(results) {
        // DO MY STUFF HERE
    }
});

But this does the same thing: it downloads the entire file, although, as with the first approach, it gives control back to the browser after the first line is parsed.

Instillation answered 17/8, 2016 at 15:46 Comment(0)

The solution I came up with is very similar to the approach in my question. The difference is that I abort the parser, supply a complete callback and try to clear the results from memory.

With the following method, only a single chunk of the file is downloaded. This massively reduces the bandwidth used for a large file, because no downloading continues after the first line is parsed.

Papa.parse(csv_file, {
    header: true,
    download: true,
    step: function(results, parser) {

        // DO MY THING HERE (results.meta.fields holds the header names)

        parser.abort();  // Stop parsing after the first row
        results = null;  // Attempting to clear the results from memory

    },
    complete: function(results) {

        results = null;  // Attempting to clear the results from memory

    }
});
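
If Papa Parse still pulls more of the file than wanted, another option is to skip its download step and read the response stream directly, cancelling the request as soon as the first line break arrives. The following is only a minimal sketch and is not part of the tested solution above. It assumes a browser (or Node 18+) with fetch, AbortController and streaming response bodies, a server that allows the request cross-origin, and a header row with no line breaks inside quoted fields. The names firstLineOf and fetchFirstLine are made up for illustration.

```javascript
// Hypothetical helper: returns the text before the first line break,
// or null if no line break has been seen yet.
function firstLineOf(text) {
  const match = /\r?\n|\r/.exec(text);
  return match ? text.slice(0, match.index) : null;
}

// Hypothetical helper: downloads only as much of the file as is needed
// to see the first line, then aborts the request.
async function fetchFirstLine(url) {
  const controller = new AbortController();
  const response = await fetch(url, { signal: controller.signal });
  if (!response.ok) {
    throw new Error("HTTP " + response.status);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder(); // assumes the file is UTF-8
  let buffered = "";

  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) {
        // The file ended without any line break: it is a single line.
        return buffered;
      }
      // stream: true keeps multi-byte characters intact across chunks.
      buffered += decoder.decode(value, { stream: true });
      const line = firstLineOf(buffered);
      if (line !== null) {
        return line;
      }
    }
  } finally {
    // Stop the transfer; this is a no-op if the download already finished.
    controller.abort();
  }
}
```

The returned line can then be handed to Papa.parse(line).data[0] to split it into column names, so quoting and delimiters are still handled by Papa Parse. How many bytes arrive before the abort takes effect depends on the browser and the network, so it is worth checking in the network monitor.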
Instillation answered 29/3, 2017 at 7:55 Comment(0)

You can use the preview option of PapaParse:

 Papa.parse(..., {
          preview: 5, ...

Also read this: https://github.com/mholt/PapaParse/issues/47

Related topic: Javascript using File.Reader() to read line by line

Alcoholometer answered 23/3, 2017 at 13:38 Comment(6)
The preview method doesn't work. I should have mentioned this before, as I had already tested it; I'll update my question. Preview should stop the entire file from being downloaded, but I've tested it and it doesn't. - Instillation
It works for me. Maybe we are using different versions; try the latest one. Parsing clearly freezes the browser on large CSV files for me, but not with preview. Also note that I used Firefox to test this. - Alcoholometer
Yeah, I tested it myself today. It works in the sense that it frees the browser up, as my original method does, but it doesn't seem to stop the download, which continues in the background afterwards. Have you monitored your network usage to see the size of the file retrieved? In my case it still downloads the entire file in the background. - Instillation
I did not check the network, but I will certainly have a look now that you mention it. - Alcoholometer
OK, great, let me know what you find. - Instillation
@SingleEntity I actually used local file select, which is another use case. - Alcoholometer
