I have a demo web page built with HTML and JavaScript. I want to know how to read a local CSV file line by line so that I can extract data from it.
Without jQuery:
const $output = document.getElementById('output');

document.getElementById('file').onchange = function () {
  const file = this.files[0];
  const reader = new FileReader();

  reader.onload = function (progressEvent) {
    // Entire file
    const text = this.result;
    $output.innerText = text;

    // By lines
    const lines = text.split('\n');
    for (let line = 0; line < lines.length; line++) {
      console.log(lines[line]);
    }
  };

  reader.readAsText(file);
};
<input type="file" name="file" id="file">
<div id='output'>
...
</div>
Remember to put your JavaScript after the file input is rendered, otherwise getElementById('file') returns null; a sketch of an alternative using DOMContentLoaded follows.
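A minimal sketch (assuming the same #file input as above) that defers the wiring until the DOM is parsed, so the script can live in the head of the page:

document.addEventListener('DOMContentLoaded', () => {
  document.getElementById('file').onchange = function () {
    // Quick check that the input is wired up: count the lines of the chosen file.
    const reader = new FileReader();
    reader.onload = () => console.log(reader.result.split('\n').length + ' lines');
    reader.readAsText(this.files[0]);
  };
});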
But with 100m lines, you're gonna run into trouble with displaying them in HTML. – Lanate

Using ES6 the JavaScript becomes a little cleaner:
function handleFiles(input) {
  const file = input.target.files[0];
  const reader = new FileReader();

  reader.onload = (event) => {
    const file = event.target.result;
    const allLines = file.split(/\r\n|\n/);
    // Reading line by line
    allLines.forEach((line) => {
      console.log(line);
    });
  };

  reader.onerror = (event) => {
    alert(event.target.error.name);
  };

  reader.readAsText(file);
}
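A usage sketch (assuming the #file input from the first answer): attach the handler to the input's change event.

document.getElementById('file').addEventListener('change', handleFiles);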
You can use \r?\n as the split pattern. – Elayneelazaro

const allLines = file.split(/\r\n|\n/); - This is not really "read line by line". This is gulping the whole multi-gig file and choking on it. – Clansman

Here's a function from the MDN docs that shows you how to use a ReadableStream to read a File line-by-line. This example uses fetch, but if you already have a File, you can call stream() and getReader() instead.
async function* makeTextFileLineIterator(fileURL) {
const utf8Decoder = new TextDecoder("utf-8");
let response = await fetch(fileURL);
let reader = response.body.getReader();
let { value: chunk, done: readerDone } = await reader.read();
chunk = chunk ? utf8Decoder.decode(chunk, { stream: true }) : "";
let re = /\r\n|\n|\r/gm;
let startIndex = 0;
for (;;) {
let result = re.exec(chunk);
if (!result) {
if (readerDone) {
break;
}
let remainder = chunk.substr(startIndex);
({ value: chunk, done: readerDone } = await reader.read());
chunk =
remainder + (chunk ? utf8Decoder.decode(chunk, { stream: true }) : "");
startIndex = re.lastIndex = 0;
continue;
}
yield chunk.substring(startIndex, result.index);
startIndex = re.lastIndex;
}
if (startIndex < chunk.length) {
// last line didn't end in a newline char
yield chunk.substr(startIndex);
}
}
for await (let line of makeTextFileLineIterator(urlOfFile)) {
processLine(line);
}
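If you already have a File from an input element, the same idea works by reading the File's own stream. A minimal sketch (assuming the #file input from the first answer; the buffering is simplified compared with the MDN generator, so a \r\n pair split across two chunks can produce one spurious empty line):

async function* fileLines(file) {
  const decoder = new TextDecoder("utf-8");
  const reader = file.stream().getReader(); // File inherits stream() from Blob
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    buffer += value ? decoder.decode(value, { stream: true }) : "";
    const parts = buffer.split(/\r\n|\n|\r/);
    buffer = parts.pop(); // keep the possibly incomplete last line in the buffer
    yield* parts;
    if (done) break;
  }
  if (buffer.length > 0) {
    yield buffer; // last line had no trailing newline
  }
}

document.getElementById('file').onchange = async function () {
  for await (const line of fileLines(this.files[0])) {
    console.log(line);
  }
};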
You can use the following code to read just the first few lines of a file. But note some caveats and observations:

Why search for the position of a line break? You might want to directly read 512 KB (or any other chunk size) as text. But unless you read the entire file at once, you risk cutting a Unicode character in half at the 512 KB boundary: the last few bytes of a chunk may form an incomplete character, because slicing a Blob (File) slices a byte array, not a character array. If instead we locate the position of a line break and read only up to it, we know everything before it consists of whole characters (see the small illustration after these notes).

Does this guarantee the whole file is never read? I do not know, but at least from the consumer's point of view, nothing beyond the requested chunks is touched. If the browser's underlying implementation decides to load the entire file anyway, that is outside this code's control; limiting what we slice and read is the best we can do.
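A tiny illustration (with made-up data) of the byte-boundary problem described above:

// Slicing a Blob cuts bytes, not characters: decoding a partial multi-byte
// UTF-8 sequence yields the U+FFFD replacement character.
const blob = new Blob(["é is two bytes in UTF-8"]); // "é" encodes as 0xC3 0xA9
blob.slice(0, 1).text().then((s) => console.log(s)); // "�" (broken character)
blob.slice(0, 2).text().then((s) => console.log(s)); // "é" (whole character)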
Example code:
/*
This function is used to scan the first few lines of a file to determine the position of the nth line break.
This is useful for large files where we want to avoid reading the entire file into memory.
Read is done in chunks of 512KB.
*/
async function scanLinePosition(file: File, lines: number): Promise<number> {
  return await new Promise<number>((resolve, reject) => {
    const reader = new FileReader();
    let rowsRead = 0;
    const chunkSize = 512 * 1024; // 512KB
    let totalRead = 0;

    reader.onload = () => {
      const bytes = new Uint8Array(reader.result as ArrayBuffer);
      for (let i = 0; i < bytes.length; i++) {
        if (bytes[i] === 10) { // '\n'
          rowsRead++;
          if (rowsRead >= lines) {
            // Resolve with the byte position just past the nth line break,
            // so the later slice ends on a whole-character boundary.
            resolve(totalRead + i + 1);
            return;
          }
        }
      }
      totalRead += bytes.length;
      if (bytes.length === chunkSize) {
        // Not enough line breaks yet; scan the next chunk.
        reader.readAsArrayBuffer(file.slice(totalRead, totalRead + chunkSize));
      } else {
        // Reached the end of the file; return everything read so far.
        resolve(totalRead);
      }
    };

    reader.onerror = (error) => {
      reject(error);
    };

    reader.readAsArrayBuffer(file.slice(0, chunkSize));
  });
}
async function readFileContent(file: File, lines: number) {
const readLimit = await scanLinePosition(file, lines);
return await new Promise((resolve, reject) => {
const reader = new FileReader();
reader.onload = () => {
const rows = (reader.result as string).split("\n");
if (rows.length >= lines) {
resolve(rows.slice(0, lines));
} else {
reject(new Error("File is too short"));
}
};
reader.onerror = (error) => {
reject(error);
};
reader.readAsText(file.slice(0, readLimit));
});
}
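A usage sketch (assuming the #file input from the first answer, and that the TypeScript above has been compiled to plain JavaScript): read the first 10 lines of the selected file without loading the whole thing.

document.getElementById('file').onchange = async function () {
  const file = this.files[0];
  if (!file) return;
  const firstLines = await readFileContent(file, 10); // array with the first 10 lines
  firstLines.forEach((line) => console.log(line));
};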