I have the URL of a possibly large (100+ MB) file. How do I save it in a local directory using fetch? I looked around, but there don't seem to be many resources or tutorials on how to do this.
Updated solution for Node 18:

const fs = require("fs");
const { mkdir } = require("fs/promises");
const { Readable } = require("stream");
const { finished } = require("stream/promises");
const path = require("path");

const downloadFile = async (url, fileName) => {
  const res = await fetch(url);
  if (!fs.existsSync("downloads")) await mkdir("downloads"); // optional if you already have a downloads directory
  const destination = path.resolve("./downloads", fileName);
  const fileStream = fs.createWriteStream(destination, { flags: "wx" }); // 'wx' fails if the file already exists
  await finished(Readable.fromWeb(res.body).pipe(fileStream));
};

await downloadFile("<url_to_fetch>", "<fileName>")
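Note that the snippet above uses require, i.e. CommonJS, where top-level await is not allowed; one way to invoke it there is to wrap the call (the URL and file name below are placeholders):

// Wrap the call in an async IIFE when running as CommonJS.
(async () => {
  await downloadFile("https://example.com/big-file.zip", "big-file.zip");
  console.log("download finished");
})();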
Old answer, works up to Node 16:

Using the Fetch API you could write a function that downloads from a URL like this. You will need node-fetch@2: run npm i node-fetch@2.
const fetch = require("node-fetch");
const fs = require("fs");

const downloadFile = async (url, path) => {
  const res = await fetch(url);
  const fileStream = fs.createWriteStream(path);
  await new Promise((resolve, reject) => {
    res.body.pipe(fileStream);
    res.body.on("error", reject);
    fileStream.on("finish", resolve);
  });
};
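A hypothetical usage sketch: since downloadFile is async, it returns a Promise, so callers should await it (or chain .then) to know when the file is fully written. The URL and destination path here are placeholders:

(async () => {
  await downloadFile("https://example.com/big-file.zip", "./big-file.zip");
  console.log("saved"); // runs only after the write stream has finished
})();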
Comments:

[...] res.body.on('error', reject); and fileStream.on('finish', resolve);. – Centrality
[...] the async function downloadFile style over const somevar = [...] – Zoroastrian
Doesn't an async function always return a Promise, rather than awaiting one? – Vespertilionine
downloadFile will still return an empty promise because of the async keyword, but it won't return until awaiting the inner anonymous promise. – Firmament
downloadFile should return the new Promise and the outer code should call await downloadFile(), if I'm understanding the expected behavior of async functions correctly. – Vespertilionine
[...] downloadFile to be! What I think you're describing would be effectively the same as the current answer. – Firmament
fromWeb is flagged as experimental. – Discreet

Older answers here involve node-fetch, but since Node.js v18.x this can be done with no extra dependencies. The body of a fetch response is a web stream. It can be converted to a Node Readable stream using Readable.fromWeb, which can then be piped into a write stream created by fs.createWriteStream. If desired, the resulting stream can then be turned into a Promise using the promise version of stream.finished.
const fs = require('fs');
const { Readable } = require('stream');
const { finished } = require('stream/promises');
const stream = fs.createWriteStream('output.txt');
const { body } = await fetch('https://example.com');
await finished(Readable.fromWeb(body).pipe(stream));
const download = async (url, path) => Readable.fromWeb((await fetch(url)).body).pipe(fs.createWriteStream(path))
–
Bromide await fetch(...)
) before starting the write stream? –
Overfeed await fetch(...)
finishes after the response headers are fully received, but before the response body is received. The body will be streamed into the file while it is arriving. The second await
can be omitted to perform other tasks while the body stream is still in progress. –
Polypary Argument of type 'ReadableStream<Uint8Array>' is not assignable to parameter of type 'ReadableStream<any>'. Type 'ReadableStream<Uint8Array>' is missing the following properties from type 'ReadableStream<any>': values, [Symbol.asyncIterator]ts(2345)
–
Dividivi ReadableStream
definitions, as per #63630614. You should be able to cast body
to the correct ReadableStream
from 'stream/web'
; i.e. import { ReadableStream } from 'stream/web';
and body as ReadableStream<any>
. –
Polypary import
as Node support it easily. –
Keffer Readable.fromWeb()
even necessary if body
is already a ReadableStream
? –
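A sketch of the pattern Polypary's comment describes: await only the fetch, do other work while the body streams to disk, then await completion later. The URL, file name, and doOtherWork are placeholders, and Node 18+ built-in fetch is assumed:

const fs = require('fs');
const { Readable } = require('stream');
const { finished } = require('stream/promises');

(async () => {
  const res = await fetch('https://example.com/big-file.bin'); // resolves once headers arrive
  // Start piping the body to disk, but don't await it yet.
  const done = finished(Readable.fromWeb(res.body).pipe(fs.createWriteStream('big-file.bin')));
  doOtherWork(); // hypothetical work that runs while the body is still downloading
  await done;    // now wait for the download to finish
})();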
If you want to avoid explicitly making a Promise like in the other very fine answer, and are OK with building a buffer of the entire 100+ MB file, then you could do something simpler:
const fetch = require('node-fetch');
const { writeFile } = require('fs/promises');

function downloadFile(url, outputPath) {
  return fetch(url)
    .then(x => x.arrayBuffer())
    .then(x => writeFile(outputPath, Buffer.from(x)));
}
But the other answer will be more memory-efficient since it's piping the received data stream directly into a file without accumulating all of it in a Buffer.
Comments:

EISDIR means "Error: IS Directory": you're giving Node a directory when it expects a file. Just use d:\work\repo\file.txt, for example. – Bradney
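A minimal illustration of that comment (data is a placeholder Buffer; the calls are shown as if inside an async function):

const { writeFile } = require('fs/promises');
const data = Buffer.from('example');

// Inside an async function:
await writeFile('d:\\work\\repo', data);           // rejects with EISDIR: the path is a directory
await writeFile('d:\\work\\repo\\file.txt', data); // OK: the path names a file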
This is now easy using modern Node.js APIs. This will not read the entire file into memory at once, so it can be used with huge files and is great for performance.
import { writeFile } from 'node:fs/promises'
import { Readable } from 'node:stream'
const response = await fetch('https://example.com/pdf')
const body = Readable.fromWeb(response.body)
await writeFile('document.pdf', body)
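One hedged addition to this pattern (an assumption about your needs, not part of the original answer): check the response before writing, since an error page body would otherwise be saved as the document.

import { writeFile } from 'node:fs/promises'
import { Readable } from 'node:stream'

const response = await fetch('https://example.com/pdf')
// Fail early on a non-2xx response or a missing body instead of writing a bad file.
if (!response.ok || !response.body) {
  throw new Error(`unexpected response ${response.statusText}`)
}
await writeFile('document.pdf', Readable.fromWeb(response.body))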
const { createWriteStream } = require('fs');
const { pipeline } = require('stream/promises');
const fetch = require('node-fetch');

const downloadFile = async (url, path) => pipeline(
  (await fetch(url)).body,
  createWriteStream(path)
);
Comments:

TypeError: Cannot read property 'on' of undefined at destroyer (internal/streams/pipeline.js:23:10) – Garv
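A guess at the cause of that TypeError (not confirmed in the thread): pipeline received a body that lacks Node stream methods, for example the web stream that Node's built-in fetch returns, or an undefined body. A defensive variant that converts first, assuming Node 18+ built-in fetch:

const { createWriteStream } = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

const downloadFile = async (url, path) => {
  const res = await fetch(url); // built-in fetch; res.body is a web ReadableStream
  if (!res.body) throw new Error(`no response body for ${url}`);
  return pipeline(Readable.fromWeb(res.body), createWriteStream(path));
};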
import { existsSync } from "fs";
import { mkdir, writeFile } from "fs/promises";
import { join } from "path";

export const download = async (url: string, ...folders: string[]) => {
  const fileName = url.split("/").pop();
  const path = join("./downloads", ...folders);
  if (!existsSync(path)) await mkdir(path, { recursive: true }); // recursive: parent folders may not exist yet
  const filePath = join(path, fileName);
  const response = await fetch(url);
  const blob = await response.blob();
  // const bos = Buffer.from(await blob.arrayBuffer())
  const bos = blob.stream();
  await writeFile(filePath, bos);
  return { path, fileName, filePath };
};
// call it like this ↓
await download("file-url", "subfolder-1", "subfolder-2", ...)
I was looking for a similar use case: I wanted to fetch a bunch of API endpoints and save the JSON responses to static files, so I came up with my own solution. Hope it helps.
const fetch = require('node-fetch'),
  fs = require('fs'),
  VERSIONS_FILE_PATH = './static/data/versions.json',
  endpoints = [
    {
      name: 'example1',
      type: 'exampleType1',
      url: 'https://example.com/api/url/1',
      filePath: './static/data/exampleResult1.json',
      updateFrequency: 7 // days
    },
    {
      name: 'example2',
      type: 'exampleType1',
      url: 'https://example.com/api/url/2',
      filePath: './static/data/exampleResult2.json',
      updateFrequency: 7
    },
    {
      name: 'example3',
      type: 'exampleType2',
      url: 'https://example.com/api/url/3',
      filePath: './static/data/exampleResult3.json',
      updateFrequency: 30
    },
    {
      name: 'example4',
      type: 'exampleType2',
      url: 'https://example.com/api/url/4',
      filePath: './static/data/exampleResult4.json',
      updateFrequency: 30
    },
  ],
  checkOrCreateFolder = () => {
    const dir = './static/data/';
    if (!fs.existsSync(dir)) {
      fs.mkdirSync(dir);
    }
  },
  syncStaticData = () => {
    checkOrCreateFolder();
    let fetchList = [],
      versions = [];
    endpoints.forEach(endpoint => {
      if (requiresUpdate(endpoint)) {
        console.log(`Updating ${endpoint.name} data... : `, endpoint.filePath);
        fetchList.push(endpoint);
      } else {
        console.log(`Using cached ${endpoint.name} data... : `, endpoint.filePath);
        let endpointVersion = JSON.parse(fs.readFileSync(endpoint.filePath, 'utf8')).lastUpdate;
        versions.push({
          name: endpoint.name + "Data",
          version: endpointVersion
        });
      }
    });
    if (fetchList.length > 0) {
      Promise.all(fetchList.map(endpoint => fetch(endpoint.url, { "method": "GET" })))
        .then(responses => Promise.all(responses.map(response => response.json())))
        .then(results => {
          results.forEach((endpointData, index) => {
            let endpoint = fetchList[index];
            let processedData = processData(endpoint.type, endpointData.data);
            let fileData = {
              data: processedData,
              lastUpdate: Date.now() // unix timestamp
            };
            versions.push({
              name: endpoint.name + "Data",
              version: fileData.lastUpdate
            });
            fs.writeFileSync(endpoint.filePath, JSON.stringify(fileData));
            console.log('updated data: ', endpoint.filePath);
          });
          // Write the versions file only after the fetched versions have been added.
          fs.writeFileSync(VERSIONS_FILE_PATH, JSON.stringify(versions));
          console.log('updated versions: ', VERSIONS_FILE_PATH);
        })
        .catch(err => console.log(err));
    } else {
      fs.writeFileSync(VERSIONS_FILE_PATH, JSON.stringify(versions));
      console.log('updated versions: ', VERSIONS_FILE_PATH);
    }
  },
  recursiveRemoveKey = (object, keyname) => {
    object.forEach((item) => {
      if (item.items) { // "items" is the nesting key; if it exists, recurse (change as required)
        recursiveRemoveKey(item.items, keyname);
      }
      delete item[keyname];
    });
  },
  processData = (type, data) => {
    // anything you want to do with the data before it is written to the file
    let processedData = type === 'exampleType1' ? processType1Data(data) : processType2Data(data);
    return processedData;
  },
  processType1Data = data => {
    let fetchedData = [...data];
    recursiveRemoveKey(fetchedData, 'count');
    return fetchedData;
  },
  processType2Data = data => {
    let fetchedData = [...data];
    recursiveRemoveKey(fetchedData, 'keywords');
    return fetchedData;
  },
  requiresUpdate = endpoint => {
    if (fs.existsSync(endpoint.filePath)) {
      let fileData = JSON.parse(fs.readFileSync(endpoint.filePath));
      let lastUpdate = fileData.lastUpdate;
      let now = new Date();
      let diff = now - lastUpdate;
      let diffDays = Math.ceil(diff / (1000 * 60 * 60 * 24));
      return diffDays >= endpoint.updateFrequency;
    }
    return true;
  };

syncStaticData();
If you don't need to deal with 301/302 responses (when things have been moved), you can actually just do it in one line with the Node.js native libraries http and/or https.

You can run this example one-liner in the node shell. It just uses the https module to download a GNU zip file of some source code to the directory where you started the node shell. (You start a node shell by typing node at the command line for your OS where Node.js has been installed.)
require('https').get("https://codeload.github.com/angstyloop/js-utils/tar.gz/refs/heads/develop", it => it.pipe(require('fs').createWriteStream("develop.tar.gz")));
If you don't need/want HTTPS use this instead:
require('http').get("http://codeload.github.com/angstyloop/js-utils/tar.gz/refs/heads/develop", it => it.pipe(require('fs').createWriteStream("develop.tar.gz")));
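If you do need to follow 301/302 responses, a minimal hand-rolled sketch is shown below. It assumes the Location header is an absolute https URL; production code would also cap the number of hops:

const https = require('https');
const fs = require('fs');

function download(url, dest) {
  https.get(url, res => {
    // Follow redirects by recursing on the Location header.
    if (res.statusCode === 301 || res.statusCode === 302) {
      return download(res.headers.location, dest);
    }
    res.pipe(fs.createWriteStream(dest));
  });
}

download('https://codeload.github.com/angstyloop/js-utils/tar.gz/refs/heads/develop', 'develop.tar.gz');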
This got the job done for me on Node 18, and presumably 16. Its only dependencies are fs and node-fetch (it probably works with other fetch libraries too).
const fs = require('fs');
const fetch = require("node-fetch");

async function downloadImage(imageUrl) {
  // imageUrl: https://example.com/uploads/image.jpg
  const fileName = imageUrl.split('/').pop(); // image.jpg
  const res = await fetch(imageUrl);
  const fileStream = fs.createWriteStream(`./folder/${fileName}`);
  await new Promise((resolve, reject) => {
    res.body.pipe(fileStream);
    res.body.on("error", reject);
    fileStream.on("finish", resolve);
  });
}
The previous top answer by @code_wrangler was split into Node 16 and Node 18 solutions (this is like the Node 16 solution), but on Node 18 the Node 18 solution created a 0-byte file for me and cost me some time.