How can I replicate the functionality of a wget with node.js?

Is it possible to essentially run a wget from within a node.js app? I'd like to have a script that crawls a site and downloads a specific file, but the href of the link that goes to the file changes fairly often. So I figured the easiest way to go about it would be to find the href of the link, then just perform a wget on it.

Thanks!

Immesh answered 2/3, 2012 at 22:21 Comment(2)
See the node.js documentation for child_process.exec(cmd).Vogue
All current answers are too complicated. There is a simple solution with a file stream. #11945432Fob
D
10

You can run an external command using child_process:

http://nodejs.org/docs/latest/api/child_process.html#child_process_child_process_exec_command_options_callback

var exec = require('child_process').exec,
    child,
    url = 'url to file';

// The URL is concatenated into a shell command, so it must come from a trusted source.
child = exec('wget ' + url,
  function (error, stdout, stderr) {
    console.log('stdout: ' + stdout);
    console.log('stderr: ' + stderr);
    if (error !== null) {
      console.log('exec error: ' + error);
    }
});
Duke answered 2/3, 2012 at 22:27 Comment(1)
This answers the question, but why would you do that when you can use request?Net
D
21

For future reference though, I would recommend request, which makes it this easy to fetch that file:

var request = require("request");
var url = "url to file";

request(url, function(err, res, body) {
  if (err) {
    return console.log("Request error: " + err.message);
  }
  // Do funky stuff with body
});
Dulcine answered 3/3, 2012 at 1:9 Comment(0)
H
16

While it might be a little more verbose than some third-party stuff, Node's core HTTP module provides an HTTP client you could use for this:

var http = require('http');
var options = {
  host: 'www.site2scrape.com',
  port: 80,
  path: '/page/scrape_me.html'
};
var req = http.get(options, function(response) {
  // handle the response
  var res_data = '';
  response.on('data', function(chunk) {
    res_data += chunk;
  });
  response.on('end', function() {
    console.log(res_data);
  });
});
req.on('error', function(err) {
  console.log("Request error: " + err.message);
});
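For the original question, once `res_data` holds the page, the changing link still has to be located. Here is a rough sketch, assuming the link can be recognised by a pattern in its href. `findHref` and the sample HTML are made up for illustration, and a regex like this is only adequate for simple, predictable markup (an HTML parser would be more robust):

```javascript
// Hypothetical helper: return the absolute URL of the first <a> tag whose
// href matches `pattern`, or null if none does. `pattern` should be a
// non-global RegExp, since global regexes keep state between test() calls.
function findHref(html, pattern, baseUrl) {
  var linkRe = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  var match;
  while ((match = linkRe.exec(html)) !== null) {
    if (pattern.test(match[1])) {
      // Resolve relative hrefs against the page the HTML came from.
      return new URL(match[1], baseUrl).href;
    }
  }
  return null;
}

// Made-up sample page content
var sampleHtml = '<a href="/about">About</a> <a class="dl" href="/files/report.zip">Download</a>';
console.log(findHref(sampleHtml, /\.zip$/, 'http://www.site2scrape.com/page/scrape_me.html'));
// prints: http://www.site2scrape.com/files/report.zip
```

The returned URL can then be handed to whichever download approach you pick.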
Heffernan answered 3/3, 2012 at 5:50 Comment(2)
I like that this answer utilizes only the core Node library. Good workNodab
If you're looking for even less work, without adding a dependency, using the built-in url module's parse method will yield an object that you can use instead of building options. (Assuming you have a string URI already to pass to it).Platinotype
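A sketch of what the last comment describes. The comment mentions the url module's parse method; this version swaps in the global `URL` class, which yields the same pieces. `optionsFromUri` is a hypothetical name, and the default port of 80 assumes a plain http URL:

```javascript
// Hypothetical helper: derive an options object for http.get from a URI string.
function optionsFromUri(uri) {
  var parsed = new URL(uri);
  return {
    host: parsed.hostname,
    // URL.port is an empty string when the URI does not specify a port
    port: parsed.port ? Number(parsed.port) : 80,
    path: parsed.pathname + parsed.search
  };
}

var options = optionsFromUri('http://www.site2scrape.com/page/scrape_me.html');
console.log(options.host, options.port, options.path);
// prints: www.site2scrape.com 80 /page/scrape_me.html
```

In current Node versions `http.get` also accepts the URL string directly as its first argument, so the options object is only needed when you want to adjust individual fields.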
M
2

You can use node-wget. It works in cases where running 'wget' itself is not possible.

Merited answered 8/2, 2013 at 9:38 Comment(0)
D
1

You can just use wget.

var exec = require('child_process').exec;

var child = exec("/path/to/wget http://some.domain/some.file", function (error, stdout, stderr) {
  if (error !== null) {
    console.log("ERROR: " + error);
  } else {
    console.log("YEAH IT WORKED");
  }
});
Diplosis answered 2/3, 2012 at 22:27 Comment(0)
V
0

You can use the HTTPS client and the file system module from Node.js.

Here is an example with an asynchronous function. It also follows redirects, which wget does for you.

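const https = require("https");
const fs = require("fs");

/**
 * @param {string} url
 * @param {string} dest
 * @returns {Promise<void>}
 */
function wget(url, dest) {
  return new Promise((resolve, reject) => {
    https.get(url, (response) => {
      const redirected = [301, 302, 303, 307, 308].includes(response.statusCode);
      if (redirected && response.headers.location) {
        // On a redirect, discard this response and call the function again with
        // the new location; the outer promise settles when that call settles.
        response.resume();
        const next = new URL(response.headers.location, url).href;
        wget(next, dest).then(resolve, reject);
      } else {
        const file = fs.createWriteStream(dest);

        response.pipe(file);
        file.on("finish", () => file.close(() => resolve()));
        file.on("error", reject);
      }
    }).on("error", reject);
  });
}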

Please note that you need to use the http or https module according to your URL.
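One way to make that choice automatically is to pick the module from the URL's protocol. This is a minimal sketch; `clientFor` is a hypothetical helper name, not part of the code above:

```javascript
const http = require("http");
const https = require("https");

// Hypothetical helper: return the core module that matches the URL's protocol.
// Anything that is not https falls back to the http module.
function clientFor(url) {
  return new URL(url).protocol === "https:" ? https : http;
}

console.log(clientFor("https://some.domain/some.file") === https); // prints: true
console.log(clientFor("http://some.domain/some.file") === http);   // prints: true
```

A download function could then call `clientFor(url).get(url, callback)` instead of hard-coding one module, which also covers redirects that switch between http and https.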

Vegetative answered 7/11, 2022 at 9:6 Comment(0)
