Returning HTML body from Nightmare.js

I'm scraping a site with cheerio and nightmare. I use both, rather than cheerio alone, because I have to manipulate the site to reach the part I want to scrape, and nightmare handles that kind of scripting well.

Right now I use nightmare to navigate to the point where the information I need is displayed. Then, inside evaluate(), I'm trying to return the current HTML so I can pass it to cheerio for scraping. The problem is that I don't know how to retrieve the HTML from the document object. Is there a property of document that returns the full body?

Here is what I'm trying to do:

var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true });
var express = require('express');
var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');
var app = express();

var urlWeb = "url";
var selectCity = "#ddl_city";

nightmare
    .goto(urlWeb)
    .wait(selectCity)
    .select(selectCity, '19')
    .wait(6000)
    .select('#ddl_theater', '12')
    .wait(1000)
    .click('#btn_enter')
    .wait('#aspnetForm')
    .evaluate(function () {
        // Here is where I want to return the HTML body;
        // document.html is a guess and does not work.
        return document.html;
    })
    .then(function (body) {
        // Load the HTML body into cheerio
        var $ = cheerio.load(body);
        console.log(body);
    })
    .catch(function (error) {
        // Report navigation or evaluation failures
        console.error(error);
    });
Paske asked 25/9, 2016 at 20:30
Comments (5)
Do you require all the HTML, or is document.body sufficient? – Gourmand
So far I just need the body, @R.A.Lucas. – Paske
Does returning document.body from the evaluate method work? – Gourmand
You probably want document.body.outerHTML. – Cerellia
It's returning null, @ArtjomB. – Paske

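This worked. document has no html property, so the original evaluate() call had nothing to return; the body's markup is available as a string from: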

document.body.innerHTML
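
For reference, here is a minimal sketch of how that property might be wired into the nightmare-to-cheerio flow from the question. The helper names getBodyHtml, getPageHtml, and scrape are hypothetical and only for illustration; the '#aspnetForm' selector is taken from the question, and the form-filling steps are omitted. It assumes that nightmare serializes the function passed to evaluate() and runs it in the page, so that function must not reference outer variables.

```javascript
// Runs inside the page when passed to nightmare's evaluate().
// Returns only the markup inside <body> as a string.
function getBodyHtml() {
    return document.body.innerHTML;
}

// Alternative if the whole document (including <head>) is needed.
function getPageHtml() {
    return document.documentElement.outerHTML;
}

// Hypothetical helper: navigates to `url`, waits for the form from the
// question, and resolves with a cheerio instance loaded with the body HTML.
// nightmare and cheerio are required lazily so that the two helpers above
// can be used without those packages installed.
function scrape(url) {
    var Nightmare = require('nightmare');
    var cheerio = require('cheerio');

    return Nightmare({ show: false })
        .goto(url)
        .wait('#aspnetForm')
        .evaluate(getBodyHtml)
        .end() // close the Electron instance once the HTML has been returned
        .then(function (html) {
            return cheerio.load(html);
        });
}

// Example use (not executed here):
// scrape('url').then(function ($) {
//     console.log($('#aspnetForm').length);
// });
```

Returning a string matters here: a value returned from evaluate() has to be serialized on its way back to Node, so a string such as innerHTML arrives intact, whereas a DOM node such as document.body does not.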
Paske answered 25/9, 2016 at 22:18 Comment(0)
