How to pull HTML table data (Yahoo Finance) with Cheerio in Google Apps Script? [duplicate]
Asked Answered
A

1

1

I'm trying to get an entire table data from https://finance.yahoo.com/quote/CL%3DF/history?p=CL%3DF. On a browser, the webpage shows 1 year data down to Oct 12, 2020 as a default. But the following code didn't pull the whole table data for some reason. It pulled only partial data, just less than 5 month data only down to May 20, 2021. What am I missing? Can anyone help fix anything wrong in the code? Thank you!

function test() {
  const url = 'https://finance.yahoo.com/quote/CL%3DF/history?p=CL%3DF';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);
  // The URL webpage shows one year data down to Oct 12, 2021 on the browser.
  // But the code below got data only down to May 20, 2020.  Why am I mssing?
  var data = $('table').find('td').toArray().map(x => $(x).text());
  console.log(data[data.length-8]);     // Print the last row date other than the web note
}
Antiparticle answered 11/10, 2021 at 18:10 Comment(0)
H
0

When I saw the HTML data, it seems that the table tab has not all data. But fortunately, I noticed that the object in the Javascript has all data you expect. So how about the following modified script?

Modified script:

In this modified script, the container-bound script of Spreadsheet is used. Of course, you can use the standalone type. But in that case, please modify SpreadsheetApp.getActiveSpreadsheet().

When you use this script, please copy and paste the following modified script to the script editor of Spreadsheet and set the sheet name, and run. By this, all data is retrieved and put to the Spreadsheet.

function test() {
  const url = 'https://finance.yahoo.com/quote/CL%3DF/history?p=CL%3DF';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);

  // I modified below script
  const data = $('script').toArray().reduce((ar, x) => {
    const c = $(x).get()[0].children;
    if (c.length > 0) {
      const d = c[0].data.trim().match(/({"context"[\s\S\w]+);\n}\(this\)\);/);
      if (d && d.length == 2) {
        ar.push(JSON.parse(d[1]));
      }
    }
    return ar;
  }, []);
  if (data.length == 0) throw new Error("No data.");
  const header = ["date","open","high","low","close","adjclose","volume"];
  const ar = data[0].context.dispatcher.stores.HistoricalPriceStore.prices.map(o => header.map(h => h == "date" ? new Date(o[h] * 1000) : (o[h] || "")));
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1"); // <--- Please set the sheet name you want to put the values.
  sheet.getRange(1, 1, ar.length, ar[0].length).setValues(ar);
}

Result:

When the above script is run, the following result is obtained.

enter image description here

References:

Heavyset answered 12/10, 2021 at 0:57 Comment(3)
Hi @Tanaike, your codes doesn't work any more since this week. It seems that there are some changes in Yahoo Finance website. If you wouldn't mind, could you take a look at this issue? This time I will try to better understand how you codes work. Thank you.Antiparticle
@Antiparticle About your new question of Hi @Tanaike, your codes doesn't work any more since this week. It seems that there are some changes in Yahoo Finance website. If you wouldn't mind, could you take a look at this issue? This time I will try to better understand how you codes work. Thank you., in this case, can you post it as a new question? If you can cooperate to resolve your new question, I'm glad. Can you cooperate to do it?Heavyset
I posted a new question. Thank you so much for your help always.Antiparticle

© 2022 - 2024 — McMap. All rights reserved.