Yahoo Finance historical close price in Google Sheets returns N/A for closes more than 100 days back

I am trying to import historical adjusted close prices from Yahoo Finance into Google Sheets:

=ImportXML("https://sg.finance.yahoo.com/quote/"&B57&"/history?p="&B57, "//tbody/tr[21]/td[6]")

Cell B57 contains a ticker symbol, for example "SPY".

This works fine for historical prices up to 100 days back (by changing the row index, up to tr[100]).

When I try to get prices more than 100 days back, it returns "N/A", even though these prices are visible on Yahoo Finance.

Is there a way to adjust the XPath so that it works?

I noticed that, in the HTML of the Yahoo page, the rows for prices beyond 100 days don't have an attribute such as data-reactid="1520" in the tr tag.

Novikoff answered 6/5, 2020 at 13:29 Comment(0)
Q
-1

Answer:

IMPORTXML cannot retrieve data that is populated by a script, so it cannot be used to retrieve the rows of this table beyond the first 100.

More Information:

The first 100 values are loaded into the page without the use of JavaScript (as you can see by disabling JavaScript for https://sg.finance.yahoo.com/quote/SPY/history?p=SPY and reloading the page), so IMPORTXML can retrieve them.

The rows after the first 100 are generated on the fly as you scroll down the page, so IMPORTXML cannot retrieve them. As far as the formula can see, there is no 101st <tr> element, and so it displays #N/A: Imported content is empty.
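
One way to check this from Google Apps Script is to count the <tr> rows in the HTML that the server returns before any JavaScript runs. The following is only a minimal sketch: countTableRows and checkServerRows are hypothetical names, and the count assumes that the first <tbody> in the page belongs to the history table.

```javascript
// Hypothetical helper: counts <tr> elements inside the first <tbody> of an HTML string.
function countTableRows(html) {
  const tbody = html.match(/<tbody[^>]*>([\s\S]*?)<\/tbody>/i);
  if (!tbody) return 0;
  // Match "<tr>" and "<tr ...>" but not other tags that merely start with "tr".
  const rows = tbody[1].match(/<tr[\s>]/gi);
  return rows ? rows.length : 0;
}

// Apps Script only: fetches the raw HTML without executing the page's scripts.
function checkServerRows() {
  const url = "https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"; // URL from the question
  const html = UrlFetchApp.fetch(url).getContentText();
  Logger.log(countTableRows(html));
}
```

If the logged number stays at or near 100 regardless of how far back the page can be scrolled in a browser, the remaining rows are being added client-side.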


Querist answered 6/5, 2020 at 14:12 Comment(1)
Thanks for the explanation! Bad news, but now I understand the issue. Do you see a way to create my own database within Google Sheets, so that it adds the new close price every day to a list of historical closes? Then I would be able to go back more than 100 days in my own database :-) Thanks in advance. - Novikoff
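
A daily log like the one asked about in this comment could be sketched with a time-driven trigger that appends one row per day. This is a sketch under assumptions, not a tested solution: the sheet names Sheet1 and Sheet2, cell B1 as the cell holding the imported price, and all three function names are hypothetical.

```javascript
// Hypothetical helper: builds the row to append (timestamp plus latest close).
function buildLogRow(now, value) {
  return [now, value];
}

// Appends the current value of Sheet1!B1 as a new row at the end of Sheet2.
function storeDailyClose() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const value = ss.getSheetByName("Sheet1").getRange("B1").getValue();
  ss.getSheetByName("Sheet2").appendRow(buildLogRow(new Date(), value));
}

// Run once by hand: schedules storeDailyClose once a day, at around 18:00
// in the script's time zone.
function createDailyTrigger() {
  ScriptApp.newTrigger("storeDailyClose").timeBased().everyDays(1).atHour(18).create();
}
```

Over time Sheet2 then holds one row per day, which is not limited by the 100 rows that the page serves.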
L
1

At present, it seems that the values you want are included in the HTML as a JSON object used by JavaScript. In this case, the JSON object can be retrieved and parsed with Google Apps Script to obtain the values. How about the following sample script?

Sample script:

Please copy and paste the following script into the script editor of your Google Spreadsheet and save it. To use it, put the custom function =SAMPLE("https://sg.finance.yahoo.com/quote/SPY/history?p=SPY") in a cell. This runs the script.

function SAMPLE(url) {
  // The price data is embedded in the page as "root.App.main = {...};" followed by a newline.
  const html = UrlFetchApp.fetch(url).getContentText().match(/root\.App\.main = ([\s\S]+?);\n/);
  if (!html || html.length == 1) return "No data";
  const tempObj = JSON.parse(html[1].trim());
  const obj = tempObj.context.dispatcher.stores;
  const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
  // Return the header row followed by one row per price entry, in header order.
  return [header, ...obj.HistoricalPriceStore.prices
    .map(o => header.map(h => {
      if (h == "date") {
        return new Date(o[h] * 1000); // Unix seconds to Date
      } else if (h == "amount" && o[h]) {
        return `${o[h]} ${o.type}`; // entries with an amount are shown together with their type
      }
      return o[h];
    }))];
}

Testing:

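When this script is run with =SAMPLE("https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"), it returns a table whose first row is the header (date, amount, open, high, low, close, adjclose, volume), followed by one row per entry in HistoricalPriceStore.prices.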
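
To see what the extraction step is doing, it can be tried on a small hand-written string. The snippet below is a sketch: extractMainJson is a hypothetical name, the regular expression is a slightly stricter form of the one in SAMPLE (with the dots escaped), and the sample HTML is invented for illustration rather than real Yahoo output.

```javascript
// Hypothetical helper: pulls the object assigned to root.App.main out of a page's HTML.
function extractMainJson(html) {
  // Lazily capture everything between "root.App.main = " and the first ";" + newline.
  const m = html.match(/root\.App\.main = ([\s\S]+?);\n/);
  return m ? JSON.parse(m[1].trim()) : null;
}

// Invented sample, shaped like the structure that SAMPLE reads.
const sampleHtml = '<script>\nroot.App.main = {"context":{"dispatcher":{"stores":' +
  '{"HistoricalPriceStore":{"prices":[{"date":1588771800,"adjclose":284.25}]}}}}};\n</script>';

const prices = extractMainJson(sampleHtml).context.dispatcher.stores.HistoricalPriceStore.prices;
const firstDate = new Date(prices[0].date * 1000); // Unix seconds to a JavaScript Date
```

If the page no longer contains a root.App.main assignment, extractMainJson returns null, which corresponds to the "No data" branch of SAMPLE.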

Note:

  • The above script is for a custom function. If you want to run this script from the script editor instead, you can also use the following sample script.

    function myFunction() {
      const url = "https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"; // This URL is from your question.
    
      // The price data is embedded in the page as "root.App.main = {...};" followed by a newline.
      const html = UrlFetchApp.fetch(url).getContentText().match(/root\.App\.main = ([\s\S]+?);\n/);
      if (!html || html.length == 1) return;
      const tempObj = JSON.parse(html[1].trim());
      const obj = tempObj.context.dispatcher.stores;
      const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
      // Header row followed by one row per price entry, in header order.
      const values = [header, ...obj.HistoricalPriceStore.prices
        .map(o => header.map(h => {
          if (h == "date") {
            return new Date(o[h] * 1000); // Unix seconds to Date
          } else if (h == "amount" && o[h]) {
            return `${o[h]} ${o.type}`; // entries with an amount are shown together with their type
          }
          return o[h];
        }))];
    
      // Append the values below the last used row of the sheet.
      const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1"); // Please set your sheet name.
      sheet.getRange(sheet.getLastRow() + 1, 1, values.length, values[0].length).setValues(values);
    }
    

Note:

  • If tempObj.context.dispatcher.stores contains salted base64 data instead of a JSON object, a specific decoding process is required; please check this answer.
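
A script can at least detect this case before trying to read HistoricalPriceStore. The check below is a sketch resting on two assumptions that this answer does not confirm: that the salted form arrives as a string rather than an object, and that it uses the OpenSSL-style "Salted__" header, whose base64 encoding always begins with U2FsdGVkX1. looksSalted is a hypothetical name, and the decoding itself is not shown.

```javascript
// Hypothetical check: true when "stores" looks like salted base64 text
// (assumed OpenSSL "Salted__" header) instead of a parsed JSON object.
function looksSalted(stores) {
  return typeof stores === "string" && stores.startsWith("U2FsdGVkX1");
}
```

In SAMPLE, such a check could be applied to obj right after it is assigned, returning a message instead of failing on obj.HistoricalPriceStore.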

Levona answered 31/12, 2022 at 12:54 Comment(7)
Do you think that this is a good question to use as a duplicate target for questions about importing data from Yahoo Finance into Google Sheets? Related (Meta question): Canonical question for new questions about importing data from Yahoo Finance into Google Sheets - Horseshoe
@Rubén About your comment, I think that it is a difficult and important question. At present, there does not seem to be an API prepared for retrieving the values from Yahoo Finance (this is my understanding). So users retrieve the values from the site's HTML, and questions like this get posted. But the server-side specification changes often, and because of this the method in the accepted answer can no longer be used. I guess that this is why the same questions keep being posted. - Levona
@Rubén If the server-side specification changes, I'm not sure whether an answer has to be kept up to date by continually checking for such changes. I think that the method in the accepted answer is still useful for other sites and users, even though it cannot be used after the server-side specification changed. So I think that when the specification changes, a new question with an answer for the current situation will be useful for users. - Levona
@Rubén These are just my comments. If I misunderstood your comment or the current situation, I apologize. - Levona
Thank you very much for your reply. You understood my comment perfectly. I think that this question is a good example of several others having the same problems: 1) the OP did not follow the How to Ask / Ask questions wizard guidelines; 2) X-Y problem: the OP asked how to fix an error instead of asking for help understanding how to analyse a webpage in order to determine what tool might be used for web-scraping data, in this case from Yahoo Finance; 3) Yahoo Finance, like many modern websites, constantly changes its DOM ids, class names, etc. - Horseshoe
I think that this kind of question should be closed as a duplicate of a canonical question written specifically for websites like Yahoo Finance that include the data as JSON. I will hopefully post a draft on meta soon and share the link with you. - Horseshoe
@Rubén Thank you for replying. I think that questions about retrieving a value from raw JSON data embedded in the HTML from this URL are duplicates. But recently, it seems that salted base64 data is used instead of normal base64 data or raw JSON data. In this case, a specific decoding process is required. I think that this might need to be treated separately from the above questions. - Levona
L
0

This is not possible because the Yahoo site uses a JavaScript element, the infinite scroll, which kicks in after the 100th value; that is why you can't get past that point. You can test this by disabling JavaScript for the site: what's left is what can be scraped.

Lanza answered 6/5, 2020 at 14:37 Comment(0)
L
0

It's possible with a workaround:

[Screenshot: YahooFinance]

More than 100 days back:

[Screenshot: YF2]

In the screenshots:

  • Cell with green background: the code to search
  • Cells with orange background: cells containing formulas
  • Cells with yellow background: data returned

Formulas used:

=IMPORTXML(A1;"substring-before(substring-after(//script[@id='fc'],'{""prices"":'),',""isPending')")
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A3;"},{";"|");",";";");".";",")
=REGEXREPLACE(A4;"[a-z:{}\[\]""]+";"")
=TRANSPOSE(SPLIT(A5;"|"))
=(((C8/60)/60)/24)+DATE(1970;1;1)
  • IMPORTXML to import the data.
  • SUBSTITUTE and REGEXREPLACE to prepare the TRANSPOSE step. (The formulas use ";" as the argument separator and replace "." with ",", which suits a locale with a decimal comma.)
  • TRANSPOSE to "build" the rows and SPLIT to "build" the columns.
  • DATE to convert the Unix timestamp (in seconds) to a date.
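
The same chain of steps can be followed in JavaScript on a small sample, which may make it easier to see what each formula contributes. This is a sketch only: pricesTextToRows and unixToSerial are hypothetical names, the input is assumed to be the text that the IMPORTXML step returns, and the locale-specific "," and "." substitutions are left out.

```javascript
// Hypothetical helper mirroring the formula chain on an already-extracted "prices" string.
function pricesTextToRows(text) {
  return text
    .replace(/\},\{/g, "|")                  // SUBSTITUTE: mark the boundary between rows
    .replace(/[a-z:{}\[\]"]+/g, "")          // REGEXREPLACE: drop keys, braces, brackets, quotes
    .split("|")                              // SPLIT + TRANSPOSE: one entry per row
    .map(row => row.split(",").map(Number)); // columns as numbers
}

// Unix seconds to a spreadsheet serial day number; 25569 is the serial for 1970-01-01.
function unixToSerial(seconds) {
  return seconds / 86400 + 25569;
}
```

Like the REGEXREPLACE formula, this strips every lowercase letter, so a non-numeric value such as null would end up as an empty cell (0 after Number) rather than being reported.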

Liatris answered 6/5, 2020 at 16:52 Comment(2)
Thanks a lot for your solution! I will have a look at it. Currently, I have been working on a script to store the latest value every day in a long list:

    function storeValue() {
      var ss = SpreadsheetApp.getActiveSpreadsheet();
      var sheet = ss.getSheetByName('Sheet1'); // where importXML is
      var value = sheet.getRange("B1").getValue(); // where the cell of interest is
      var sheet2 = ss.getSheetByName('Sheet2'); // where to store the data
      var height = sheet2.getLastRow();
      sheet2.insertRowAfter(height);
      sheet2.getRange(height+1, 1, 1, 2).setValues([[new Date(), value]]);
    }

- Novikoff
As of January 4, it looks like this solution is not working anymore. - Horseshoe
