Is there an API for the Google Answer Boxes?

The Google Answer Boxes (sometimes called Featured Snippets, Knowledge Cards, or Live Results) are extremely helpful. I'd like to extract the information and use it in my own program. Looking at the HTML code, it's not as straightforward as pulling it out of there. I've done quite a bit of research, but I can't seem to find any support for them. Does anyone know if there is an API (or part of the Web Search API) that returns the information shown in the Answer Box?

I saw the answer to "google api for glorious info box?", but the solution presented there was deprecated last month.

For example, this is the HTML code for "What is the time in japan":

<!--m--><div data-hveid="30">      
<div class="vk_c vk_gy vk_sh card-section _MZc">  
<div class="vk_bk vk_ans">6:37 AM</div> 
<div class="vk_gy vk_sh"> Tuesday, <span class="_Hq">August 4, 2015</span>  
<span class="_Hq"> (GMT+9) </span>  
</div> <span class="vk_gy vk_sh">  Time in Japan  </span> 

Which is VERY different from "where is tokyo located":

<!--m-->
<div class="_uX kno-fb-ctx" aria-level="3" role="heading" data-hveid="41" data-ved="0CCkQtwcoATACahUKEwiLjemg8I3HAhUTKYgKHU7jCho">
<div class="_eF" data-tts="answers" data-tts-text="Japan">Japan</div>
<div class="_Tfc">
</div></div>
<!--n-->
</li><li class="mod" data-md="61" style="clear:none">
<!--m-->
<div class="_oDd" data-hveid="42">
<span class="_Tgc _y9e">Tokyo consists of the southwestern part of the Kanto region, the <b>Izu Islands</b>, and the <b>Ogasawara Islands</b>. Tokyo is the capital of <b>Japan</b>, and the place where over 13 million people live, making it one of the most populous cities in the world.</span></div>

I essentially need to extract "6:37 AM" from the first and "Japan" from the second, but performing a dynamic string search would be difficult as they are in very different formats.
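One way around the differing formats is to key on the markup rather than on the text: in these two snippets the answer sits in an element whose class includes `vk_ans` in the first, and in an element with `data-tts="answers"` in the second. Below is a minimal sketch using only Python's standard-library `html.parser`. The two rules, the `extract_answer` name, and the assumption that Google still uses these class/attribute names are all illustrative; they come from the 2015 snippets above, and Google renames them often.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Each rule inspects a start tag's attributes. The values are taken from the
# 2015 snippets above and are assumptions: Google's markup changes frequently.
RULES = [
    lambda attrs: "vk_ans" in (attrs.get("class") or "").split(),  # time card
    lambda attrs: attrs.get("data-tts") == "answers",               # short answer
]


class AnswerBoxParser(HTMLParser):
    """Collects the text of the first element (in document order) matching any rule."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the matched element
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            return
        attr_map = dict(attrs)
        if any(rule(attr_map) for rule in RULES):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_answer(html):
    """Return the answer-box text, or None if no rule matched."""
    parser = AnswerBoxParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None
```

Fed the two snippets above, this returns "6:37 AM" and "Japan" respectively. Supporting another layout means appending another rule, which is the same idea as trying a list of selectors in order.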

Gliadin answered 3/8, 2015 at 22:53
I'm in the same curious boat as you, but now I'm exploring DuckDuckGo's possibilities since they have a similar feature: duckduckgo.com/api – Laural

DuckDuckGo offers an Instant Answer API that I've used in the past, and it works pretty well. The responses aren't as robust as Google's, but it's a good start.

https://duckduckgo.com/api

A JSON response from the API looks like this:

{
  "Abstract": "",
  "AbstractText": "",
  "AbstractSource": "",
  "AbstractURL": "",
  "Image": "",
  "Heading": "",
  "Answer": "",
  "Redirect": "",
  "AnswerType": "",
  "Definition": "",
  "DefinitionSource": "",
  "DefinitionURL": "",
  "RelatedTopics": [],
  "Results": [],
  "Type": ""
}

I hope this helps!
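To turn such a response into a single answer string, you can build the request URL and pick the first non-empty text field. A minimal Python sketch: the `api.duckduckgo.com` endpoint and the `format`, `no_html` and `skip_disambig` parameters are as I understand the documentation at duckduckgo.com/api, and the preference order (Answer, then AbstractText, then Definition) plus the helper names are my own choices, so treat all of them as assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameters are assumptions based on duckduckgo.com/api;
# check the current documentation before relying on them.
DDG_ENDPOINT = "https://api.duckduckgo.com/"


def build_ddg_url(query):
    params = {"q": query, "format": "json", "no_html": 1, "skip_disambig": 1}
    return DDG_ENDPOINT + "?" + urlencode(params)


def pick_answer(data):
    """Return the first non-empty string among a few text fields (order is my choice)."""
    for field in ("Answer", "AbstractText", "Definition"):
        value = data.get(field)
        # Some responses may carry non-string values; skip those
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def ask_ddg(query):
    # Performs a real network request
    with urlopen(build_ddg_url(query), timeout=10) as resp:
        return pick_answer(json.load(resp))
```

Keeping the parsing in `pick_answer` separate from the HTTP call makes it easy to test against saved responses.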

Frication answered 3/6, 2016 at 13:57

A bit late, but here is a solution that works as of 2017. It uses Python and Selenium (with headless ChromeDriver) to extract the "primary" text from the answer box, relying on the fact that the layout of the search page and answer box is reasonably consistent across different types of queries (though I haven't tested this exhaustively). Of course, the element coordinates may change depending on resolution/window size, but adjusting for that is easy enough.

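from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# Chrome expects "width,height"; the window size determines where the answer box lands
chrome_options.add_argument("--window-size=1024,768")
chrome_options.add_argument("--headless")
# Newer Selenium releases take `options=` (the old `chrome_options=` keyword is deprecated)
driver = webdriver.Chrome(options=chrome_options)

def ask_google(query):

    # Search for the query (quote_plus escapes spaces and special characters)
    driver.get('https://www.google.com/search?q=' + quote_plus(query))

    # Get the text of whatever element sits at (350, 230), where the answer box
    # renders at this window size. Keep "return" and its expression on one line:
    # JavaScript inserts a semicolon after a bare "return" followed by a newline.
    answer = driver.execute_script(
            "return document.elementFromPoint(arguments[0], arguments[1]);",
            350, 230).text

    return answer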

And testing this approach with your queries (or close to them) produces:

ask_google("what is the time in Japan")

"4:36 PM"

ask_google("where is tokyo located in japan")

"Situated on the Kanto Plain, Tokyo is one of three large cities, the other two being Yokohama and Kawasaki, located along the northwestern shore of Tokyo Bay, an inlet of the Pacific Ocean on east-central Honshu, the largest of the islands of Japan."
Decorum answered 14/11, 2017 at 7:40
This won't work if you have a newline after the return in your script string. – Homophonous
Running this in 2021 gives me an error: 'WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see sites.google.com/a/chromium.org/chromedriver/home'. Does anyone understand this? – Evelinevelina
If you're on a Mac, you can run brew install --cask chromedriver to resolve the chromedriver PATH issue. – Tarkany

I've done a lot of research, and it seems there isn't anything currently available like what you've described. There also isn't anything that can pull information from Google searches.

The only alternative I can think of is getting the information via RSS (http://www.w3schools.com/xml/xml_rss.asp) and consuming that in your program somehow.
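If you go the RSS route, a feed is plain XML, so the standard library can read it. A minimal sketch using `xml.etree.ElementTree`, assuming an RSS 2.0 feed (a `<channel>` containing `<item>` elements with `<title>` and `<description>` children). Which feed would actually carry the information you want is left open, as in the answer above, and `parse_rss_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return (title, description) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    # iter() walks the whole tree, so nesting under <channel> doesn't matter
    for item in root.iter("item"):
        # findtext returns None when the child is missing; normalise to ""
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        items.append((title, description))
    return items
```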

Inhibitory answered 4/2, 2016 at 19:38

SerpApi supports the direct answer box, and it seems to support time as well:

$ curl "https://serpapi.com/search.json?q=time+in+japan"

...
"answer_box": {
  "type": "local_time",
  "result": "4:37 AM"
},
....

Some documentation: https://serpapi.com/direct-answer-box-api
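The same request from Python, reading `answer_box.result` as in the excerpt above. As far as I know, SerpApi also expects an `api_key` parameter on every request, which the curl line leaves out; confirm that against their documentation. Other answer box types may put their text in fields other than `result`, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_serpapi_url(query, api_key):
    # `api_key` as the authentication parameter is an assumption to verify
    return SERPAPI_ENDPOINT + "?" + urlencode({"q": query, "api_key": api_key})


def answer_box_result(data):
    """Return answer_box["result"] from a parsed response, or None if absent."""
    box = data.get("answer_box")
    if not isinstance(box, dict):
        return None
    return box.get("result")


def ask_serpapi(query, api_key):
    # Performs a real (billed) network request
    with urlopen(build_serpapi_url(query, api_key), timeout=30) as resp:
        return answer_box_result(json.load(resp))
```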

Fogbow answered 15/2, 2019 at 19:39
$50 a month is too much. – Ptosis

I have created a function that scrapes Google client-side to get the answers from its quick answer box. Obviously it's not perfect, but it works pretty well!

async function answer(q) {
  // Fetch the results page through a third-party CORS proxy, since the browser
  // blocks direct cross-origin requests to google.com. Browsers may ignore or
  // reject a custom User-Agent header.
  const html = await fetch(
    `https://cors.explosionscratc.repl.co/google.com/search?q=${encodeURIComponent(q)}`,
    {
      headers: {
        "User-Agent":
          "Mozilla/5.0 (X11; CrOS x86_64 13982.88.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.162 Safari/537.36",
      },
    }
  ).then((res) => res.text());
  // Parse into a separate document instead of leaking it as a global
  const d = new DOMParser().parseFromString(html, "text/html");
  // Try each known answer-box layout in order; the first match wins.
  // Most branches yield an element, but one calculator fallback yields a string.
  const el =
    d.querySelector("[id*='lrtl-translation-text']") ||
    [...d.querySelectorAll(".kp-header [data-md]")][1] ||
    //Calculator results (search the parsed page `d`, not the live `document`)
    [...d.querySelectorAll(".kCrYT")][1] ||
    [...d.querySelectorAll("*")]
      .filter((i) => i.innerText)
      .filter((i) => i.innerText.includes("Calculator Result"))
      .slice(-2)?.[0]
      ?.innerText?.split("\n")?.[2] ||
    //Snippets
    [...d.querySelectorAll("*")]
      .filter((i) => i.innerText)
      .filter(
        (i) =>
          i.innerText.includes("Featured snippet from the web") ||
          i.innerText.includes("Description") ||
          i.innerText.includes("Calculator result")
      )
      .slice(-1)?.[0]
      ?.parentElement?.querySelector("div span") ||
    //Cards (like at the side)
    d.querySelector(
      ".card-section, [class*='__wholepage-card'] [class*='desc']"
    ) ||
    d.querySelector(".thODed")?.querySelector("div span") ||
    [...d.querySelectorAll("[data-async-token]")].slice(-1)[0] ||
    d.querySelector("miniapps-card-header")?.parentElement ||
    d.querySelector("#tw-target");
  let text = (typeof el === "string" ? el : el?.innerText)?.trim();
  // Nothing matched: bail out instead of calling .includes() on undefined
  if (!text) return undefined;
  if (text.includes("translation") && text.includes("Google Translate")) {
    text = text.split("Verified")[0].trim();
  }
  if (
    text.includes("Calculator Result") &&
    text.includes("Your calculations and results")
  ) {
    text = text
      .split("them")?.[1]
      ?.split("(function()")?.[0]
      ?.split("=")?.[1]
      ?.trim();
  }
  return text;
}

This scrapes the Google search page, then parses the HTML for answers:

await answer("When were antibiotics discovered");
// "But it was not until 1928 that penicillin, the first true antibiotic, was discovered by Alexander Fleming, Professor of Bacteriology at St. Mary's Hospital in London."

await answer("What time is it in London");
// "4:44 PM"

await answer("define awesome");
//"extremely impressive or daunting; inspiring great admiration, apprehension, or fear."

// Demo: wire the function to the input and button below
document.querySelector("button").onclick = () => {
  answer(document.querySelector("input").value).then(console.log);
};

<input placeholder="What do you want to search?"><button>Search!</button>
Wertz answered 11/9, 2021 at 15:58

© 2022 - 2024 — McMap. All rights reserved.