Does Google Scholar have an API available that we can use in our research applications?

I am working on a research publication and collaboration project that includes a literature search feature. Google Scholar seems like a good fit since it is free to use, but when I researched it, I could not find any information about it having an API.

Is there an API for Google Scholar?

Marsha answered 16/7, 2020 at 15:27 Comment(0)

A quick search shows that others are trying to implement such APIs, but Google does not provide one. It is not clear whether using these unofficial APIs is legal; see, for instance, "How to get permission from Google to use Google Scholar Data, if needed?".

Cascarilla answered 16/7, 2020 at 15:33 Comment(0)

There's no official Google Scholar API.

There are third-party solutions such as the free scholarly Python package, which supports profile, author, cite, and organic results (search_pubs seems to be the method for organic results, although the method name is not obvious).

Note that if you use scholarly constantly without rate-limiting your requests, Google may block your IP (as mentioned by @RadioControlled). Use it wisely.
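
One way to act on that warning is to pause between pulls from the result generator. The following is only a minimal sketch: the helper name throttled and its parameters are hypothetical (they are not part of scholarly), and no particular interval is known to be safe from blocking.

```python
import time


def throttled(items, min_interval, sleep=time.sleep, clock=time.monotonic):
    """Yield from `items`, keeping consecutive pulls at least
    `min_interval` seconds apart.

    Hypothetical helper, not part of scholarly. `sleep` and `clock`
    are injectable so the pacing can be checked without real waiting.
    """
    iterator = iter(items)
    last_pull = None
    while True:
        if last_pull is not None:
            # Wait out whatever remains of the interval since the last pull.
            wait = min_interval - (clock() - last_pull)
            if wait > 0:
                sleep(wait)
        last_pull = clock()
        try:
            item = next(iterator)
        except StopIteration:
            return
        yield item
```

With scholarly installed, the generator returned by scholarly.search_keyword("biology") could be wrapped as throttled(scholarly.search_keyword("biology"), 5). The 5 seconds is an arbitrary guess. Because a single request may return several results, pausing per item is more conservative than strictly necessary.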

Additionally, there's a scrape-google-scholar-py module that lets you extract pretty much any Google Scholar page.

Alternatively, there's a Google Scholar API from SerpApi, a paid API with a free plan. It supports organic, cite, profile, and author results, bypasses the blocks on the SerpApi backend so your IP won't get blocked, and handles the legal part of scraping.


Example code to parse profile results with scholarly, using the search_keyword method:

import json
from scholarly import scholarly

# will paginate to the next page by default
authors = scholarly.search_keyword("biology")

for author in authors:
    print(json.dumps(author, indent=2))

# part of the output:

'''
{
  "container_type": "Author",
  "filled": [],
  "source": "SEARCH_AUTHOR_SNIPPETS",
  "scholar_id": "LXVfPc8AAAAJ",
  "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=LXVfPc8AAAAJ",
  "name": "Eric Lander",
  "affiliation": "Broad Institute",
  "email_domain": "",
  "interests": [
    "Biology",
    "Genomics",
    "Genetics",
    "Bioinformatics",
    "Mathematics"
  ],
  "citedby": 552013
}
... other author results
'''
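
For organic results rather than profiles, the search_pubs method mentioned above also returns a generator. The sketch below only shows how a few titles might be pulled from such a generator without exhausting it. The helper take_titles is hypothetical, and the assumption that each publication dict keeps its title under a "bib" key should be checked against the scholarly version you have installed.

```python
from itertools import islice


def take_titles(publications, limit=5):
    """Return up to `limit` titles from an iterable of publication dicts.

    Hypothetical helper. Assumes each dict stores its bibliographic
    fields under "bib"; entries without a title get a placeholder.
    """
    titles = []
    # islice stops after `limit` items, so a lazy generator is not
    # advanced (and no extra pages are requested) beyond what is needed.
    for pub in islice(publications, limit):
        titles.append(pub.get("bib", {}).get("title", "<no title>"))
    return titles
```

With scholarly installed, this could be called as take_titles(scholarly.search_pubs("protein folding")). The query string is only an example.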

Example using scrape-google-scholar-py:

from google_scholar_py import CustomGoogleScholarProfiles
import json

parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
    query='blizzard',
    pagination=False,
    save_to_csv=False,
    save_to_json=False
)
print(json.dumps(data, indent=2))

Outputs:

[
  {
    "name": "Adam Lobel",
    "link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Gaming",
      "Emotion regulation"
    ],
    "email": "Verified email at AdamLobel.com",
    "cited_by_count": 3593
  }, # other results...
]

Example code to parse profile results using the Google Scholar Profile Results API from SerpApi:

import json
from serpapi import GoogleScholarSearch

# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",
    "hl": "en",                            # language
    "mauthors": "biology"                  # search query
}

search = GoogleScholarSearch(params)
results = search.get_dict()

# only first page results
for result in results["profiles"]:
    print(json.dumps(result, indent=2))

# part of the output:
'''
{
  "name": "Masatoshi Nei",
  "link": "https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ",
  "serpapi_link": "https://serpapi.com/search.json?author_id=VxOmZDgAAAAJ&engine=google_scholar_author&hl=en",
  "author_id": "VxOmZDgAAAAJ",
  "affiliations": "Laura Carnell Professor of Biology, Temple University",
  "email": "Verified email at temple.edu",
  "cited_by": 384074,
  "interests": [
    {
      "title": "Evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolution"
    },
    {
      "title": "Evolutionary biology",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolutionary_biology"
    },
    {
      "title": "Molecular evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:molecular_evolution"
    },
    {
      "title": "Population genetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:population_genetics"
    },
    {
      "title": "Phylogenetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:phylogenetics"
    }
  ],
  "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=VxOmZDgAAAAJ&citpid=3"
}
... other results
'''
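
The nested "interests" entries in that output can be flattened if only the titles are needed. This is a minimal sketch that assumes each result has the shape shown in the output above. The helper summarize_profile is hypothetical and not part of the SerpApi client.

```python
def summarize_profile(profile):
    """Reduce one profile result to name, citation count, and interest titles.

    Hypothetical helper. Assumes the keys shown in the sample output
    ("name", "cited_by", "interests" with a "title" per entry).
    """
    return {
        "name": profile.get("name"),
        "cited_by": profile.get("cited_by"),
        # Keep only the human-readable title of each interest entry.
        "interests": [
            interest["title"]
            for interest in profile.get("interests", [])
            if "title" in interest
        ],
    }
```

Applied inside the loop above, summarize_profile(result) would produce one compact dict per profile, which is easier to write to CSV than the full nested result.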

I also wrote a dedicated blog post at SerpApi, "Scrape historic Google Scholar results using Python", which shows how to scrape historic (2017-2021) organic and cite results from Google Scholar and save them to CSV or SQLite.

There's also a blog post about scraping Google Scholar in R, if you prefer R to Python.

Disclaimer: I work for SerpApi.

Lowder answered 23/2, 2022 at 11:48 Comment(2)
Note that if you use scholarly too much you might easily have your whole organization blocked from Google Search for a while (or everyone has to enter captchas for their searches). Not recommended. - Anthesis
@RadioControlled thank you for explicitly mentioning it as I forgot to add it when writing this answer. I've updated the answer and added a link to your comment. Thank you 👍 - Lowder
