There's no official Google Scholar API.
There are third-party solutions such as the free scholarly Python package, which supports profile, author, cite, and organic results (search_pubs appears to be the method for organic results, although its name is confusing).
Note that if you call scholarly repeatedly without rate-limiting your requests, Google may block your IP (mentioned by @RadioControlled). Use it wisely.
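One way to space out requests is to wrap the result generator in a small throttling helper. This is only a sketch: the helper name throttled and the 2 to 5 second delay range are my own assumptions, not a documented safe limit, and nothing here guarantees Google won't still block you. Because scholarly fetches further pages lazily as you iterate, pausing between items also spaces out the page requests.

```python
import random
import time
from itertools import islice


def throttled(items, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield from `items`, sleeping a random interval before every fetch after the first."""
    iterator = iter(items)
    first = True
    while True:
        if not first:
            # Random jitter so requests are not sent at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        first = False
        try:
            item = next(iterator)
        except StopIteration:
            return
        yield item


def print_first_authors(keyword, limit=5):
    """Hypothetical usage: print a few author names for a keyword."""
    from scholarly import scholarly  # third-party: pip install scholarly

    results = throttled(scholarly.search_keyword(keyword))
    for author in islice(results, limit):
        print(author["name"])
```

Calling print_first_authors("biology") would then make at most a handful of slow, spaced-out requests instead of paginating as fast as possible.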
Additionally, there's a scrape-google-scholar-py module that lets you extract pretty much any Google Scholar page.
Alternatively, there's a Google Scholar API from SerpApi, a paid API with a free plan. It supports organic, cite, profile, and author results, bypasses blocks on the SerpApi backend so your IP won't get blocked, and handles the legal side of scraping.
Example code to parse profile results with scholarly using the search_keyword method:
import json
from scholarly import scholarly
# will paginate to the next page by default
authors = scholarly.search_keyword("biology")
for author in authors:
    print(json.dumps(author, indent=2))
# part of the output:
'''
{
  "container_type": "Author",
  "filled": [],
  "source": "SEARCH_AUTHOR_SNIPPETS",
  "scholar_id": "LXVfPc8AAAAJ",
  "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=LXVfPc8AAAAJ",
  "name": "Eric Lander",
  "affiliation": "Broad Institute",
  "email_domain": "",
  "interests": [
    "Biology",
    "Genomics",
    "Genetics",
    "Bioinformatics",
    "Mathematics"
  ],
  "citedby": 552013
}
... other author results
'''
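For the organic (publication) results mentioned above, a similar sketch with search_pubs might look like the following. The field names bib, title, pub_year and num_citations are what I'd expect in a scholarly publication dict, but treat them as assumptions and check them against your installed version; summarize_pub and first_publications are hypothetical helper names.

```python
from itertools import islice


def summarize_pub(pub):
    """Pick a few fields out of one publication dict, tolerating missing keys."""
    bib = pub.get("bib", {})
    return {
        "title": bib.get("title"),
        "year": bib.get("pub_year"),
        "citations": pub.get("num_citations"),
    }


def first_publications(query, limit=3):
    """Hypothetical usage: fetch only the first few organic results."""
    from scholarly import scholarly  # third-party: pip install scholarly

    # search_pubs returns a generator; islice stops before it paginates further.
    return [summarize_pub(pub) for pub in islice(scholarly.search_pubs(query), limit)]
```

Limiting the results with islice keeps the number of requests small, which also helps with the blocking issue.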
Example using scrape-google-scholar-py:
from google_scholar_py import CustomGoogleScholarProfiles
import json
parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
    query='blizzard',
    pagination=False,
    save_to_csv=False,
    save_to_json=False
)
print(json.dumps(data, indent=2))
Outputs:
[
  {
    "name": "Adam Lobel",
    "link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Gaming",
      "Emotion regulation"
    ],
    "email": "Verified email at AdamLobel.com",
    "cited_by_count": 3593
  }, # other results...
]
Example code to parse profile results using the Google Scholar Profiles API from SerpApi:
import json
from serpapi import GoogleScholarSearch
# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",
    "hl": "en",             # language
    "mauthors": "biology"   # search query
}
search = GoogleScholarSearch(params)
results = search.get_dict()
# only first page results
for result in results["profiles"]:
    print(json.dumps(result, indent=2))
# part of the output:
'''
{
  "name": "Masatoshi Nei",
  "link": "https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ",
  "serpapi_link": "https://serpapi.com/search.json?author_id=VxOmZDgAAAAJ&engine=google_scholar_author&hl=en",
  "author_id": "VxOmZDgAAAAJ",
  "affiliations": "Laura Carnell Professor of Biology, Temple University",
  "email": "Verified email at temple.edu",
  "cited_by": 384074,
  "interests": [
    {
      "title": "Evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolution"
    },
    {
      "title": "Evolutionary biology",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolutionary_biology"
    },
    {
      "title": "Molecular evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:molecular_evolution"
    },
    {
      "title": "Population genetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:population_genetics"
    },
    {
      "title": "Phylogenetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:phylogenetics"
    }
  ],
  "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=VxOmZDgAAAAJ&citpid=3"
}
... other results
'''
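The snippet above only returns the first page. If you need more, a pagination loop could be sketched roughly like this. It assumes that the response carries a pagination.next_page_token field and that the token is passed back through an after_author parameter; verify both against the SerpApi documentation before relying on them. iter_profiles, serpapi_fetch and the max_pages cap are hypothetical names of mine.

```python
def iter_profiles(fetch_page, params, max_pages=3):
    """Yield profiles page by page, following the next-page token while one exists."""
    params = dict(params)  # copy so the caller's dict is not modified
    for _ in range(max_pages):
        results = fetch_page(params)
        yield from results.get("profiles", [])
        # Assumed response shape: {"pagination": {"next_page_token": "..."}}
        token = results.get("pagination", {}).get("next_page_token")
        if not token:
            break
        params["after_author"] = token  # assumed name of the next-page parameter


def serpapi_fetch(params):
    """Fetch one page via SerpApi (third-party: pip install google-search-results)."""
    from serpapi import GoogleScholarSearch

    return GoogleScholarSearch(params).get_dict()
```

With the params dict from the example above, iterating over iter_profiles(serpapi_fetch, params) would walk through up to max_pages pages. Each page counts as a separate search against your plan, so keep the cap low.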
I also wrote a dedicated blog post at SerpApi, Scrape historic Google Scholar results using Python, which shows how to scrape historic (2017-2021) organic and cite results from Google Scholar and save them to CSV and SQLite.
There's also a blog post about scraping Google Scholar in R, if you don't use Python.
Disclaimer: I work for SerpApi.