How to retrieve/calculate citation counts and/or citation indices from a list of authors?

I have a list of authors. I would like to automatically retrieve or calculate a citation index (h-index, m-quotient, g-index, HCP indicator, or similar) for each author, ideally per year.

Author Year Index
first  2000   1
first  2001   2
first  2002   3

I can calculate all of these metrics given the citation counts for each paper of each researcher.

Author Paper Year Citation_count
first    1    2000   1
first    2    2000   2
first    3    2002   3
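
As a rough illustration of that calculation, here is a minimal Python sketch that turns a per-paper table like the one above into h-index, g-index and m-quotient per author. The function names and the `as_of_year` parameter are made up for this example. The sketch assumes the g-index is capped at the number of papers and that the m-quotient divides h by the number of years since the first paper, counting the first year; other conventions exist.

```python
from collections import defaultdict

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g papers total at least g^2 citations.

    Assumption: g is capped at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g

def author_metrics(rows, as_of_year):
    """rows: iterable of (author, paper, year, citation_count) tuples."""
    counts = defaultdict(list)
    first_year = {}
    for author, _paper, year, cites in rows:
        counts[author].append(cites)
        first_year[author] = min(year, first_year.get(author, year))

    metrics = {}
    for author, cites in counts.items():
        h = h_index(cites)
        # Assumption: career length counts the first publication year.
        career_years = max(as_of_year - first_year[author] + 1, 1)
        metrics[author] = {"h": h, "g": g_index(cites), "m": h / career_years}
    return metrics

# The example table from the question.
rows = [
    ("first", 1, 2000, 1),
    ("first", 2, 2000, 2),
    ("first", 3, 2002, 3),
]
print(author_metrics(rows, as_of_year=2012))
```

For the three example rows this gives h = 2 and g = 2 for the author `first`. A per-year index would need citation counts broken down by citing year, which this table does not contain.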

Despite my efforts, I have not found an API or scraping method that can retrieve this data.

My institution has access to a number of services including Web of Science.

Kwan answered 10/5, 2012 at 14:50 Comment(8)
bmb-common.blogspot.ca/2011/11/google-scholar-still-sucks.html has some information -- in particular, the CITAN package looks quite powerful if you have access to Scopus; there have also been some recent PubMed-scraping posts on r-bloggers (whether this works for you or not depends on whether you are happy with PubMed coverage in your field). Even if you could scrape WoS, it's not permitted by their terms of service ...Lysol
@Ben Bolker, thank you for the suggestions; this does point me in the right direction.Mystify
This is probably where a solution will be created: ropensci.org/project-overviewMystify
github.com/ropensci/raltmet/blob/master/R/citedin.rMystify
All useful information, thanks for digging it out (if you put together an answer from these bits and pieces it would be great to post it here as an answer to your question). Still very much restricted by the data sources (e.g. PubMed), but things are developing in a useful way.Lysol
simplystatistics.tumblr.com/post/13203811645/…Mystify
Jeff Leek, Roger Peng, and Rafa Irizarry produced functions that tie in to Google Scholar. simplystatistics.tumblr.com/post/13203811645/…Mystify
Those are nice, but note that they tie into Google Scholar Citations -- i.e. into the page you can pull up with your own citation report, not a general-purpose search (I think)Lysol

Effectively, the main problem is building the citation graph. Once you have that, you can compute any metric you want (e.g. h-index, g-index, PageRank).

Supposing you have a collection of papers (that you've retrieved in some way), you can extract the citations from each of them and build the citation graph. You might find ParsCit useful: it is an open-source CRF-based reference string and logical document structure parsing package, which is also used by CiteSeerX and works well.
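
To make the graph idea concrete, here is a minimal Python sketch, assuming the references have already been extracted and resolved to paper ids (for instance after a ParsCit run; that resolution step is not shown). The `references` and `authors` dictionaries and all function names are hypothetical example data, not output from any real tool.

```python
from collections import Counter, defaultdict

# Hypothetical input: paper id -> ids of the papers it cites.
references = {
    "p1": [],
    "p2": ["p1"],
    "p3": ["p1", "p2"],
    "p4": ["p1", "p2", "p3"],
}
# Hypothetical input: paper id -> author names.
authors = {
    "p1": ["first"],
    "p2": ["first"],
    "p3": ["second"],
    "p4": ["second"],
}

def citation_counts(references):
    """In-degree of each paper in the citation graph."""
    counts = Counter({paper: 0 for paper in references})
    for citing, cited_list in references.items():
        for cited in set(cited_list):  # count each citing paper once
            if cited != citing:        # ignore a paper citing itself
                counts[cited] += 1
    return counts

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def h_index_by_author(references, authors):
    """Combine the graph's citation counts with authorship."""
    counts = citation_counts(references)
    per_author = defaultdict(list)
    for paper, names in authors.items():
        for name in names:
            per_author[name].append(counts[paper])  # 0 if never cited
    return {name: h_index(c) for name, c in per_author.items()}

print(h_index_by_author(references, authors))
```

Note that counts obtained this way only reflect citations from papers inside your own collection, so they will usually be lower than what Scopus or Web of Science report.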

Ephemera answered 5/8, 2012 at 17:35 Comment(0)
