Retrieve citations of a journal paper using R

Using R, I want to obtain the list of articles that cite a given scientific journal paper.

The only information I have is the title of the article, e.g. "Protein measurement with the folin phenol reagent".

Is anyone able to help me by producing a replicable example that I can use?

Here is what I tried so far.

The R package fulltext seems useful, because it can retrieve a list of IDs linked to an article. For instance, I can get the article's DOI:

library(fulltext)
res1 <- ft_search(query = "Protein measurement with the folin phenol reagent", from = "crossref")
res1 <- ft_links(res1)
res1$crossref$ids

In the same way, I can get the Scopus ID by setting from = "scopus" in fulltext::ft_search (and supplying a Scopus API key).

If using the DOI, I can obtain the number of citations of the article using the R library rcrossref:

rcrossref::cr_citation_count(res1$crossref$ids[1])

Similarly, I can use the R package rscopus if I want to use the scopus id, rather than the DOI.

Unfortunately, this information is not sufficient for me, as I need the list of articles that cite the paper, not just their number.

I have seen many people online using the scholar package. But if I understand correctly, for this to work the article's authors need to have a Google Scholar ID, and I would have to find a way to retrieve this ID. So it doesn't look like a viable solution.

Does anyone have any idea how to solve this problem?

Caban answered 8/3, 2019 at 13:27 Comment(6)
An interesting question. Have you seen RCrawler? Crawling webpages is not a difficult task because you can use XPath/CSS selectors to extract the data embedded in the page source. However, the full text of journal papers is in PDF format, so you'll need to figure out how to extract the data from an online PDF file. Macaronic
This would imply a number of things: i) find the website of the journal where the article was published, ii) get the article by accessing the journal website, iii) obtain the credentials to access the journal website, iv) download the PDF and crawl it. Not sure this is the right approach, unless I am missing something. Caban
Parts i and ii are doable with any crawler function. Part iii is doable only if you have valid credentials to access the website. If not, it's hacking, and I think a forum like SO is not the right place for it. For part iv, Python is your best bet. See this link. Macaronic
@Ashish, this is a specific question on R, not Python. Even if you can use a crawler function, it doesn't look fit for the task. Rather than using standard tools which already aggregate all journal sources (e.g. Google Scholar), you want to create a new source from scratch. You are making things more complicated than they are. Caban
@Ashish, also please remember this is no place for hackers. On SO, if you say you need credentials, it can only mean buying valid access. Since it is hard to expect someone to have valid access to all journal websites, I said the solution is not a reasonable one. Please avoid wasting time with false allegations and let's stick to the question. Caban
Please do not twist my words to suit your needs. First of all, I did not make any sort of allegation. Nor am I making anything complicated. I wrote what seemed correct to me. If you don't like it, at least have the courtesy to politely disagree rather than wielding the sword on someone trying to help. I sincerely regret investing my one cent in this question! Macaronic

Once you have the DOI, you can use the OpenCitations API to fetch data about publications that cite the article. Query the API at https://opencitations.net/index/coci/api/v1/citations/{DOI} and parse the JSON response, for example with the rjson package. The citing field of each returned record holds the DOI of one publication that cites the article. You can then use CrossRef's API (https://api.crossref.org/works/{DOI}) to fetch further metadata about the citing papers, such as title, journal, publication date and authors.

Here is an example of OpenCitations' API with three citations (as of January 2021).


Here is one possible implementation (with the same example as above):

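# OpenCitations COCI endpoint for the example DOI
opcit <- "https://opencitations.net/index/coci/api/v1/citations/10.1177/1369148118786043"

# parse the JSON response into a list with one element per citation
result <- rjson::fromJSON(file = opcit)

# pull the DOI of the citing paper out of each citation record
citing <- lapply(result, function(x){
  x[['citing']]
})
# a character vector with three DOIs, each of which cites the paper
citing <- unlist(citing)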
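
To see what the extraction step operates on without calling the API, here is a minimal offline sketch. The helper name citations_url and the sample JSON are assumptions made for illustration: the DOIs are placeholders, and real COCI records carry more fields than the two shown.

```r
# Hypothetical helper (not part of any package): build the COCI URL for a DOI.
citations_url <- function(doi) {
  paste0("https://opencitations.net/index/coci/api/v1/citations/", doi)
}

# Made-up sample response with placeholder DOIs; only two fields are shown.
sample_json <- '[
  {"citing": "10.0000/example.citing.1", "cited": "10.0000/example.cited"},
  {"citing": "10.0000/example.citing.2", "cited": "10.0000/example.cited"}
]'

# rjson turns the JSON array into a list with one named list per citation.
result <- rjson::fromJSON(sample_json)

# vapply returns a character vector directly and fails loudly
# if a record has no single 'citing' value.
citing <- vapply(result, function(x) x[["citing"]], character(1))
print(citing)
```

Using vapply instead of lapply plus unlist is a design choice: it checks that every record yields exactly one string.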

Now we have the vector citing with three DOIs. You can then use rcrossref to find out basic information about the citing papers, such as:

paper <- rcrossref::cr_works(citing[1])

# find out the title of that paper
paper[["data"]][["title"]]

# output: "Exchange diplomacy: theory, policy and practice in the Fulbright program"

Since you have a vector of DOIs in citing, you could also use this approach:

citingdata <- rcrossref::cr_cn(citing)

citingdata should contain the metadata of the three citing papers as BibTeX entries, structured as in these two examples:

[[1]]
[1] "@article{Wong_2020,\n\tdoi = {10.1017/s1752971920000196},\n\turl = {https://doi.org/10.1017%2Fs1752971920000196},\n\tyear = 2020,\n\tmonth = {jun},\n\tpublisher = {Cambridge University Press ({CUP})},\n\tpages = {1--31},\n\tauthor = {Seanon S. Wong},\n\ttitle = {One-upmanship and putdowns: the aggressive use of interaction rituals in face-to-face diplomacy},\n\tjournal = {International Theory}\n}"

[[2]]
[1] "@article{Aalberts_2020,\n\tdoi = {10.1080/21624887.2020.1792734},\n\turl = {https://doi.org/10.1080%2F21624887.2020.1792734},\n\tyear = 2020,\n\tmonth = {aug},\n\tpublisher = {Informa {UK} Limited},\n\tvolume = {8},\n\tnumber = {3},\n\tpages = {240--264},\n\tauthor = {Tanja Aalberts and Xymena Kurowska and Anna Leander and Maria Mälksoo and Charlotte Heath-Kelly and Luisa Lobato and Ted Svensson},\n\ttitle = {Rituals of world politics: on (visual) practices disordering things},\n\tjournal = {Critical Studies on Security}\n}"
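
If you need a table rather than raw BibTeX strings, one option is to pull individual fields out with a regular expression. The sketch below is an assumption-laden illustration, not part of rcrossref: bib_field is a hypothetical helper, it only handles brace-delimited fields laid out as in the output above, and the two entries are shortened copies of that output.

```r
# Shortened copies of the two BibTeX entries shown above (doi, title, journal only).
entries <- list(
  "@article{Wong_2020,\n\tdoi = {10.1017/s1752971920000196},\n\ttitle = {One-upmanship and putdowns: the aggressive use of interaction rituals in face-to-face diplomacy},\n\tjournal = {International Theory}\n}",
  "@article{Aalberts_2020,\n\tdoi = {10.1080/21624887.2020.1792734},\n\ttitle = {Rituals of world politics: on (visual) practices disordering things},\n\tjournal = {Critical Studies on Security}\n}"
)

# Hypothetical helper: return the value of one brace-delimited field,
# or NA if the entry does not contain that field.
bib_field <- function(entry, field) {
  pattern <- paste0("\\t", field, " = \\{(.*?)\\},?\\n")
  m <- regmatches(entry, regexec(pattern, entry, perl = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# One row per citing paper.
citing_table <- data.frame(
  doi     = vapply(entries, bib_field, character(1), field = "doi"),
  title   = vapply(entries, bib_field, character(1), field = "title"),
  journal = vapply(entries, bib_field, character(1), field = "journal"),
  stringsAsFactors = FALSE
)
print(citing_table[, c("doi", "journal")])
```

With real data you would pass citingdata in place of entries. Unbraced fields such as year are not matched by this pattern and would need separate handling.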
Ries answered 1/1, 2021 at 20:49 Comment(2)
Thank you for your contribution. If you provide a workable example I will gladly accept your answer as correct. Caban
I added an example! Ries
