Given a table of citations, how to reverse-lookup the Digital Object Identifier for each of the citations?
T

4

9

I have a table of citations that includes the last name of the first author, the title, journal, year, and page numbers for each citation.

I have posted the first few lines of the table on Google Docs; it is also available in the form of a CSV file. (Notice that some records do not have a DOI.)

I would like to be able to query the DOI for each of these citations. For the titles, it would be best if the query could handle some form of fuzzy matching.

How can I do this?

The table is currently in MySQL, but it would be sufficient to start and end with a CSV file or, since I mostly use R, an R data frame. (I would appreciate an answer that goes from start to finish.)

Tillford answered 14/3, 2012 at 22:53 Comment(5)
similar to #1170620 - Tillford
Currently, you are not storing the DOI anywhere, so how are you going to query against it? - Hernandez
@nnichols I want to find the DOI given the information that I do have - Tillford
Sorry, I was obviously being a bit slow :). Looks like querying Crossref with mechanize should be a simple solution. - Hernandez
#9952240 - Expletive
T
3

Here are two options

CSV upload

Another promising approach is to upload the CSV directly and run a text query at http://www.crossref.org/stqUpload/.

In practice it did not work well: only 18 of the 250 queries (≈7%) returned a DOI.

XML Query

Based on the answer by Brian Diggs, here is an attempt in R that does about 95% of the work of writing the XML-based query. The output may still need some cleanup with sed. The biggest problem, though, is the “session timed out” error I encountered when the query was submitted.

The XML syntax includes an option to use fuzzy matching.

The doiquery.xml file contains the template text from Brian’s answer; the citations.csv file is linked above.

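library(XML)

# Parse the template as an internal document so it can be modified in place.
doc  <- xmlParse("doiquery.xml")
body <- xmlRoot(doc)[["body"]]

# Drop the empty placeholder <query> that comes with the template.
removeNodes(body[["query"]])

citations <- read.csv("citations.csv", stringsAsFactors = FALSE)

# Build one <query> node from a single row of the citations table.
# The key attribute (my own choice) ties each result back to its row.
new.query <- function(citation, key) {
  newXMLNode("query", attrs = c(key = key, "enable-multiple-hits" = "false"),
    newXMLNode("article_title", citation$title, attrs = c(match = "fuzzy")),
    newXMLNode("author", as.character(citation$author)),
    newXMLNode("year", as.character(citation$year)),
    newXMLNode("journal_title", citation$journal))
}

# Append one <query> per citation to <body>.
for (i in seq_len(nrow(citations))) {
  addChildren(body, new.query(citations[i, ], key = paste0("row", i)))
}

saveXML(doc, file = "foo.xml")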

CSV to XML Converter

Creativyst software provides a Web-based CSV to XML converter.

The necessary steps to take are as follows.

  1. Enter the column names in the ElementIDs field.
  2. Enter "document" in the DocID field.
  3. Enter "query" in the RowID field.
  4. Copy and paste the contents of the CSV file into the Input CSV file field.
  5. Click Convert.

See also a related question: Shell script to parse CSV to an XML query?
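If you would rather not paste data into a web converter, the same CSV-to-XML step can be sketched locally. This is only a rough sketch in Python using the standard library: the column names (author, title, journal, year) follow the R snippet above, while the `key` format, the batch size, and leaving out the `xsi:schemaLocation` attribute are my own assumptions. Splitting into small batches is a guess at working around the session timeouts, not something I have confirmed.

```python
import xml.etree.ElementTree as ET

NS = "http://www.crossref.org/qschema/2.0"
ET.register_namespace("", NS)  # serialize with a default namespace, as in the template


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_batch(rows, email, batch_id):
    """Return a query_batch XML string with one <query> per citation row."""
    root = ET.Element(f"{{{NS}}}query_batch", {"version": "2.0"})
    head = ET.SubElement(root, f"{{{NS}}}head")
    ET.SubElement(head, f"{{{NS}}}email_address").text = email
    ET.SubElement(head, f"{{{NS}}}doi_batch_id").text = batch_id
    body = ET.SubElement(root, f"{{{NS}}}body")
    for i, row in enumerate(rows):
        # The key (hypothetical format) lets a result be matched back to its row.
        attrs = {"key": f"{batch_id}-{i}", "enable-multiple-hits": "false"}
        query = ET.SubElement(body, f"{{{NS}}}query", attrs)
        title = ET.SubElement(query, f"{{{NS}}}article_title", {"match": "fuzzy"})
        title.text = str(row["title"])
        ET.SubElement(query, f"{{{NS}}}author").text = str(row["author"])
        ET.SubElement(query, f"{{{NS}}}year").text = str(row["year"])
        ET.SubElement(query, f"{{{NS}}}journal_title").text = str(row["journal"])
    return ET.tostring(root, encoding="unicode")
```

To use it, read citations.csv with `csv.DictReader`, then loop over `chunked(rows, 25)` and write each `build_batch(...)` result to its own file. ElementTree escapes characters such as `&` in titles, which probably removes the need for the sed cleanup.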

Tillford answered 20/3, 2012 at 22:27 Comment(2)
In case anybody is wondering: the code snippet above is written in the R programming language. - Bipolar
@StefanSchmidt thanks, I've clarified my answer - Tillford
B
5

I don’t know of any complete packages or functions that do this already, but this is the general approach I would use. The Crossref DOI registration agency offers a Web-based approach for determining the DOI from bibliographic data at https://www.crossref.org/guestquery/.

On that page are several different ways to search, including the last one, which takes an XML-formatted search. The page includes information on how to create the appropriate XML. You would need to submit the XML over HTTP (determining the details by picking apart the page to figure out form destinations and any additional information that needs to be included), and then parse the response.

Additionally, you would need to verify that doing this in an automated manner does not violate the terms of service of the website in any way.
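The submit-and-parse step might look roughly like the following. This is a sketch in Python, not a tested client: the endpoint URL and the form field names are deliberately left as parameters because they have to be read off the page, and the assumption that each result is a `<query>` element carrying a `key` attribute and a `<doi>` child is mine and should be checked against a real response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_request(endpoint, fields):
    """Return a POST request carrying `fields` as a URL-encoded form body."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(endpoint, data=data, method="POST")


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_dois(response_xml):
    """Map each <query> key to the text of its <doi> child, or None.

    Assumes (unverified) that results come back as <query key="..."> elements.
    """
    found = {}
    for elem in ET.fromstring(response_xml).iter():
        if local_name(elem.tag) != "query":
            continue
        doi = next(
            (c.text for c in elem if local_name(c.tag) == "doi" and c.text),
            None,
        )
        found[elem.get("key")] = doi
    return found


def submit(endpoint, fields, timeout=60):
    """Send the form and return the parsed key -> DOI mapping."""
    with urllib.request.urlopen(build_request(endpoint, fields), timeout=timeout) as resp:
        return extract_dois(resp.read())
```

Matching on local names rather than fully qualified tags keeps the parser working whatever namespace the response uses.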


Below is the XML form for the Crossref free DOI lookup, where the searchable terms include article_title, author, year, journal_title, volume, and first_page:

<?xml version = "1.0" encoding="UTF-8"?>
<query_batch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0" xmlns="http://www.crossref.org/qschema/2.0"
  xsi:schemaLocation="http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
<head>
   <email_address>[email protected]</email_address>
   <doi_batch_id>test</doi_batch_id>
</head>
<body>
  <query enable-multiple-hits="false|exact|multi_hit_per_rule|one_hit_per_rule|true"
            list-components="false"
            expanded-results="false" key="key">
    <article_title match="fuzzy"></article_title>
    <author search-all-authors="false"></author>
    <component_number></component_number>
    <edition_number></edition_number>
    <institution_name></institution_name>
    <isbn></isbn>
    <issn></issn>
    <volume></volume>
    <issue></issue>
    <year></year>
    <first_page></first_page>
    <journal_title></journal_title>
    <proceedings_title></proceedings_title>
    <series_title></series_title>
    <volume_title></volume_title>
    <unstructured_citation></unstructured_citation>
  </query>
</body>
</query_batch>
Balkin answered 14/3, 2012 at 23:28 Comment(3)
Thanks for your answer. I forgot that my edits would be accepted automatically ... feel free to revert ... but this form substantially boosted my confidence. Based on this, I would be able to fill out one of these forms, but it is not clear how I could submit them. - Tillford
Actually, now that I realize that you can repeat the <query> tag, this should be easy. I'll post my solution once I figure it out. - Tillford
Thanks. I wrote a query with 250 query nodes and the session timed out. I'll have to work on this some more. - Tillford
C
5

This is an open problem. There are better and worse ways to attack it. Start by reading Karen Coyle’s summary of the problem. The bibliography at the end of that article is excellent.

In short, the problem of quantifying sameness between two bibliographic records is hard, and a substantial amount of machine-learning research has been done around this topic.
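To give a feel for what "quantifying sameness" means in the simplest case, here is a naive sketch in Python based on `difflib.SequenceMatcher` from the standard library. It is nowhere near the state of the art described in the literature; the normalization rules, the year check, and the 0.9 threshold are arbitrary assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def title_similarity(a, b):
    """Return a similarity ratio in [0, 1] between two titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_match(citation, candidate, threshold=0.9):
    """Accept a candidate only if the years agree and the titles are close.

    The threshold is an arbitrary starting point, not a recommended value.
    """
    if str(citation["year"]) != str(candidate["year"]):
        return False
    return title_similarity(citation["title"], candidate["title"]) >= threshold
```

Even this toy version shows where the difficulty lies: abbreviations, subtitles, translated titles, and off-by-one years all push true matches below whatever threshold you pick.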

Copaiba answered 27/3, 2012 at 1:7 Comment(0)
I
1

Reposted from Academia Exchange

Update in 2022: Easiest for me was using the website, where you can just copy and paste your references: https://search.crossref.org/references

I also looked at a few Python libraries for interacting with the Crossref REST API, e.g.,

https://pypi.org/project/habanero/

https://gitlab.com/crossref/crossref_commons_py

The libraries were easy to use in general, but it was not straightforward to get the DOI based on a title, and there were not really any good examples for this task.
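For what it is worth, the REST API can also be called without any client library. The sketch below uses only the Python standard library and queries the `/works` endpoint with the `query.bibliographic` parameter; the citation field names (author, title, journal, year) are assumptions about how your table is laid out, and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works"


def build_url(citation, mailto=None):
    """Build a /works query URL from the non-empty fields of one citation."""
    text = " ".join(
        str(citation[k])
        for k in ("author", "title", "journal", "year")
        if citation.get(k)
    )
    params = {"query.bibliographic": text, "rows": 1}
    if mailto:
        params["mailto"] = mailto  # optional contact address for polite use
    return API + "?" + urllib.parse.urlencode(params)


def first_doi(payload):
    """Return the DOI of the top-ranked item in a parsed response, or None."""
    items = payload.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


def lookup_doi(citation, mailto=None, timeout=30):
    """Query the API for one citation and return the best-scoring DOI, if any."""
    with urllib.request.urlopen(build_url(citation, mailto), timeout=timeout) as resp:
        return first_doi(json.load(resp))
```

Keep in mind that the top hit is only the best-scoring candidate, so it should be compared against the original citation before being trusted, and requests should be spaced out when looping over a whole table.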

Isiahisiahi answered 24/5, 2022 at 7:1 Comment(0)
