How to automatically turn BibTeX citations into something parseable by Zotero?

I have a citation system which publishes users' notes to a wiki (Researchr). Programmatically, I have access to the full BibTeX record of each entry, and I also display it on the individual pages (for example, click on BibTeX). This is meant to make it easy for users of other citation managers to import the citation of a paper that interests them. I would also like other citation managers, especially Zotero, to be able to detect and import a citation automatically.

Zotero lists a number of ways of exposing metadata that it understands, including meta tags with RDF, COinS, Dublin Core and unAPI. Is there a Ruby library (or a JavaScript library) for converting BibTeX to any of these standards automatically? I could probably write something myself, but an existing library would be far more robust (BibTeX has so many publication types and fields).
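For a sense of what such a conversion involves, here is a minimal Python sketch that turns one parsed BibTeX entry into a COinS `<span class="Z3988">`. It assumes the entry is a dict of fields with an `ENTRYTYPE` key (the shape bibtexparser 1.x produces) and authors in a single "and"-separated `author` string; the function names and the small field mapping (only `article` and `book` get their own OpenURL formats, everything else falls back to the book format with genre `unknown`) are illustrative assumptions, not a complete BibTeX-to-OpenURL crosswalk.

```python
from html import escape
from urllib.parse import quote, urlencode


def _clean(value):
    """Drop the braces BibTeX uses to protect capitalisation."""
    return value.replace("{", "").replace("}", "")


def bibtex_to_coins(entry):
    """Build a COinS <span> from one parsed BibTeX entry (a dict of fields)."""
    pairs = [("ctx_ver", "Z39.88-2004")]
    kind = entry.get("ENTRYTYPE", "").lower()
    if kind == "article":
        pairs += [("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
                  ("rft.genre", "article"),
                  ("rft.atitle", _clean(entry.get("title", ""))),
                  ("rft.jtitle", _clean(entry.get("journal", "")))]
        for bib_key, kev_key in [("volume", "rft.volume"), ("number", "rft.issue")]:
            if entry.get(bib_key):
                pairs.append((kev_key, entry[bib_key]))
    else:
        # Books and every other entry type fall back to the book format
        pairs += [("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
                  ("rft.genre", "book" if kind == "book" else "unknown"),
                  ("rft.btitle", _clean(entry.get("title", "")))]
        if entry.get("publisher"):
            pairs.append(("rft.pub", _clean(entry["publisher"])))
    if entry.get("year"):
        pairs.append(("rft.date", entry["year"]))
    if entry.get("pages"):
        pairs.append(("rft.pages", entry["pages"].replace("--", "-")))
    for name in entry.get("author", "").split(" and "):
        if name.strip():
            pairs.append(("rft.au", _clean(name.strip())))
    if entry.get("doi"):
        pairs.append(("rft_id", "info:doi/" + entry["doi"]))
    # Percent-encode the ContextObject, then HTML-escape it for the attribute
    title = escape(urlencode(pairs, quote_via=quote), quote=True)
    return f'<span class="Z3988" title="{title}"></span>'
```

The span can sit anywhere in the page body next to the displayed BibTeX; Zotero looks for the `Z3988` class.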

Biyearly answered 6/3, 2012 at 0:52

There's a BibTeX2RDF converter available here, which might be what you're after.

Mellicent answered 8/3, 2012 at 2:0

unAPI is not a data standard; it's a way to serve data (to Zotero and other programs). Zotero imports BibTeX, so serving BibTeX via unAPI works just fine. Inspire is an example of a site that does that: http://inspirehep.net/
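As a rough illustration of what serving BibTeX via unAPI involves: the page advertises the server with `<link rel="unapi-server" type="application/xml" title="unAPI" href="...">` and marks each record with `<abbr class="unapi-id" title="...">`, and the server answers three kinds of GET request. The sketch below is framework-agnostic Python (the wiki in the question is DokuWiki, so it only shows the logic). `handle_unapi`, the `RECORDS` store, the format name `bibtex` and the MIME type `text/x-bibtex` are assumptions, and the status codes follow my reading of the unAPI spec; check which format names Zotero's unAPI translator picks up before relying on it.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import quoteattr

# Hypothetical record store: citation key -> full BibTeX record
RECORDS = {
    "doe2012": "@article{doe2012,\n  title = {A Test},\n  year = {2012}\n}\n",
}
# Assumed format name and MIME type for the one format this server offers
FORMAT_LIST = '<format name="bibtex" type="text/x-bibtex" />'
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def handle_unapi(query_string, records=RECORDS):
    """Answer an unAPI request; returns (HTTP status, Content-Type, body)."""
    params = parse_qs(query_string)
    uid = params.get("id", [None])[0]
    fmt = params.get("format", [None])[0]
    if uid is None:
        # No id: list every format the server can deliver
        return 200, "application/xml", f"{XML_HEADER}<formats>{FORMAT_LIST}</formats>"
    if uid not in records:
        return 404, "text/plain", "unknown id"
    if fmt is None:
        # id but no format: list the formats available for this record
        return 300, "application/xml", (
            f"{XML_HEADER}<formats id={quoteattr(uid)}>{FORMAT_LIST}</formats>")
    if fmt != "bibtex":
        return 406, "text/plain", "format not available"
    return 200, "text/x-bibtex; charset=utf-8", records[uid]
```

For example, `handle_unapi("id=doe2012&format=bibtex")` returns the stored record with status 200, which is the request Zotero would make after discovering the `unapi-id` on the page.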

Carbide answered 6/3, 2012 at 1:46

I understand that I can serve BibTeX via unAPI, but I have no idea how to implement that in DokuWiki. It seems unnecessarily complicated; I would rather just embed some metadata in the page than implement another HTTP response. – Morrow 6/3, 2012 at 3:35

OK, understood. I know of no scripted solution. Zotero itself can import BibTeX and output COinS, but that's probably too clumsy. Bibutils (sourceforge.net/p/bibutils/home/Bibutils) gets you halfway there, converting e.g. BibTeX to MODS; maybe you can use that to get one of the other formats. – Carbide 6/3, 2012 at 6:49

One approach that might be interesting would be to write a CSL definition that automatically generates a citation in COinS, MODS, Dublin Core, etc. This would then be usable by js-citeproc, ruby-citeproc, python-citeproc and any other tool that uses CSL export templates. – Morrow 6/3, 2012 at 13:52
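Since the question lists Dublin Core among the formats Zotero understands, and the comments above prefer embedding metadata in the page over another HTTP endpoint, here is a minimal Python sketch that renders Dublin Core `<meta>` tags for a page's `<head>` from one parsed BibTeX entry. The function name and the small field mapping (including putting the DOI into `DC.identifier`) are assumptions for illustration; a real mapping would need to cover more entry types and fields.

```python
from html import escape

# BibTeX field -> Dublin Core element (a deliberately small, assumed mapping)
DC_FIELDS = [("title", "DC.title"), ("year", "DC.date"),
             ("publisher", "DC.publisher"), ("doi", "DC.identifier")]


def bibtex_to_dc_meta(entry):
    """Render Dublin Core <meta> tags for a page's <head> from BibTeX fields."""
    tags = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for name in entry.get("author", "").split(" and "):
        if name.strip():
            tags.append(f'<meta name="DC.creator" content="{escape(name.strip())}" />')
    for bib_key, dc_name in DC_FIELDS:
        # Strip BibTeX capitalisation braces before escaping for HTML
        value = entry.get(bib_key, "").replace("{", "").replace("}", "")
        if value:
            tags.append(f'<meta name="{dc_name}" content="{escape(value)}" />')
    return "\n".join(tags)
```

Each author gets its own `DC.creator` tag, which keeps names from being merged into one string.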

By now, .bib files can be imported directly into Zotero. However, I noticed that my BibTeX entries were often less complete than Zotero's records (in particular, they often lacked a DOI), and I did not find an "auto-complete" function in Zotero that fills in missing fields based on the data in the BibTeX entries.
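To find which entries lack a DOI in the first place, something like the following could produce the `list_of_titles.txt` file that the DOI script further down reads. It is a deliberately small, standard-library-only reader that I am assuming is good enough for typical .bib files: it handles brace- or quote-delimited values and bare numbers, but not `@string` macros or `#` concatenation. bibtexparser, which the second script uses, is the more robust choice. All function names here are hypothetical.

```python
import re

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD = re.compile(r"\s*,?\s*(\w+)\s*=\s*")


def read_value(text, i):
    """Read a {...} or "..." value starting at text[i]; return (value, end)."""
    if text[i] == '"':
        end = text.index('"', i + 1)
        return text[i + 1:end], end + 1
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i + 1:j], j + 1
    raise ValueError("unbalanced braces in .bib file")


def parse_fields(body):
    """Parse 'name = value' pairs from the inside of one entry."""
    fields, pos = {}, 0
    while True:
        m = FIELD.match(body, pos)
        if not m:
            return fields
        name, pos = m.group(1).lower(), m.end()
        if pos < len(body) and body[pos] in '{"':
            value, pos = read_value(body, pos)
        else:
            value = re.match(r"[^,\s]*", body[pos:]).group(0)  # e.g. year = 2012
            pos += len(value)
        fields[name] = value


def parse_entries(text):
    """Return {citation_key: {field: value}} for each regular entry."""
    entries, pos = {}, 0
    while (m := ENTRY_START.search(text, pos)):
        body, pos = read_value(text, text.index("{", m.start()))
        if m.group(1).lower() not in {"comment", "string", "preamble"}:
            entries[m.group(2)] = parse_fields(body.split(",", 1)[1])
    return entries


def titles_missing_doi(bib_text):
    """Titles (braces stripped, whitespace collapsed) of entries without a doi."""
    titles = []
    for fields in parse_entries(bib_text).values():
        if "doi" not in fields and "title" in fields:
            title = fields["title"].replace("{", "").replace("}", "")
            titles.append(" ".join(title.split()))
    return titles


def write_titles(bib_path, out_path="list_of_titles.txt"):
    """Write one title per line, the format the DOI script can read."""
    with open(bib_path, encoding="utf-8") as f:
        titles = titles_missing_doi(f.read())
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(titles))
```

Note that the DOI script also splits each line on `;`, so a title containing a semicolon would be broken into two lookups.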

So I import the .bib file into Zotero to make sure all entries are in there. Then I run a Python script that looks up on Crossref all the missing DOIs it can find for the titles of those entries (read from list_of_titles.txt) and writes them to a space-separated .txt file (plus a copy with one DOI per line):

# pip install habanero
from habanero import Crossref
import re


def normalize_title(text):
    """Lower-case and drop whitespace and punctuation for comparison."""
    return re.sub(r"\W", "", text.lower())


def titletodoi(keyword):
    """Return the DOI of Crossref's best hit if its title matches `keyword`."""
    cr = Crossref()
    result = cr.works(query=keyword, limit=1)
    items = result["message"]["items"]
    if not items:
        return None
    # Crossref returns the title as a list of strings
    found_title = "".join(items[0].get("title", []))
    if normalize_title(keyword) == normalize_title(found_title):
        return items[0]["DOI"]
    return None


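def get_dois(titles):
    """Look up a DOI for every title; titles without an exact match are skipped."""
    dois = []
    for title in titles:
        try:
            doi = titletodoi(title)
        except Exception as error:  # e.g. network errors or malformed responses
            print(f"lookup failed for title={title}: {error}")
            continue
        print(f"doi={doi}, title={title}")
        if doi is not None:
            dois.append(doi)
    print(f"dois={dois}")
    return dois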


def read_titles_from_file(filepath):
    """Read titles, one per line and/or separated by semicolons."""
    with open(filepath, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return splits_lines(lines)


def splits_lines(lines):
    split_lines = []
    for line in lines:
        for new_line in line.split(";"):
            if new_line.strip():  # skip empty fragments such as blank lines
                split_lines.append(new_line.strip())
    return split_lines


def write_dois_to_file(dois, filename, separation_char):
    # The with-block closes the file even if a write fails
    with open(filename, "w") as textfile:
        for doi in dois:
            textfile.write(doi + separation_char)


filepath = "list_of_titles.txt"
titles = read_titles_from_file(filepath)
dois = get_dois(titles)
write_dois_to_file(dois, "dois_space.txt", " ")
write_dois_to_file(dois, "dois_per_line.txt", "\n")
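The script above only accepts a DOI when the titles match exactly after stripping case, whitespace and punctuation, so a single typo makes it skip an entry. A possible relaxation, sketched here with the standard library's `difflib` instead of the Levenshtein package used in the second script, is a similarity threshold. The names are hypothetical, the 0.85 threshold is borrowed from the second script as an assumption, and `SequenceMatcher.ratio()` is not identical to `Levenshtein.ratio()`, so the threshold may need tuning.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lower-case and drop whitespace/punctuation, as the script above does."""
    return re.sub(r"\W", "", title.lower())


def titles_match(query, candidate, threshold=0.85):
    """True if the normalized titles are at least `threshold` similar."""
    ratio = SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()
    return ratio >= threshold
```

Such a check could replace the exact equality test in `titletodoi`; a looser threshold finds more DOIs but also raises the risk of attaching the wrong paper's DOI.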

I then paste the DOIs from the .txt file into Zotero's magic wand (Add Item by Identifier). Next, I manually remove the duplicates, keeping the most recently added entry in each case (because that one comes from the magic wand and has the most data).

After that, I run another script that updates all the citation keys in my .tex and .bib files to the ones generated by Zotero:

# Imports (bibtexparser 1.x API)
import fnmatch
import os

import bibtexparser
import bibtexparser.customization as cust
import Levenshtein as lev
from bibtexparser.bparser import BibTexParser


# Let's define a function to customize our entries.
# It takes a record and returns this record.
def customizations(record):
    """Use some functions delivered by the library

    :param record: a record
    :returns: -- customized record
    """
    # Qualified calls avoid shadowing built-ins such as type()
    record = cust.type(record)
    record = cust.author(record)
    record = cust.editor(record)
    record = cust.journal(record)
    record = cust.keyword(record)
    record = cust.link(record)
    record = cust.page_double_hyphen(record)
    record = cust.doi(record)
    return record


def get_references(filepath):
    with open(filepath) as bibtex_file:
        parser = BibTexParser()
        parser.customization = customizations
        bib_database = bibtexparser.load(bibtex_file, parser=parser)
        # print(bib_database.entries)
    return bib_database


def get_reference_mapping(main_filepath, sub_filepath):
    found_sub = []
    found_main = []
    main_into_sub = []

    main_references = get_references(main_filepath)
    sub_references = get_references(sub_filepath)

    for main_entry in main_references.entries:
        for sub_entry in sub_references.entries:

            # Match the reference IDs if the titles are more than 85% similar
            lev_ratio = lev.ratio(
                remove_curly_braces(main_entry["title"]).lower(),
                remove_curly_braces(sub_entry["title"]).lower(),
            )
            if lev_ratio > 0.85:
                print(f"lev_ratio={lev_ratio}")

                if main_entry["ID"] != sub_entry["ID"]:
                    print(f'replace: {sub_entry["ID"]} with: {main_entry["ID"]}')
                    main_into_sub.append([main_entry, sub_entry])

                    # Keep track of which entries have been found
                    found_sub.append(sub_entry)
                    found_main.append(main_entry)
    return (
        main_into_sub,
        found_main,
        found_sub,
        main_references.entries,
        sub_references.entries,
    )


def remove_curly_braces(string):
    """Strip the braces BibTeX uses to protect capitalisation."""
    return string.replace("{", "").replace("}", "")


def replace_references(main_into_sub, directory):
    for pair in main_into_sub:
        main = pair[0]["ID"]
        sub = pair[1]["ID"]
        print(f"replace: {sub} with: {main}")

        # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
        # findReplace(directory, sub, main, "*.tex")
        # findReplace(directory, sub, main, "*.bib")


def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)


def list_missing(main_references, sub_references):
    main_ids = {entry["ID"] for entry in main_references}
    for sub in sub_references:
        if sub["ID"] not in main_ids:
            print(f'the following reference has a changed title: {sub["ID"]}')


latex_root_dir = "some_path/"
main_filepath = f"{latex_root_dir}latex/Literature_study/zotero.bib"
sub_filepath = f"{latex_root_dir}latex/Literature_study/references.bib"
(
    main_into_sub,
    found_main,
    found_sub,
    main_references,
    sub_references,
) = get_reference_mapping(main_filepath, sub_filepath)
replace_references(main_into_sub, latex_root_dir)
list_missing(main_references, sub_references)


# For references whose Levenshtein ratio is below 0.85, you can specify a manual swap:
manual_swap = []  # main into sub
# manual_swap.append(["cantley_impact_2021","cantley2021impact"])
# manual_swap.append(["widemann_envision_2021","widemann2020envision"])
for pair in manual_swap:
    main = pair[0]
    sub = pair[1]
    print(f"replace: {sub} with: {main}")

    # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
    # findReplace(latex_root_dir, sub, main, "*.tex")
    # findReplace(latex_root_dir, sub, main, "*.bib")
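One caveat with `findReplace`: it replaces raw substrings, so renaming `smith2020` would also rewrite `smith2020b` and any other text containing that string. A more targeted alternative, sketched below with hypothetical function names, only touches keys inside `\cite`-style commands in .tex files and the entry key itself in .bib files. It assumes the common `\cite`, `\citep[...]`, `\textcite`, `\nocite` forms; keys referenced elsewhere (for example in `crossref` fields) are not handled.

```python
import re

# \cite{...}, \citep[p.~3]{...}, \textcite{...}, \nocite{...}: prefix, key list, closing brace
CITE = re.compile(r"(\\\w*cite\w*\*?(?:\[[^\]]*\])*\{)([^}]*)(\})")


def replace_cite_keys(tex, mapping):
    """Rename keys inside cite commands only; `mapping` is old key -> new key."""
    def rename(match):
        parts = match.group(2).split(",")
        # Compare whole keys, so renaming smith2020 leaves smith2020b alone
        parts = [p.replace(p.strip(), mapping[p.strip()]) if p.strip() in mapping else p
                 for p in parts]
        return match.group(1) + ",".join(parts) + match.group(3)

    return CITE.sub(rename, tex)


def replace_bib_key(bib, old, new):
    """Rename the citation key of the entry `old` in .bib source text."""
    pattern = re.compile(r"(@\w+\s*\{\s*)" + re.escape(old) + r"(\s*,)")
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), bib)
```

These functions work on file contents, so they would slot into the read/replace/write loop of `findReplace` in place of `s.replace(find, replace)`.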

Czarevitch answered 1/11, 2021 at 23:37
