Convert BibTeX file to database entries using Python

Given a BibTeX file, I need to add the respective fields (author, title, journal, etc.) to a table in a MySQL database (with a custom schema).

After some initial research, I found Bibutils, which I could use to convert a bib file to XML. My initial idea was to convert it to XML and then parse the XML in Python to populate a dictionary.

My main questions are:

  1. Is there a better way I could do this conversion?
  2. Is there a library which directly parses a BibTeX file and gives me the fields in Python?

(I did find bibliography.parsing, which uses Bibutils internally, but there is not much documentation on it and I am finding it tough to get it to work.)

Bestiality answered 10/2, 2012 at 22:43 Comment(1)
Ask at tex.stackexchange.com - Desert

Old question, but I am doing the same thing at the moment using the Pybtex library, which has an inbuilt parser:

from pybtex.database.input import bibtex

# open a BibTeX file
parser = bibtex.Parser()
bibdata = parser.parse_file("myrefs.bib")

# loop through the individual references
for bib_id in bibdata.entries:
    entry = bibdata.entries[bib_id]
    b = entry.fields
    try:
        # change these lines to create a SQL insert
        print(b["title"])
        print(b["journal"])
        print(b["year"])
        # deal with multiple authors; each name part is a list of strings
        for author in entry.persons["author"]:
            print(" ".join(author.first_names), " ".join(author.last_names))
    # a field may not exist for a reference, so skip the rest of that entry
    except KeyError:
        continue
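
To turn the print calls above into inserts, one option is to pass the extracted values to a parameterized INSERT. The sketch below is only an illustration: it uses the standard-library sqlite3 module so that it runs without a server, and the table name `refs`, its columns, and the helper `insert_reference` are hypothetical stand-ins for the custom schema. With MySQL the same pattern should apply through a DB-API driver, where the placeholder is typically `%s` rather than `?`.

```python
import sqlite3

# Hypothetical schema: one row per reference, authors joined into one string.
SCHEMA = """
CREATE TABLE IF NOT EXISTS refs (
    bib_id  TEXT PRIMARY KEY,
    title   TEXT,
    journal TEXT,
    year    TEXT,
    authors TEXT
)
"""


def insert_reference(conn, bib_id, fields, authors):
    """Insert one reference; fields missing from the entry are stored as NULL."""
    conn.execute(
        "INSERT INTO refs (bib_id, title, journal, year, authors) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            bib_id,
            fields.get("title"),
            fields.get("journal"),
            fields.get("year"),
            "; ".join(authors),
        ),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Stand-in for values pulled from bibdata.entries in the loop above.
insert_reference(
    conn,
    "doe2012",
    {"title": "An Example Title", "year": "2012"},  # no journal field
    ["Jane Doe", "John Smith"],
)
conn.commit()

rows = conn.execute(
    "SELECT bib_id, title, journal, year, authors FROM refs"
).fetchall()
print(rows)
```

Using `fields.get(...)` instead of indexing means an entry with a missing field is still stored, rather than skipped as in the try/except version.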
Sarver answered 27/12, 2012 at 22:0

My workaround is to use bibtexparser to export the relevant fields to a CSV file:

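import bibtexparser
import pandas as pd

# parse the BibTeX file into a database object
with open("../../bib/small.bib") as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

# one row per entry, one column per field
df = pd.DataFrame(bib_database.entries)
# keep only the needed columns (raises KeyError if no entry has them)
selection = df[['doi', 'number']]
selection.to_csv('temp.csv', index=False)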

Then write the CSV to a table in the database and delete temp.csv.

This avoids some complications I ran into with pybtex.
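
The CSV-to-table step described above could look roughly like the following. This is a minimal sketch using only the standard library: sqlite3 stands in for the real database, and the table name `articles` and the helper `load_csv_into_table` are hypothetical. The sample rows are made up so that the snippet runs on its own.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_into_table(conn, csv_path):
    """Copy the 'doi' and 'number' columns of a CSV file into the table."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = [(row["doi"], row["number"]) for row in reader]
    conn.executemany("INSERT INTO articles (doi, number) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Write a small stand-in for temp.csv (made-up sample data).
csv_path = os.path.join(tempfile.mkdtemp(), "temp.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["doi", "number"])
    writer.writerow(["10.1000/example.1", "3"])
    writer.writerow(["10.1000/example.2", "7"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (doi TEXT, number TEXT)")  # hypothetical table

inserted = load_csv_into_table(conn, csv_path)
os.remove(csv_path)  # delete the temporary file once it has been loaded
print(inserted)
```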

Luminescent answered 2/12, 2020 at 14:18

You can also use Python BibtexParser: https://github.com/sciunto/python-bibtexparser

Documentation: https://bibtexparser.readthedocs.org

It's very straightforward (I use it in production).

For the record, I am not the developer of this library.

Hickory answered 28/1, 2014 at 18:31 Comment(1)
The main repo is here: github.com/sciunto-org/python-bibtexparser. - Morel

Converting to XML is a fine idea.

XML exists as an application-independent data format, so that you can parse it with readily-available libraries; using it as an intermediary has no particular drawbacks. In fact, you can usually import XML into a database without even going through a programming language such as Python (although the amount of Python you'd have to write for a task like this is trivial).
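
As a rough illustration of how little Python the XML route needs, here is a sketch using the standard-library xml.etree.ElementTree module. The sample document is an assumption: it is loosely modelled on MODS-style XML, and the output of an actual converter may use different elements, so the paths below would need adjusting. The function name `parse_records` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed sample, loosely modelled on MODS; real converter output may differ.
SAMPLE_XML = """
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="doe2012">
    <titleInfo><title>An Example Title</title></titleInfo>
    <name type="personal">
      <namePart type="given">Jane</namePart>
      <namePart type="family">Doe</namePart>
    </name>
    <originInfo><dateIssued>2012</dateIssued></originInfo>
  </mods>
</modsCollection>
"""

# Prefix-to-URI map so the paths below can use the short "m:" prefix.
NS = {"m": "http://www.loc.gov/mods/v3"}


def parse_records(xml_text):
    """Return {record id: {"title", "year", "authors"}} for each record."""
    root = ET.fromstring(xml_text)
    records = {}
    for mods in root.findall("m:mods", NS):
        authors = []
        for name in mods.findall("m:name", NS):
            given = name.findtext(
                "m:namePart[@type='given']", default="", namespaces=NS
            )
            family = name.findtext(
                "m:namePart[@type='family']", default="", namespaces=NS
            )
            authors.append(f"{given} {family}".strip())
        records[mods.get("ID")] = {
            "title": mods.findtext("m:titleInfo/m:title", namespaces=NS),
            "year": mods.findtext("m:originInfo/m:dateIssued", namespaces=NS),
            "authors": authors,
        }
    return records


records = parse_records(SAMPLE_XML)
print(records)
```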

As far as I know, there is no direct, mature BibTeX reader for Python.

Frigid answered 10/2, 2012 at 22:46

You could use the Perl package Bib2ML (a.k.a. Bib2HTML). It contains a bib2sql tool that generates a SQL database from a BibTeX database, with the following schema:

[image: diagram of the database schema generated by bib2sql]

Alternative tools: bibsql and bibtosql.

Then you can feed it to your schema by writing some SQL conversion queries.
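
A conversion query of the kind mentioned above is usually an INSERT ... SELECT from the generated tables into your own. The sketch below runs one from Python with the standard-library sqlite3 module. Every table and column name in it (`imported_entry`, `publications`, and so on) is hypothetical, since the real names depend on the schema the tool generates and on your target schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for what the import tool produced.
    CREATE TABLE imported_entry (citekey TEXT, entry_title TEXT, pub_year TEXT);
    INSERT INTO imported_entry VALUES ('doe2012', 'An Example Title', '2012');

    -- Hypothetical custom target schema.
    CREATE TABLE publications (ref_key TEXT PRIMARY KEY, title TEXT, year INTEGER);
""")

# The conversion query: rename columns and cast the year while copying.
conn.execute("""
    INSERT INTO publications (ref_key, title, year)
    SELECT citekey, entry_title, CAST(pub_year AS INTEGER)
    FROM imported_entry
""")
conn.commit()

converted = conn.execute(
    "SELECT ref_key, title, year FROM publications"
).fetchall()
print(converted)
```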

Dualism answered 17/10, 2015 at 3:34 Comment(1)
After a long time: I am really interested in the chart you made here. Could you share how you created it? - Paramorph
