Update: pybtex supports this feature as of version 0.20.
It does not at the moment. However, you can decode the .bib file with a LaTeX codec before you process it with pybtex, for example with latexcodec (https://pypi.python.org/pypi/latexcodec/). This codec converts a wide range of LaTeX commands to Unicode for you.
However, you will have to remove the braces in a post-processing stage. Why? To handle BibTeX code gracefully, \"{U} has to be converted into {Ü} rather than into Ü, so that it is not lower-cased in titles. The following example demonstrates this behaviour:
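To make the brace-keeping rule concrete, here is a toy decoder in plain Python. It is not latexcodec's implementation or API: toy_latex_decode is a hypothetical name, and it only understands the \" (diaeresis) accent applied to a single letter. It mirrors the rule described above: \"U becomes Ü, while \"{U} becomes {Ü}.

```python
import re
import unicodedata

# Toy table for illustration only: maps the LaTeX accent command \" to the
# Unicode combining diaeresis. latexcodec handles a far wider range of commands.
_COMBINING = {'"': "\u0308"}

# Matches \"X or \"{X}, where X is a single letter (hypothetical, simplified).
_ACCENT = re.compile(r'\\(["])(?:\{(\w)\}|(\w))')


def toy_latex_decode(latex: str) -> str:
    """Decode \\"X and \\"{X}; a braced argument keeps its braces, as in {Ü}."""
    def replace(m):
        accent, braced, bare = m.groups()
        letter = braced if braced is not None else bare
        # Compose letter + combining mark into one precomposed character.
        char = unicodedata.normalize("NFC", letter + _COMBINING[accent])
        return "{" + char + "}" if braced is not None else char
    return _ACCENT.sub(replace, latex)


print(toy_latex_decode(r'Testing \"UTEST \"{U}TEST'))  # Testing ÜTEST {Ü}TEST
```

Keeping the braces preserves the information that the letter was protected in the source, which matters once a style starts changing case.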
import codecs

import latexcodec  # importing it registers the "latex" codec
import pybtex.database.input.bibtex
import pybtex.plugin

style = pybtex.plugin.find_plugin('pybtex.style.formatting', 'plain')()
backend = pybtex.plugin.find_plugin('pybtex.backends', 'latex')()
parser = pybtex.database.input.bibtex.Parser()

with codecs.open("test.bib", encoding="latex") as stream:
    # this shows what latexcodec does to the source
    print(stream.read())

with codecs.open("test.bib", encoding="latex") as stream:
    data = parser.parse_stream(stream)

for entry in style.format_entries(data.entries.values()):
    print(entry.text.render(backend))
where test.bib is
@Article{test,
author = {John Doe},
title = {Testing \"UTEST \"{U}TEST},
journal = {Journal of Test},
year = {2000},
}
This prints the result of latexcodec converting test.bib to Unicode (edited for readability):
@Article{test,
author = {John Doe}, title = {Testing ÜTEST {Ü}TEST},
journal = {Journal of Test}, year = {2000},
}
followed by the pybtex-rendered entry (in this case, as LaTeX code):
John Doe.
\newblock Testing ütest {Ü}test.
\newblock \emph{Journal of Test}, 2000.
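The lower-casing seen in the rendered title can be sketched as follows. This is a simplified model of the BibTeX-style rule that the plain style applies to titles, not pybtex's own code: bibtex_title_case is a hypothetical name, and real BibTeX also treats brace groups that start with a backslash (special characters such as {\"U}) and text after colons differently, which this sketch ignores.

```python
def bibtex_title_case(title: str) -> str:
    """Simplified sketch: keep the first character and any brace-protected
    text, and lower-case everything else (characters at brace depth 0)."""
    out = []
    depth = 0
    for i, ch in enumerate(title):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif depth == 0 and i > 0:
            ch = ch.lower()
        out.append(ch)
    return "".join(out)


print(bibtex_title_case("Testing ÜTEST {Ü}TEST"))  # Testing ütest {Ü}test
print(bibtex_title_case("Testing ÜTEST ÜTEST"))    # Testing ütest ütest
```

The second call shows what would happen if the decoder dropped the braces: the protected Ü would be lower-cased along with the rest.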
If the codec stripped the braces, pybtex would convert the case incorrectly. Further, in (pathological) cases like journal = {\"u}, the braces clearly cannot be removed either, because they also delimit the field value.
An obvious downside is that if you render to a non-LaTeX backend, you have to remove the braces in a post-processing stage. You may want to do that anyway, to handle any special LaTeX commands (such as \url). It would be nice if pybtex could do this for you, but it does not at the moment.
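A post-processing stage of that kind might look like the sketch below. strip_latex_leftovers is a hypothetical helper, not part of pybtex or latexcodec; it assumes \url arguments contain no nested braces, and it also removes escaped \{ and \} braces, so it is only a starting point for plain-text output.

```python
import re

# Hypothetical pattern: \url{...} whose argument contains no nested braces.
_URL = re.compile(r"\\url\{([^{}]*)\}")


def strip_latex_leftovers(text: str) -> str:
    """Post-process rendered (non-LaTeX) output: unwrap \\url{...}, then drop braces.

    Apply this to the rendered entry, never to the raw .bib source, where the
    braces still delimit field values and protect case.
    """
    text = _URL.sub(r"\1", text)
    return text.replace("{", "").replace("}", "")


print(strip_latex_leftovers("Testing ütest {Ü}test."))           # Testing ütest Ütest.
print(strip_latex_leftovers(r"See \url{http://example.com/}."))  # See http://example.com/.
```

Running it after formatting, rather than before parsing, is what keeps the case protection intact.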
Running "Heged\"{u}s".decode("latex") returns Heged{ü}s instead of Hegedüs. Kinda confusing to me now. – Wimer