While I generally agree with Nikita that no particular CRF toolset is the source of the low accuracy, and that this is an issue with the solution approach, I'm not sure that the two-stage approach demonstrated by Park et al., while very accurate and effective once complete, is a practical approach to your problem.
First, the "two stages" referred to in the paper are a paired SVM and CRF, which are not easy to set up on the fly if this is not your main area of study. Each requires training on labelled data and a degree of tuning.
Second, it is unlikely that your actual data (based on your description above) is as differentially structured as what this particular solution was designed to cope with while still maintaining high accuracy, in which case this level of supervised learning is unnecessary.
If I may propose a domain-specific solution with many of the same features that should be far easier to implement in whatever tool you're using: I would try a (restricted) semantic-tree approach that is semi-supervised, specifically exception- (error-) advised.
Instead of an English sentence, your data molecule is a bibliographic entry. The parts of this molecule that must be present are the author part, the title part, the date part, and the publisher part; there may also be other data parts (page number, volume ID, etc.).
Some of these parts may be nested inside one another (e.g. the page number inside the publisher part) or appear in a varied order while still being operationally valid, which is a good indicator for using semantic trees.
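As a minimal sketch of what such a tree might look like (the part labels and the sample entry here are illustrative assumptions, not from any particular tool):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Part:
    """A labelled span of a bibliographic entry; children hold nested parts."""
    label: str                  # e.g. "author", "title", "date", "publisher"
    text: str
    children: List["Part"] = field(default_factory=list)

# A parsed entry is just a tree of labelled spans; note the page part
# nested inside the publisher part, in line with the nesting described above.
entry = Part("entry", 'Blow, J. (2001). "A Title". Acme Press, pp. 10-20.', [
    Part("author", "Blow, J."),
    Part("date", "(2001)"),
    Part("title", '"A Title"'),
    Part("publisher", "Acme Press, pp. 10-20.", [
        Part("pages", "pp. 10-20."),
    ]),
])
```

The point of the tree is that a varied ordering of the same parts is just a reordering of children, not a different structure to learn from scratch.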
Further, each part, although variable, has unique characteristics: the author part uses personal-name formats (e.g. Blow, J. or James, et al.); the title part is quoted or italicized and has standard sentence structure; the date part uses date formats, is often enclosed in parentheses, etc. This means you need less overall training than for tokenized, unstructured analysis; in the end, that is less learning for your program.
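Those per-part characteristics can often be captured with a handful of patterns before any learning happens at all. A rough sketch (these regexes are illustrative assumptions; tune them against your own data):

```python
import re

# Illustrative patterns only -- adapt to the formats in your corpus.
PART_PATTERNS = {
    "author": re.compile(r"^[A-Z][a-z]+,\s*[A-Z]\.?(,?\s*et al\.?)?"),  # Blow, J., et al.
    "date":   re.compile(r"\((19|20)\d{2}\)"),                          # (1999), (2023)
    "title":  re.compile(r'"[^"]+"'),                                   # quoted title
    "pages":  re.compile(r"pp?\.\s*\d+(\s*-\s*\d+)?"),                  # p. 12 / pp. 10-20
}

def label_spans(entry: str):
    """Return (label, matched_text) for every part pattern that fires."""
    return [(label, m.group(0))
            for label, pat in PART_PATTERNS.items()
            if (m := pat.search(entry))]
```

For example, `label_spans('Blow, J., et al. (2001). "A Title". Acme Press, pp. 10-20.')` picks out all four parts, which then become candidate nodes for the tree.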
Additionally, there are structural relations that can be learned to improve accuracy, for example: the date part often sits at the end or separates key sections; the author part usually comes at the beginning, or else after the title; and so on. This is further supported by the fact that many associations and publishers have their own way of formatting such references, and these conventions can be learned by relation without much training data.
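Even without training data, those positional tendencies can be encoded as a simple tie-breaking score. A minimal sketch, with expected positions I've assumed for illustration rather than learned from any corpus:

```python
def positional_score(label: str, start: int, entry_length: int) -> float:
    """Score how plausible a label is at a given offset within the entry.

    The expected relative positions below are assumed heuristics:
    authors tend to lead, dates come early, publishers come late.
    """
    pos = start / max(entry_length, 1)
    expected = {"author": 0.0, "date": 0.25, "title": 0.5, "publisher": 0.8}
    return 1.0 - abs(pos - expected.get(label, 0.5))
```

When two patterns fire on the same span (say, an author-like token inside a title), the label with the higher positional score wins; with labelled examples, the `expected` table itself can be re-estimated per publisher.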
So, to sum up: by segmenting the parts and doing structured learning, you reduce the pattern matching within each sub-part, and the learning is relegated to relational patterns, which are more reliable, since that is how we construct such entries as humans.
Also, there are a ton of tools for this sort of domain-specific semantic learning:
http://www.semantic-measures-library.org/
http://wiki.opensemanticframework.org/index.php/Ontology_Tools
Hope that helps :)