Issue with parsing publication data from PubMed with Entrez

I am trying to use Entrez to import publication data into a database. The search part works fine, but when I try to parse:

from Bio import Entrez

def create_publication(pmid):

    handle = Entrez.efetch("pubmed", id=pmid, retmode="xml")
    records = Entrez.parse(handle)
    item_data = next(records)  # built-in next() works on Python 2.6+ and 3
    handle.close()

... I get the following error:

File "/venv/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 296, in parse
    raise ValueError("The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse")
ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse

This code worked until a few days ago. Any ideas what might be going wrong here?

Following the example listed in the source code (http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html) also gives the same error:

from Bio import Entrez 
Entrez.email = "[email protected]"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")    
records = Entrez.parse(handle) 
for record in records: 
    print(record['MedlineCitation']['Article']['ArticleTitle']) 
handle.close()
Claudineclaudio answered 22/12, 2016 at 15:42 Comment(8)
Silly question, but have you tried using Entrez.read(), and then parsing the results? - Sinapism
read() works in principle, but there is a whole bunch of other code around this, so when I try it I just keep getting different errors. Either there is a simple fix for parse(), or I need to rewrite the rest. - Claudineclaudio
This worked until three days ago, but it seems something changed at PubMed recently, so it fails now. - Claudineclaudio
If it makes you feel any better, I'm getting the same error with the second bit of code you posted. records is a generator object, but I can't seem to read it, so I'm not sure what it contains... - Sinapism
Yep, same here. I'll go for read() then. I may also try to get in touch with the people at NIH who run PubMed. If they made the change deliberately, then it is fine, but it could also be a bug they are not aware of. - Claudineclaudio
Looks like the Biopython devs are aware. - Citizenry
Thanks Kevin for sharing the link! - Claudineclaudio
@Claudineclaudio the GitHub issue is now closed, just FYI. - Assentor

As noted in the comments and in the GitHub issue, the error is caused by a deliberate change made by the NCBI Entrez Utilities developers. As Jhird documents in that issue, you can change your code to the following:

from Bio import Entrez 
Entrez.email = "[email protected]"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")  

records = Entrez.read(handle)      # Difference here
records = records['PubmedArticle'] # New line here  

for record in records: 
    print(record['MedlineCitation']['Article']['ArticleTitle']) 
handle.close()
Assentor answered 3/2, 2017 at 2:4 Comment(0)
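
For code like create_publication in the question, which only needs a single record, the read()-based fix from the answer could be wrapped in a small helper. This is a minimal sketch and not part of Biopython: articles_from and fetch_first_article are hypothetical names. It assumes Entrez.read() returns a dict-like object with a 'PubmedArticle' key for the current PubMed XML (as the answer shows), and it treats any other result as a plain list of records.

```python
def articles_from(parsed):
    """Return the article records from an Entrez.read() result as a list.

    Assumption: the current PubMed XML parses to a dict-like object whose
    'PubmedArticle' key holds the records; anything else is treated as an
    already-flat list of records.
    """
    if isinstance(parsed, dict):
        return list(parsed.get("PubmedArticle", []))
    return list(parsed)


def fetch_first_article(pmid):
    """Fetch one PubMed record by ID (hypothetical helper, needs network access)."""
    # Imported here so articles_from() can be used without Biopython installed.
    from Bio import Entrez

    # Entrez.email should be set by the caller before making requests.
    handle = Entrez.efetch(db="pubmed", id=pmid, retmode="xml")
    try:
        parsed = Entrez.read(handle)
    finally:
        handle.close()

    articles = articles_from(parsed)
    return articles[0] if articles else None
```

With this in place, the line `item_data = records.next()` in the question could become `item_data = fetch_first_article(pmid)`, and the surrounding code that expects a single record should not need to change, provided it only reads keys such as item_data['MedlineCitation'].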
