lxml: add namespace to input file

I am parsing an XML file generated by an external program. I would then like to add custom annotations to this file, using my own namespace. My input looks like this:

<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
</sbml>

The issue is that lxml only declares a namespace on the elements that use it, which means the declaration is repeated many times, like so (simplified):

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4">
  <listOfSpecies>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
    </species>
    ....
  </listOfSpecies>
</sbml>

Is it possible to force lxml to write this declaration only once in a parent element, such as sbml or listOfSpecies? Or is there a good reason not to do so? The result I want would be:

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4"  xmlns:kjw="http://this.is.some/custom_namespace">
  <listOfSpecies>
    <species>
      <kjw:test/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test/>
    </species>
    ....
  </listOfSpecies>
</sbml>

The important problem is that the existing data which is read from a file must be kept, so I cannot just make a new root element (I think?).

EDIT: Code attached below.

def annotateSbml(sbml_input):
  from lxml import etree

  checkSbml(sbml_input) # Makes sure the input is valid sbml/xml.

  ns = "http://this.is.some/custom_namespace"
  etree.register_namespace('kjw', ns)

  sbml_doc = etree.ElementTree()
  root = sbml_doc.parse(sbml_input, etree.XMLParser(remove_blank_text=True))
  nsmap = root.nsmap
  nsmap['sbml'] = nsmap[None] # Makes code more readable, but seems ugly. Any alternatives to this?
  nsmap['kjw'] = ns
  ns = '{' + ns + '}'
  sbmlns = '{' + nsmap['sbml'] + '}'

  for species in root.findall('sbml:model/sbml:listOfSpecies/sbml:species', nsmap):
    species.append(etree.Element(ns + 'test'))

  sbml_doc.write("test.sbml.xml", pretty_print=True, xml_declaration=True)

  return
Hurried answered 5/7, 2012 at 14:30 Comment(3)
@Marcin: done. Any tips?Hurried
@mzjin my input contains everything except the <kjw:test/> tags. The aim is to insert such tags (or similar, e.g. kjw:score or kjw:length) into each species in this list. Does this make sense, or should I post the whole file (I figured my original question was long enough as it is)?Hurried
@mzjin Ah sorry, oversimplified that a bit. Yes it does indeed contain model tags. I've used the sbml:model tags together with nsmap['sbml'] = nsmap[None] so the parser properly substitutes the namespace in model with the root namespace, which it doesn't seem to otherwise.Hurried
D
13

Modifying the namespace mapping of a node is not possible in lxml. See this open ticket that has this feature as a wishlist item.

It originated from this thread on the lxml mailing list, where a workaround replacing the root node is given as an alternative. There are some issues with replacing the root node though: see the ticket above.

I'll put the suggested root replacement workaround code here for completeness:

>>> DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
...   <model metaid="untitled" id="untitled">
...     <annotation>...</annotation>
...     <listOfUnitDefinitions>...</listOfUnitDefinitions>
...     <listOfCompartments>...</listOfCompartments>
...     <listOfSpecies>
...       <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
...         <annotation>
...           <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...       <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
...         <annotation>
...            <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...     </listOfSpecies>
...     <listOfReactions>...</listOfReactions>
...   </model>
... </sbml>"""
>>> 
>>> from lxml import etree
>>> NS = "http://this.is.some/custom_namespace"
>>> root = etree.fromstring(DOC)
>>> nsmap = root.nsmap
>>> nsmap['kjw'] = NS
>>> new_root = etree.Element(root.tag, nsmap=nsmap)
>>> new_root[:] = root[:]
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> print(etree.tostring(new_root, pretty_print=True).decode())
<sbml xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" xmlns:kjw="http://this.is.some/custom_namespace" xmlns="http://www.sbml.org/sbml/level2/version4"><model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
<kjw:test/><kjw:test/></sbml>
Dunaj answered 5/7, 2012 at 18:8 Comment(2)
For future reference, this requires a small alteration (on Python 3.2 at least); otherwise it gives a TypeError from **root.nsmap when it hits the None:'namespace' entry, as None is not a string. Using nsmap = root.nsmap; nsmap['kjw'] = NS; new_root = etree.Element(root.tag, nsmap = nsmap); works.Hurried
You also need to copy attrib, text, and (unlikely, but just for completeness) tail. nsmap=dict(kjw=NS, nsmap=nsmap)) is wrong; it should be just nsmap=nsmapEley
W
8

I know this is an old question, but it is still valid, and as of lxml 3.5.0 there is probably a better solution to this problem:

cleanup_namespaces() accepts a new argument top_nsmap that moves definitions of the provided prefix-namespace mapping to the top of the tree.

So the namespace declaration can now be moved up with a simple call like this:

from lxml import etree

nsmap = {'kjw': 'http://this.is.some/custom_namespace'}
etree.cleanup_namespaces(root, top_nsmap=nsmap)  # root: the parsed root element
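
To see the effect end to end, here is a minimal sketch, assuming lxml 3.5.0 or later. The cut-down document and the names `DOC`, `SBML_NS` and `result` are made up for illustration; only `cleanup_namespaces()` and its `top_nsmap` argument come from the changelog entry quoted above.

```python
from lxml import etree

NS = "http://this.is.some/custom_namespace"
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical, cut-down input; the real file has many more elements.
DOC = (
    '<sbml xmlns="%s" level="2" version="4">'
    '<model><listOfSpecies>'
    '<species id="s1"/><species id="s2"/>'
    '</listOfSpecies></model></sbml>' % SBML_NS
)

root = etree.fromstring(DOC)

# Add one annotation element per species. At this point each new element
# carries its own declaration of the custom namespace.
for species in root.iter("{%s}species" % SBML_NS):
    etree.SubElement(species, "{%s}test" % NS)

# Declare the prefix once on the root and drop the repeated declarations.
etree.cleanup_namespaces(root, top_nsmap={"kjw": NS})

result = etree.tostring(root, pretty_print=True).decode()
print(result)
```

The existing content of the tree is left in place, so nothing read from the input file has to be copied to a new root.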
Wicks answered 26/7, 2016 at 8:33 Comment(0)
A
3

Rather than dealing directly with the raw XML, you could also look at LibSBML, a library for manipulating SBML documents with language bindings for, among others, Python. You would use it like this:

>>> from libsbml import *
>>> doc = readSBML('Dropbox/SBML Models/BorisEJB.xml')
>>> species = doc.getModel().getSpecies('MAPK')
>>> species.appendAnnotation('<kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>')
0
>>> species.toSBML()
'<species id="MAPK" compartment="compartment" initialConcentration="280" boundaryCondition="false">\n  <annotation>\n
 <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>\n  </annotation>\n</species>'
>>>

Adjective answered 6/7, 2012 at 7:2 Comment(0)
D
1

If you temporarily add a namespaced attribute to the root node, that does the trick.

ns = '{http://this.is.some/custom_namespace}'

# add 'kjw:foobar' attribute to root node
root.set(ns+'foobar', 'foobar')

# add kjw namespace elements (or attributes) elsewhere
# ... get child element species ...
species.append(etree.Element(ns + 'test'))

# remove temporary namespaced attribute from root node
del root.attrib[ns+'foobar']
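
A self-contained sketch of the same trick follows. The cut-down document, the names `DOC`, `SBML_NS` and `result`, the use of `SubElement()` instead of `append()`, and the `register_namespace()` call are my assumptions; without a registered prefix, lxml would pick a generated one such as `ns0` rather than `kjw`.

```python
from lxml import etree

NS = "http://this.is.some/custom_namespace"
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Assumption: register the prefix so lxml uses "kjw" instead of a
# generated prefix when it has to create the declaration itself.
etree.register_namespace("kjw", NS)

# Hypothetical, cut-down input; the real file has many more elements.
DOC = (
    '<sbml xmlns="%s" level="2" version="4">'
    '<model><listOfSpecies>'
    '<species id="s1"/><species id="s2"/>'
    '</listOfSpecies></model></sbml>' % SBML_NS
)

root = etree.fromstring(DOC)

# Temporarily set a namespaced attribute so that lxml declares the
# namespace on the root element.
root.set("{%s}foobar" % NS, "foobar")

# New children find the declaration on the root and reuse it.
for species in root.iter("{%s}species" % SBML_NS):
    etree.SubElement(species, "{%s}test" % NS)

# Remove the attribute again; the trick relies on the declaration
# staying behind on the root.
del root.attrib["{%s}foobar" % NS]

result = etree.tostring(root, pretty_print=True).decode()
print(result)
```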
Denial answered 18/6, 2013 at 4:36 Comment(0)
H
1

I wrote this function to add a namespace to the root element:

from lxml import etree

def addns(tree, alias, uri):
    root = tree.getroot()
    nsmap = root.nsmap
    nsmap[alias] = uri
    new_root = etree.Element(root.tag, attrib=root.attrib, nsmap=nsmap)
    new_root[:] = root[:]
    return new_root.getroottree()

This function returns a new tree rather than modifying the old one, so you have to rebind whatever holds the tree. That is easy if a single object owns the reference to the tree, as it would in a strong OO design.

Hachmin answered 6/2, 2020 at 22:4 Comment(0)
E
0

You could replace the root element to add 'kjw' to its nsmap. Then the xmlns declaration would appear only in the root element.
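
A minimal sketch of such a replacement, under stated assumptions: the helper name `replace_root`, the cut-down document and the names `DOC`, `SBML_NS` and `result` are hypothetical. Unlike a bare `etree.Element(root.tag, nsmap=nsmap)`, it also copies the root's attributes, text and tail. Comments or processing instructions outside the root element are not carried over.

```python
from lxml import etree

NS = "http://this.is.some/custom_namespace"
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def replace_root(root, prefix, uri):
    """Return a new root that declares prefix -> uri and takes over
    the attributes, text, tail and children of the old root."""
    nsmap = dict(root.nsmap)  # may contain None for the default namespace
    nsmap[prefix] = uri
    new_root = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=nsmap)
    new_root.text = root.text
    new_root.tail = root.tail
    new_root[:] = root[:]  # moves the children out of the old root
    return new_root


# Hypothetical, cut-down input; the real file has many more elements.
DOC = (
    '<sbml xmlns="%s" level="2" version="4">'
    '<model><listOfSpecies>'
    '<species id="s1"/><species id="s2"/>'
    '</listOfSpecies></model></sbml>' % SBML_NS
)

new_root = replace_root(etree.fromstring(DOC), "kjw", NS)

# New children reuse the declaration that now sits on the root.
for species in new_root.iter("{%s}species" % SBML_NS):
    etree.SubElement(species, "{%s}test" % NS)

result = etree.tostring(new_root, pretty_print=True).decode()
print(result)
```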

Eley answered 5/7, 2012 at 17:36 Comment(0)
