How to preserve XML nodes that are not bound to an object when using SAX for parsing

I am working on an Android app which interfaces with a Bluetooth camera. For each clip stored on the camera we store some fields about the clip (some of which the user can change) in an XML file.

Currently this app is the only app writing this XML data to the device, but in the future a desktop app or an iPhone app may write data here too. I don't want to assume that another app couldn't have additional fields as well (especially if it were a newer version of the app which added fields this version doesn't support yet).

So what I want to prevent is a situation where we add new fields to this XML file in another application, and then the user goes to use the Android app and it wipes out those other fields because it doesn't know about them.

So let's take a hypothetical example:

<data>
  <title>My Title</title>
  <date>12/24/2012</date>
  <category>Blah</category>
</data>

When read from the device this would get translated to a Clip object that looks like this (simplified for brevity)

public class Clip {
  public String title, category;
  public Date date;
}

So I'm using SAX to parse the data and store it in a Clip. I simply accumulate the characters in a StringBuilder and write them out when I reach the end element for title, category, and date.

I realized though that when I write this data back to the device, if there were any other tags in the original document they would not get written because I only write out the fields I know about.

This makes me think that maybe SAX is the wrong option and perhaps I should use DOM or something else where I could more easily write out any other elements that existed originally.

Alternatively, I was thinking my Clip class could contain an ArrayList of some generic XML type (maybe DOM). In startElement I would check whether the element is one of the predefined tags, and if it is not, store the whole structure until I reach the end of that tag (but in what?). Then, when writing back out, I would go through all of the additional tags and write them to the XML file (along with the fields I know about, of course).
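
A minimal sketch of that idea, under stated assumptions: unknown children of the root are kept as re-serialized XML strings rather than DOM nodes, the date is left as text for brevity, and the class name PreservingClipHandler and its fields are hypothetical. It ignores namespaces, comments, and processing instructions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: binds the known fields and keeps every unknown
// child of the root element as a raw XML string for later write-back.
class PreservingClipHandler extends DefaultHandler {
    String title, category, date;                   // known fields (date kept as text here)
    final List<String> extras = new ArrayList<>();  // unknown subtrees, re-serialized

    private final StringBuilder text = new StringBuilder();
    private StringBuilder extra;  // non-null while inside an unknown subtree
    private int depth;            // root element is depth 1, its children depth 2

    private static boolean known(String name) {
        return name.equals("title") || name.equals("date") || name.equals("category");
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (extra == null && depth == 2 && !known(qName)) {
            extra = new StringBuilder();  // start capturing an unknown subtree
        }
        if (extra != null) {
            extra.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                extra.append(' ').append(atts.getQName(i)).append("=\"")
                     .append(escape(atts.getValue(i))).append('"');
            }
            extra.append('>');
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (extra != null) {
            extra.append(escape(new String(ch, start, length)));
        } else {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (extra != null) {
            extra.append("</").append(qName).append('>');
            if (depth == 2) {             // the unknown subtree is complete
                extras.add(extra.toString());
                extra = null;
            }
        } else if (depth == 2) {
            switch (qName) {
                case "title":    title = text.toString();    break;
                case "category": category = text.toString(); break;
                case "date":     date = text.toString();     break;
                default: break;
            }
        }
        depth--;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Convenience entry point; wraps checked exceptions to keep the sketch short.
    static PreservingClipHandler parse(String xml) {
        try {
            PreservingClipHandler handler = new PreservingClipHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When writing the file back, the strings in extras would be emitted verbatim after the known fields, so nested unknown elements survive the round trip, although their position relative to the known fields is not preserved.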

Is this a common problem with a good known solution?

-- Update 5/22/12 --

I didn't mention that in the actual XML, the root node (actually called annotation) carries a version number, which is currently set to 1. For the short term, I'm going to require that the version my app supports is greater than or equal to the version of the XML data. If the XML has a higher version, I will still attempt to parse it for reading but will deny any saves to the model. I'm still interested in any kind of working example of how to do this, though.
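
A rough sketch of such a version gate, assuming the version is stored as a version attribute on the root element (the post does not say how it is stored) and that a missing attribute means version 1. VersionGate and SUPPORTED_VERSION are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: decides whether this app may save a given document.
final class VersionGate {
    static final int SUPPORTED_VERSION = 1;  // highest version this app understands

    // Reads the "version" attribute of the root element (assumed location).
    // Falls back to 1 when the attribute is absent, which is also an assumption.
    static int readVersion(String xml) {
        final int[] version = {1};
        final boolean[] seenRoot = {false};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String local,
                                                 String qName, Attributes atts) {
                            if (!seenRoot[0]) {  // only the first element is the root
                                seenRoot[0] = true;
                                String value = atts.getValue("version");
                                if (value != null) {
                                    version[0] = Integer.parseInt(value.trim());
                                }
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return version[0];
    }

    // Saving is allowed only when the document is not newer than the app.
    static boolean canSave(String xml) {
        return readVersion(xml) <= SUPPORTED_VERSION;
    }
}
```

Reading stays possible for newer documents; only the save path would consult canSave.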

BTW, I thought of another solution that should be pretty easy: use XPath to find the nodes I know about and replace their content when the data is updated. However, I ran some benchmarks, and the overhead of parsing the XML into memory is absurd. Just the parsing operation, without any lookups, was 20 times slower than SAX. Using XPath was between 30 and 50 times slower in general, which is really bad considering I parse these in a list view. So my idea is to keep SAX for parsing the nodes into clips, but store the entire XML in a variable of the Clip class (remember, this XML is short, less than 2 KB). Then, when I write the data back out, I could use XPath to replace the nodes I know about in the original XML.
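
A minimal sketch of that write-back step, assuming the hypothetical data root and field names from the example above and plain-text known fields. ClipXmlUpdater is a made-up name, and the exact whitespace of the output depends on the Transformer implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: rewrites only the known nodes of the stored XML,
// leaving every other node in the document untouched.
final class ClipXmlUpdater {
    static String update(String originalXml, String title, String category) {
        try {
            // The DOM is built only at save time, not while filling the list view.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(originalXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            setText(doc, xpath, "/data/title", title);
            setText(doc, xpath, "/data/category", category);

            // An identity transform serializes the modified tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Replaces the text of the first node matching expr; a missing node is skipped
    // in this sketch (a fuller version would create it).
    private static void setText(Document doc, XPath xpath, String expr, String value)
            throws Exception {
        Node node = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
        if (node != null) {
            node.setTextContent(value);
        }
    }
}
```

Because the DOM parse happens once per save rather than once per list row, the parsing cost measured above would only be paid when the user actually edits a clip.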

Still interested in any other solutions though. I probably won't accept a solution though unless it includes some code examples.

Varia answered 18/5, 2012 at 7:42 Comment(2)
I updated my answer with an implementation that stays with the SAX model and uses XMLFilters. It adds a slight overhead from keeping a recording of the events in memory (a very simple object model, though). See if you like that one. - Mellott
Very good solution! I don't think the overhead would be much, considering that in most cases there will not be any extra nodes. - Varia

Here's how you can go about it with SAX filters:

  1. When you read your document with SAX you record all the events. You record them and bubble them up further to the next level of SAX reader. You basically stack together two layers of SAX readers (with XMLFilter) - one will record and relay, and the other one is your current SAX handler that creates objects.
  2. When you're ready to write your modifications back to disk you fire up the recorded SAX events layered with your writer that would overwrite those values/nodes you have altered.

I spent some time with the idea and it worked. It basically came down to proper chaining of XMLFilters. Here's what the unit test looks like; your code would do something similar:

final SAXParserFactory factory = SAXParserFactory.newInstance();
final SAXParser parser = factory.newSAXParser();

final RecorderProxy recorder = new RecorderProxy(parser.getXMLReader());
final ClipHolder clipHolder = new ClipHolder(recorder);

clipHolder.parse(new InputSource(new StringReader(srcXml)));

assertTrue(recorder.hasRecordingToReplay());

final Clip clip = clipHolder.getClip();
assertNotNull(clip);
assertEquals("My Title", clip.title);
assertEquals("Blah!", clip.category);
assertEquals(Clip.DATE_FORMAT.parse("12/24/2012"), clip.date);

clip.title = "My Title Updated";
clip.category = "Something else";

final ClipSerializer serializer = new ClipSerializer(recorder);
serializer.setClip(clip);

final TransformerFactory xsltFactory = TransformerFactory.newInstance();
final Transformer t = xsltFactory.newTransformer();
final StringWriter outXmlBuffer = new StringWriter();

t.transform(new SAXSource(serializer, 
            new InputSource()), new StreamResult(outXmlBuffer));

assertEquals(targetXml, outXmlBuffer.getBuffer().toString());

The important lines are:

  • your SAX events recorder is wrapped around the SAX parser
  • your Clip parser (ClipHolder) is wrapped around the recorder
  • when the XML is parsed, recorder will record everything and your ClipHolder will only look at what it knows about
  • you then do whatever you need to do with the clip object
  • the serializer is then wrapped around the recorder (basically re-mapping it onto itself)
  • you then work with the serializer and it will take care of feeding the recorded events (delegating to the parent and registering itself as a ContentHandler) overlaid with what it has to say about the clip object.

Please find the DVR code and the Clip test over at GitHub. I hope it helps.

P.S. It's not a generic solution, and the whole record->replay+overlay concept is very rudimentary in the provided implementation; it is basically an illustration. If your XML is more complex and gets "hairy" (e.g. the same element names on different levels), then the logic will need to be augmented. The concept will remain the same, though.
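
For readers who only want the shape of the record, replay, and overlay idea, here is a minimal sketch. It is not the code from the repository: RecordingFilter, OverlayWriter, and RecordReplayDemo are hypothetical names, overrides are matched by element name only (so it has exactly the limitation described above), and overridden elements are assumed to contain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Records SAX events while relaying them to the next handler (hypothetical sketch).
class RecordingFilter extends XMLFilterImpl {
    private interface Event { void fire(ContentHandler h) throws SAXException; }
    private final List<Event> events = new ArrayList<>();

    RecordingFilter(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        Attributes copy = new AttributesImpl(atts);  // the parser may reuse atts
        events.add(h -> h.startElement(uri, local, qName, copy));
        super.startElement(uri, local, qName, atts);  // relay to the object-building handler
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] copy = Arrays.copyOfRange(ch, start, start + length);
        events.add(h -> h.characters(copy, 0, copy.length));
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        events.add(h -> h.endElement(uri, local, qName));
        super.endElement(uri, local, qName);
    }

    // Fires the recorded events at another handler, for example a writer.
    void replay(ContentHandler target) throws SAXException {
        target.startDocument();
        for (Event e : events) e.fire(target);
        target.endDocument();
    }
}

// Writes replayed events as XML, substituting new text for overridden element names.
class OverlayWriter extends DefaultHandler {
    private final Map<String, String> overrides;
    private final StringBuilder out = new StringBuilder();
    private boolean suppress;  // true while inside an overridden (text-only) element

    OverlayWriter(Map<String, String> overrides) { this.overrides = overrides; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
        String replacement = overrides.get(qName);
        if (replacement != null) {  // emit the new value and drop the recorded text
            out.append(escape(replacement));
            suppress = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!suppress) out.append(escape(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        suppress = false;
        out.append("</").append(qName).append('>');
    }

    String result() { return out.toString(); }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}

final class RecordReplayDemo {
    // Parses xml through the recorder, then replays it with the given overrides applied.
    static String roundTrip(String xml, Map<String, String> overrides) {
        try {
            RecordingFilter recorder = new RecordingFilter(
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader());
            recorder.setContentHandler(new DefaultHandler());  // stand-in for the Clip parser
            recorder.parse(new InputSource(new StringReader(xml)));
            OverlayWriter writer = new OverlayWriter(overrides);
            recorder.replay(writer);
            return writer.result();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unknown elements pass through the replay unchanged because the writer only intervenes for names present in the overrides map.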

Mellott answered 23/5, 2012 at 21:25 Comment(0)

You're right to say that SAX is probably not the best option if you want to keep the nodes that you haven't "consumed". You could still do it using some kind of "SAX store" that keeps the SAX events and replays them (there are a few implementations of such a thing around), but an object-model-based API would be much easier to use: you'd keep the complete object model and just update "your" nodes.

Of course, you can use DOM, which is the standard, but you may also want to consider alternatives which provide easier access to the specific nodes you'll be using in an arbitrary data model. Among them, JDOM (http://www.jdom.org/) and XOM (http://www.xom.nu/) are interesting candidates.

Swelter answered 22/5, 2012 at 18:13 Comment(0)

If you're not bound to a specific XML schema, you should consider doing something like this:

<data>
    <element id="title">
        myTitle
    </element>
    <element id="date">
         18/05/2012
    </element>
    ...
</data>

and then store all those elements in a single ArrayList. This way you wouldn't lose information, and you would still be able to choose which elements you want to show, edit, etc.

Gurglet answered 18/5, 2012 at 8:1 Comment(1)
This isn't possible, as the current structure is already being parsed by the backend and another desktop app. Also, many devices already have the current structure on them. Furthermore, the extra nodes that could be in the XML in the future may not be simple text values; they may have nodes within them. - Varia

Your assumption that XPath is 20x slower than SAX parsing is flawed. SAX parsing is just a low-level tokenizer on which your processing logic would be built, and that processing logic would require additional parsing. XPath's performance has a lot to do with the implementation. As far as I know, vtd-xml's XPath is at least an order of magnitude faster than DOM in general and is far better suited for heavy-duty XML processing. Below are a few links to further references:

http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf

Android - XPath evaluate very slow

Polyvalent answered 22/4, 2016 at 6:36 Comment(1)
It very well may be the case that I wasn't using XPath efficiently or that I could have used alternative libraries that would have been more efficient. Thanks for the feedback. I'll look into vtd-xml if I work on that area of the project again anytime soon. - Varia
