How to modify a huge XML file with StAX?

I have a huge XML file (~2GB) and I need to add new elements and modify existing ones. For example, I have:

<books>
    <book>....</book>
    ...
    <book>....</book>
</books>

And want to get:

<books>
   <book>
      <index></index>
      ....
   </book>
   ...
   <book>
      <index></index>
      ....
   </book>
</books>

I used the following code:

XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream(file));
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(new FileWriter(file, true));
while (eventReader.hasNext()) {
   XMLEvent event = eventReader.nextEvent();
   if (event.getEventType() == XMLEvent.START_ELEMENT) {
      if (event.asStartElement().getName().toString().equalsIgnoreCase("book")) {
          writer.writeStartElement("index");
          writer.writeEndElement();
       }
    }
}
writer.close();

But the result was the following:

<books>
   <book>....</book>
   ....
   <book>....</book>
</books><index></index>

Any ideas?

Acidophil answered 10/5, 2013 at 9:49 Comment(0)

Try this

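    import java.io.FileInputStream;
    import java.io.FileWriter;
    import javax.xml.stream.*;
    import javax.xml.stream.events.XMLEvent;

    XMLInputFactory inFactory = XMLInputFactory.newInstance();
    // Read from the source file ...
    XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream("1.xml"));
    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    // ... and write to a different file; `file` must not be the file being read
    XMLEventWriter writer = factory.createXMLEventWriter(new FileWriter(file));
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    while (eventReader.hasNext()) {
        XMLEvent event = eventReader.nextEvent();
        // Copy every input event to the output unchanged
        writer.add(event);
        // Right after each <book> start tag, insert an empty <index> element
        if (event.getEventType() == XMLEvent.START_ELEMENT) {
            if (event.asStartElement().getName().getLocalPart().equals("book")) {
                writer.add(eventFactory.createStartElement("", "", "index"));
                writer.add(eventFactory.createEndElement("", "", "index"));
            }
        }
    }
    writer.close();
    eventReader.close();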

Notes

new FileWriter(file, true) opens the file in append mode, so everything is written at the end of the file; that is almost certainly not what you need here.

equalsIgnoreCase("book") is a bad idea because XML is case-sensitive; compare with equals instead.
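
To illustrate the case-sensitivity note, here is a minimal sketch that counts start tags whose local name matches exactly. The class ElementNameMatch and the helper countExact are hypothetical names made up for this illustration; only the standard StAX cursor API is used.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ElementNameMatch {
    // Hypothetical helper: counts start tags whose local name is exactly `name`.
    static int countExact(String xml, String name) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            // equals, not equalsIgnoreCase: <book> and <Book> are different elements
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && name.equals(reader.getLocalName())) {
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

For the input <books><book/><Book/></books>, countExact(xml, "book") returns 1, whereas a case-insensitive comparison would also match <Book>.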

Bula answered 10/5, 2013 at 10:21 Comment(8)
Unfortunately, this code doesn't work. NetBeans gives me an error: 'Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,2] Message: XML document structures must start and end within the same entity. at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598) at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83) at librarian.controllers.BookCardController.saveToXML(BookCardController.java:140) ... 54 more' Furthermore, it deletes the entire contents of the file. - Acidophil
What's the exception? I tested it with your XML before posting. - Bula
I have just tried it again, and I get the same exception: 'Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,5] Message: XML document structures must start and end within the same entity. at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598) at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83) at librarian.controllers.BookCardController.saveToXML(BookCardController.java:138) ... 54 more' I really don't know why, but this code also clears my file. - Acidophil
Well, it seems this exception occurs because I used the same file as input and output. After choosing a different destination file the code started working, but the output was the following: '<index></index><index></index><index></index><index></index><index></index><index></index><index></index>'. I need to insert the element into the existing XML. - Acidophil
If you leave only XMLEvent event = eventReader.nextEvent(); writer.add(event); in the loop, you should get output == input; nothing can be lost. Try debugging. - Bula
Oh, it was my mistake: I accidentally deleted this line. Yes, it works great. Thanks a lot. But now I have one more question. I tried this with a 40MB file and it took 2.5 - 3 seconds, so with a 2GB file it will take nearly 3 minutes! Is there any way to speed up this code? - Acidophil
Not sure it will help, but it's worth trying: replace the FileWriter with new BufferedOutputStream(new FileOutputStream(file)) and the FileInputStream with new BufferedInputStream(new FileInputStream(file)). - Bula
I've already tried it, but it saves only 50-100 milliseconds. OK, thank you very much. You really helped me! - Acidophil

Well it is pretty clear why it behaves the way it does. What you are actually doing is opening the existing file in output append mode and writing elements at the end. That clearly contradicts what you are trying to do.

(Aside: I'm surprised that it works as well as it does, given that the input side is likely to see the elements that the output side is adding to the end of the file. Indeed, exceptions like the ones reported for Evgeniy Dorofeev's example are the sort of thing I'd expect. The problem is that if you attempt to read and write a text file at the same time, and either the reader or the writer uses any form of buffering, explicit or implicit, the reader is liable to see partial states.)

To fix this you have to start by reading from one file and writing to a different file. Appending won't work. Then you have to arrange that the elements, attributes, content etc that are read from the input file are copied to the output file. Finally, you need to add the extra elements at the appropriate points.
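
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the class BookIndexer and the method addIndexElements are hypothetical names, the temporary file is assumed to fit in the same directory as the source, and replacing the original file at the end is an optional convenience rather than part of the fix itself.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class BookIndexer {
    // Hypothetical helper: rewrites `source` so that every <book> starts with an empty <index>.
    static void addIndexElements(Path source) throws IOException, XMLStreamException {
        // Step 1: read from one file, write to a different (temporary) file.
        Path temp = Files.createTempFile(source.toAbsolutePath().getParent(), "books", ".tmp");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
                 OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
                XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
                XMLEventWriter writer =
                        XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    // Step 2: copy everything that was read to the output.
                    writer.add(event);
                    // Step 3: add the extra element right after each <book> start tag.
                    if (event.isStartElement()
                            && "book".equals(event.asStartElement().getName().getLocalPart())) {
                        writer.add(events.createStartElement("", "", "index"));
                        writer.add(events.createEndElement("", "", "index"));
                    }
                }
                writer.flush();
                writer.close();
                reader.close();
            }
            // Only after both streams are closed, replace the original with the new file.
            Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up the temporary file on failure.
            Files.deleteIfExists(temp);
        }
    }
}
```

Because the original is only replaced once the copy has been written completely, a failure in the middle leaves the source file untouched.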


And is there any possibility to open the XML file in mode like RandomAccessFile, but write in it by StAX methods?

No. That is theoretically impossible. In order to be able to navigate around an XML file's structure in a "random" fashion, you'd first need to parse the whole thing and build an index of where all the elements are. Even when you've done that, the XML is still stored as characters in a file, and random access does not allow you to insert and remove characters in the middle of a file.
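
A small sketch of that last point: writing through RandomAccessFile at an offset overwrites the bytes that are already there instead of shifting them. The class RandomAccessDemo and the helper writeAt are hypothetical names used only for this demonstration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RandomAccessDemo {
    // Hypothetical helper: writes `text` at byte `offset` and returns the resulting file content.
    static String writeAt(Path file, long offset, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("books", ".xml");
        Files.write(file, "<book>text</book>".getBytes(StandardCharsets.UTF_8));
        // Try to "insert" <index/> right after <book> (byte offset 6).
        System.out.println(writeAt(file, 6, "<index/>")); // prints <book><index/>ok>
        Files.delete(file);
    }
}
```

The eight bytes of <index/> replace text</bo rather than pushing it to the right, so the file is no longer well-formed. A real insert requires rewriting everything after the insertion point, which is what copying to a second file does.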

Maybe your best bet would be combining XSL and a SAX style parser; e.g. something along the lines of this IBM article: http://ibm.com/developerworks/xml/library/x-tiptrax

Heartsick answered 10/5, 2013 at 10:21 Comment(0)

Maybe the StAX Read-and-Write Example in the Java EE tutorial helps: http://docs.oracle.com/javaee/5/tutorial/doc/bnbfl.html#bnbgq

You can download the tutorial examples here: https://java.net/projects/javaeetutorial/downloads

Iover answered 7/6, 2013 at 19:22 Comment(0)
