StAX XML all content between two required tags

I've started learning StAX with XMLStreamReader and ran into a problem. How can I get ALL the content between two tags as text? I know the name of the tag I need; once I find it, I have to read up to its closing tag and append everything in between to a string. For example, given something like

<rootTag>
...    
    <someTag>
        Some text content and other tags here…
    </someTag >
    <tagINeed>
        <someinternalTag1>
            <someinternalTag11>
                Some text content..
            </someinternalTag11>
            ...
        </someinternalTag1>
        <someinternalTag2>
            Something here
        </someinternalTag2>
    </tagINeed>
...
    <somethingAnother>
...
    </somethingAnother >
...
</rootTag>    

So I need to get this string:

        <someinternalTag1>
            <someinternalTag11>
                Some text content..
            </someinternalTag11>
            ...
        </someinternalTag1>
        <someinternalTag2>
            Something here
        </someinternalTag2>

How can I get it? Maybe I should find the start and end offsets of the needed block in the source XML and take the substring after parsing?

Caye answered 27/12, 2012 at 8:56 Comment(0)
F
11

Try

    StringWriter sw = new StringWriter();
    XMLOutputFactory of = XMLOutputFactory.newInstance(); 
    XMLEventWriter xw = null;
    XMLInputFactory f = XMLInputFactory.newInstance();
    XMLEventReader xr = f.createXMLEventReader(new FileInputStream("test.xml"));
    while (xr.hasNext()) {
        XMLEvent e = xr.nextEvent();
        if (e.isStartElement()
                && ((StartElement) e).getName().getLocalPart().equals("tagINeed")) {
            xw = of.createXMLEventWriter(sw);
        } else if (e.isEndElement()
                && ((EndElement) e).getName().getLocalPart().equals("tagINeed")) {
            break;
        } else if (xw != null) {
            xw.add(e);
        }
    }
    if (xw != null) xw.close(); // xw stays null if <tagINeed> was never found
    System.out.println(sw);

prints

    <someinternalTag1>
        <someinternalTag11>
            Some text content..
        </someinternalTag11>
    </someinternalTag1>
    <someinternalTag2>
        Something here
    </someinternalTag2>

Update:

If you need the XML string to include the <tagINeed> element itself too, you can write it like this:

        if (e.isStartElement() &&
                ((StartElement) e).getName().getLocalPart().equals("tagINeed")){
            xw = of.createXMLEventWriter(sw);
            xw.add(e);
        } else if (e.isEndElement() &&
                ((EndElement) e).getName().getLocalPart().equals("tagINeed")){
            xw.add(e);
            break;
        } else if (xw != null) {
            xw.add(e);
        }
Frida answered 27/12, 2012 at 9:25 Comment(8)
But it outputs [Stax Event #4][Stax Event #1][Stax Event #4][Stax Event #1][Stax Event #4][Stax Event #2][Stax Event #4][Stax Event #2][Stax Event #4][Stax Event #1][Stax Event #4][Stax Event #2][Stax Event #4] - Caye
Well, that output is real. My StAX is the Java 7 internal com.sun.xml.internal.stream.XMLInputFactoryImpl. What's your StAX? - Frida
Anyway, try my updated version; it does not depend on the StAX implementation. - Frida
e is just the type of event. My problem is how to get the content at the current position as text, without checking the type, so that I don't have to do something like if(e == XMLStreamConstants.START_ELEMENT){ System.out.println("<" + reader.getLocalName() + ">"); } else if(e == XMLStreamConstants.END_ELEMENT){ System.out.println("</" + reader.getLocalName() + ">"); } else if(e == XMLStreamConstants.CHARACTERS){ System.out.println(reader.getText()); } - Caye
Sorry, I don't understand. Must I import XMLInputFactory from com.sun.xml.internal.stream? But there is no such class in this package. Could you publish the full program text here, including the imports? - Caye
Just try my latest version; it is fixed so that it does not depend on the StAX implementation. - Frida
As for the StAX implementation: when you call javax.xml.stream.XMLInputFactory.newInstance(), XMLInputFactory searches for the real implementation, and if there are no providers on the class path it takes the default one from rt.jar. - Frida
Thanks, +1. But it fails if there is an inner tag with the same name. I added a counter. See below. - Bani
B
1

E. Dorofeev's solution is good, but it fails if there is an inner tag with the same name as the target tag. I added a counter that tracks the nesting depth.

String fichier="test_stax_2.txt";

String tag="tagINeed";
int count=0;

StringWriter sw = new StringWriter();
XMLOutputFactory of = XMLOutputFactory.newInstance(); 
XMLEventWriter xw = null;
XMLInputFactory f = XMLInputFactory.newInstance();
XMLEventReader xr = f.createXMLEventReader(new FileInputStream(fichier));

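while (xr.hasNext()) {
    XMLEvent e = xr.nextEvent();
    if (e.isStartElement()
            && ((StartElement) e).getName().getLocalPart().equals(tag)) {
        if (count == 0) {
            // outermost start tag: begin capturing, but do not copy the tag itself
            xw = of.createXMLEventWriter(sw);
        } else {
            // nested tag with the same name: it is part of the content
            xw.add(e);
        }
        count++;
    } else if (e.isEndElement()
            && ((EndElement) e).getName().getLocalPart().equals(tag)) {
        count--;
        if (count == 0) {
            // end tag matching the outermost start tag: stop
            break;
        } else {
            xw.add(e);
        }
    } else if (xw != null) {
        xw.add(e);
    }
}
if (xw != null) {
    xw.close();
}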

System.out.println(sw);
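
For readers who want a complete file with the imports, the counter-based loop above could be wrapped roughly as follows. This is only a sketch: the class name InnerXmlExtractor and the method innerXml are made up for illustration, the input is read from a String instead of a file, and checked XMLStreamExceptions are rethrown unchecked for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class; the name is not from the answers above.
class InnerXmlExtractor {

    // Returns the markup between the first <tag> and its matching </tag>,
    // or an empty string if the tag never occurs.
    static String innerXml(String xml, String tag) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = null; // created when the outermost <tag> is seen
            int depth = 0;                // nesting level of elements named `tag`

            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isStartElement()
                        && e.asStartElement().getName().getLocalPart().equals(tag)) {
                    if (depth == 0) {
                        writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
                    } else {
                        writer.add(e); // nested element with the same name
                    }
                    depth++;
                } else if (e.isEndElement()
                        && e.asEndElement().getName().getLocalPart().equals(tag)) {
                    depth--;
                    if (depth == 0) {
                        break; // matching end of the outermost element
                    }
                    writer.add(e);
                } else if (writer != null) {
                    writer.add(e); // everything inside the target element
                }
            }
            if (writer != null) {
                writer.close();
            }
            reader.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Could not process XML", ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<rootTag><someTag>skip</someTag>"
                + "<tagINeed><a><b>text</b></a><c>more</c></tagINeed>"
                + "<other/></rootTag>";
        System.out.println(innerXml(xml, "tagINeed"));
    }
}
```

Note that the events are re-serialized by the writer, so the result is equivalent XML rather than a byte-for-byte copy of the source (quoting and entity escaping may differ).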
Bani answered 6/8, 2017 at 20:11 Comment(0)
G
0

In XML everything is a node, and StAX lets you traverse these nodes one by one. I think you can get the desired result by serializing the XML document to a string with a Transformer and then searching that string for the required content.

Transformer t=TransformerFactory.newInstance().newTransformer();
StringWriter sw=new StringWriter();         
StreamResult result=new StreamResult(sw);//holds the result of a transformation
DOMSource d=new DOMSource(XMLdoc);//your XML document
t.transform(d, result);
String xmlstring=sw.toString();

You can then use xmlstring to get the desired result.
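
The answer leaves the search step open. One possible variant, sketched here under assumptions that are not part of the answer, is to locate the element in the DOM first and serialize only its children, so no string searching is needed. The class name DomInnerXml, the method innerXml, and the use of getElementsByTagName are illustrative choices; the whole document is loaded into memory, unlike with StAX.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is not from the answer above.
class DomInnerXml {

    // Serializes the children of the first element named `tag`,
    // or returns an empty string if there is no such element.
    static String innerXml(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Matches come back in document order, so item(0) is the first one.
            NodeList matches = doc.getElementsByTagName(tag);
            if (matches.getLength() == 0) {
                return "";
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Without this, each fragment would start with an <?xml ...?> declaration.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter sw = new StringWriter();
            NodeList children = matches.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                t.transform(new DOMSource(children.item(i)), new StreamResult(sw));
            }
            return sw.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException ex) {
            throw new IllegalStateException("Could not process XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<rootTag><tagINeed><a><b>text</b></a><c>more</c></tagINeed></rootTag>";
        System.out.println(innerXml(xml, "tagINeed"));
    }
}
```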

Glister answered 27/12, 2012 at 9:7 Comment(0)
