Handling Empty Tags in XML using Sax Parser, Java
C

3

7

I'm using a SAX parser to handle a pre-written XML file. I have no way of changing the XML, as it is held by another application, but I need to parse data from it. The XML file contains a tag <ERROR_TEXT/> which is empty when no error has occurred. As a result, the parser takes the next character after the tag close, which is "\n". I have tried result.replaceAll("\n", ""); and result.replaceAll("\n", "");

How do I get SAX to recognize that this is an empty tag and return the value as ""?

Consequence answered 30/5, 2012 at 9:6 Comment(0)
E
2

You can do that. Suppose you have the XML and Java source below.

<ERROR_TEXT>easy</ERROR_TEXT><ERROR_TEXT/>

Java code

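// True between startElement() and the first characters() or endElement() call that follows it.
private boolean isKeySet = false;
private String key = "";

@Override
public void characters(
    char[] ch,
    int start,
    int length
) throws SAXException
{
    if (!isKeySet) {
        // Text that does not directly follow a start tag, e.g. the "\n" after <ERROR_TEXT/>.
        return;
    }
    isKeySet = false;
    // Only the slice [start, start + length) of the buffer belongs to this event.
    String value = new String(ch, start, length);
    logger.debug("key : [" + key + "], value : [" + value + "]");
}

@Override
public void startElement(
    String uri,
    String localName,
    String qName,
    Attributes attrs
) throws SAXException
{
    key = qName;
    isKeySet = true;
}

@Override
public void endElement(
    String uri,
    String localName,
    String qName
) throws SAXException
{
    // Still set here means characters() was never called for this element: it is empty.
    if (isKeySet) {
        isKeySet = false;
        logger.debug("key : [" + key + "](EMPTY!!!)");
    }
}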

RESULT log:

key : [ERROR_TEXT], value : [easy]

key : [ERROR_TEXT](EMPTY!!!)

Call flow: startElement() -> characters() -> endElement() -> startElement() -> endElement() -> characters()
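
To check that call flow yourself, a small trace handler can help. This is only a sketch: the class name SaxEventTrace and the wrapping <ROOT> element are my own assumptions (the snippet above has no root element, and a parser needs one), and consecutive characters() calls are merged because a parser is allowed to deliver one text run in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original answer.
class SaxEventTrace {

    /** Parses the XML and returns the SAX callbacks in the order they fired. */
    public static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Merge back-to-back calls so one text run shows up as one entry.
                if (events.isEmpty() || !events.get(events.size() - 1).equals("characters")) {
                    events.add("characters");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // <ROOT> is an assumed wrapper; the "\n" mimics the line break after the empty tag.
        String xml = "<ROOT><ERROR_TEXT>easy</ERROR_TEXT><ERROR_TEXT/>\n</ROOT>";
        trace(xml).forEach(System.out::println);
    }
}
```

The point to notice is that no characters entry appears between the start and end of the empty ERROR_TEXT element; the "\n" is reported only after its end tag, as part of the parent's content.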

That's it! THE END

Exodontics answered 10/11, 2017 at 7:38 Comment(0)
A
1

SAXParser returns character data through the characters() event, which it calls whenever it encounters literal characters. Relying on that callback alone is unreliable, because it also fires for whitespace-only text such as the line break that follows a tag, whether or not the element itself contains any data. You could use String.trim() and do a String.length() > 0 check before proceeding.
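
A minimal sketch of that check, assuming the text has already been collected into a String (the class and method names here are made up for illustration). Note that a length() >= 0 test can never fail, so the comparison has to be strictly greater than zero, or simply isEmpty().

```java
// Hypothetical helper, not part of the original answer.
class BlankCheck {

    /** Returns true only when the text holds something other than whitespace. */
    public static boolean hasData(String raw) {
        // trim() removes leading and trailing characters up to U+0020,
        // which includes '\n', '\r', '\t' and the space character.
        return raw != null && !raw.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasData("\n"));       // line break after an empty tag
        System.out.println(hasData(" easy\n"));  // real content with stray whitespace
    }
}
```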

Angellaangelle answered 27/1, 2014 at 7:43 Comment(2)
Thanks, it worked for me. But I still think it should just return an empty string if there's no data. - Distrust
@Distrust 2018 me agrees with you :) - Angellaangelle
F
0

You don't. It is SAX's job to parse the data, not to make decisions about what the content of that data is supposed to be. In your parse handler, store the string data for each element, and when you go to process that element, do a string.trim() on the data. If the output of that is blank and your tag is an ERROR_TEXT tag, you know there is no error.
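
One way this could look, as a sketch rather than a definitive implementation: the handler name ErrorTextHandler, the parse() helper and the <ROOT> wrapper in the sample are assumptions, and the code presumes ERROR_TEXT holds plain text with no child elements. Text is buffered in a StringBuilder because characters() may be called more than once per element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of the original answer.
class ErrorTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> errorTexts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // drop whitespace collected before this element opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("ERROR_TEXT".equals(qName)) {
            // An empty tag never triggers characters(), so this yields "".
            errorTexts.add(text.toString().trim());
        }
        text.setLength(0); // the "\n" after the end tag must not leak into the next element
    }

    public List<String> getErrorTexts() {
        return errorTexts;
    }

    /** Convenience wrapper: parse a string and return every ERROR_TEXT value. */
    public static List<String> parse(String xml) throws Exception {
        ErrorTextHandler handler = new ErrorTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getErrorTexts();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ROOT>\n<ERROR_TEXT>easy</ERROR_TEXT>\n<ERROR_TEXT/>\n</ROOT>";
        for (String value : parse(xml)) {
            System.out.println(value.isEmpty() ? "no error" : "error: " + value);
        }
    }
}
```

Resetting the buffer in both startElement() and endElement() is the design choice that keeps the trailing "\n" from ever being attributed to ERROR_TEXT.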

Feola answered 30/5, 2012 at 9:40 Comment(5)
'string.trim()' won't delete \n. The string appears as "\n" when I debug it. - Consequence
The SAX parser isn't recognising the empty tag; rather, it is picking up the return character after it. - Consequence
It should return a start element, an end element, and a number of blank characters in the middle. Is that not what you are getting? If you want to check for \n characters, do a replace for those and spaces, then do a trim. - Feola
No, see, the tag is like this: <ERROR_TEXT/>, and the SAX parser is not treating it as <ERROR_TEXT></ERROR_TEXT>. I want it to give me a null, but instead it is giving me the first character after <ERROR_TEXT/>, which happens to be \n. - Consequence
You cannot change what it gives you. Why is it a problem to ignore a \n? Are you using a default handler or your own? If you are using your own, it is easy to establish that the tag is empty. If not, it shouldn't be hard to ignore whitespace when you are looking for a string. If it is a major problem for you, use a DOM parser instead of SAX. - Feola
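
For the DOM route mentioned in the last comment, a rough sketch might look like this. The class name DomErrorText and the <ROOT> wrapper are assumptions; the calls are the standard JAXP and org.w3c.dom ones. With DOM, getTextContent() on an element that has no children returns an empty string, so the trailing line break is never mistaken for the tag's value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the original thread.
class DomErrorText {

    /** Returns the text of every ERROR_TEXT element; empty tags yield "". */
    public static List<String> errorTexts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("ERROR_TEXT");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Text belongs to the element itself, not to whatever follows it.
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ROOT><ERROR_TEXT>easy</ERROR_TEXT><ERROR_TEXT/>\n</ROOT>";
        System.out.println(errorTexts(xml));
    }
}
```

The trade-off is memory: DOM loads the whole document, which is fine for small files but is the usual reason people reach for SAX in the first place.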
