Efficient way of creating a String from char[], start, length in Java

3

9

We are using Java SAX to parse really big XML files. Our characters implementation looks like the following:

@Override
public void characters(char ch[], int start, int length) throws SAXException {
    String value = String.copyValueOf(ch, start, length);
    ...
}

(ch[] arrays passed by SAX tend to be pretty long)

But we have recently run into performance issues, and the profiler shows that over 20% of our CPU time is spent in the above invocation of String.copyValueOf (which invokes new String(ch, start, length) under the hood).

Is there a more efficient way to obtain a String from an array of characters, a start index and a length than String.copyValueOf(ch, start, length) or new String(ch, start, length)?

Inviolable answered 6/6, 2013 at 7:44 Comment(4)
It might be worse, but have you tried a StringBuilder? new String(ch, start, length) just copies the array over, but I don't know how fast a StringBuilder can work. Forlini
The built String is not returned. What do you do with it? Could whatever is done with that String also be done directly on the char[] with start and length? Phocaea
@Phocaea Yeah, I thought about that. But we do many different operations with it where we treat it as a String. It would be extremely hard (or at least the code would be really ugly) to operate on char arrays. Inviolable
A validating parser on an XML document with a DTD or schema will strip whitespace on elements without children. By holding external files in a local XML catalog, the speed penalty of external references is mitigated. Bloodred
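As a rough illustration of the idea raised in the comments above (doing some of the work directly on the char[] slice), here is a minimal sketch. The class and method names are hypothetical and not part of SAX; the assumption is that a cheap check on the slice can sometimes decide that no String is needed at all, for example for whitespace-only chunks.

```java
final class CharSliceChecks {
    /** True if the slice contains only whitespace; no String is allocated. */
    static boolean isWhitespaceOnly(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                return false;
            }
        }
        return true;
    }

    /** True if the slice holds exactly the characters of expected. */
    static boolean sliceEquals(char[] ch, int start, int length, String expected) {
        if (length != expected.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (ch[start + i] != expected.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] buffer = "<a>\n  </a><b>yes</b>".toCharArray();
        System.out.println(isWhitespaceOnly(buffer, 3, 3));     // prints true
        System.out.println(sliceEquals(buffer, 13, 3, "yes"));  // prints true
    }
}
```

In a characters callback, a check like isWhitespaceOnly(ch, start, length) could return early before any String is built. If a CharSequence is needed, java.nio.CharBuffer.wrap(ch, start, length) gives a view without copying, but it should only be used inside the callback, since the parser may reuse the buffer afterwards.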
4

Good question, but I'm sure the answer is no.

This is because every String constructor copies the characters into a new internal array. A String cannot be constructed directly on top of an existing array, because String objects must be immutable, so the internal character array is encapsulated and protected from outside changes.

Furthermore, in your case you are dealing with a fragment of an array. There is no way to build a String object as a view onto a fragment of another array.
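A small sketch can make the copying visible. The class and method names here are made up for the demonstration; the only library call is the new String(char[], int, int) constructor from the question.

```java
final class StringCopyDemo {
    /** Builds a String from a slice of ch; the constructor copies the slice. */
    static String fromSlice(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        char[] buffer = "<a>hello</a>".toCharArray();
        String value = fromSlice(buffer, 3, 5);

        // Simulate the parser overwriting its buffer after the callback returns.
        buffer[3] = 'J';

        // The String kept its own copy, so it is unaffected by the change.
        System.out.println(value); // prints hello
    }
}
```

The String is unaffected by the later write to the array, which is exactly why it cannot share storage with the parser's buffer.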

Agnes answered 6/6, 2013 at 7:51 Comment(0)
2

As stated by @Andremoniy, if you want to use a String object, it always has to be created and the contents have to be copied into it.

The only way to speed up your parser is to reduce the number of newly built String objects to a minimum.

I doubt that every element in your XML structure contains raw data between its start and end tags.

Therefore I would suggest only creating the strings when you are inside an element whose data is of interest. Moreover, I would suggest narrowing down the candidate elements somehow, for example by hierarchy level or by parent element, to reduce the number of string comparisons. But this depends on the XML structure.

protected boolean readChars = false;
protected int level = -1;

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    ++level;

    if (level == 4) {
        if (qName.equalsIgnoreCase("TextElement")) {
            readChars = true;
        }
    }
}

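@Override
public void characters(char ch[], int start, int length) throws SAXException {
    // Only pay for the copy inside an element of interest.
    if (readChars) {
        String value = String.copyValueOf(ch, start, length);
        ...
    }
}

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    // Reset here rather than in characters(), which may be called
    // several times for the text of a single element.
    readChars = false;
    --level;
}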
Tentage answered 6/6, 2013 at 7:56 Comment(0)
1

Possibly in conjunction with the above: since characters might be called more than once inside a single element, holding a StringBuilder at element level might be appropriate. StringBuilder.append(char[], int, int) does a System.arraycopy.
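A minimal sketch of that idea, assuming a single element name of interest: the class name TextCollector, the wanted parameter and the collect helper are all hypothetical, and the sample XML exists only for the demonstration. One StringBuilder is reused for every matching element, and a String is created once per element in endElement rather than once per characters call.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TextCollector extends DefaultHandler {
    private final String wanted;                            // element name of interest (assumption)
    private final StringBuilder text = new StringBuilder(); // reused across elements
    private final List<String> values = new ArrayList<>();
    private boolean collecting = false;

    TextCollector(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals(wanted)) {
            collecting = true;
            text.setLength(0); // clear the builder, keep its capacity
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (collecting) {
            text.append(ch, start, length); // array copy only, no String yet
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (collecting && qName.equals(wanted)) {
            values.add(text.toString()); // one String per element of interest
            collecting = false;
        }
    }

    /** Parses xml and returns the text of every element named wanted. */
    static List<String> collect(String xml, String wanted) {
        TextCollector handler = new TextCollector(wanted);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String xml = "<root><skip>ignored</skip><TextElement>a &amp; b</TextElement></root>";
        System.out.println(collect(xml, "TextElement")); // prints [a & b]
    }
}
```

The characters are still copied twice here (once into the builder, once into the String), so any gain comes from skipping uninteresting elements and from building one String per element instead of one per chunk. Whether that helps in practice would need to be measured with the profiler.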

Bloodred answered 6/6, 2013 at 8:17 Comment(0)
