How to trim a java stringbuilder?
Asked Answered
B

9

15

I have a StringBuilder object that needs to be trimmed (i.e. all whitespace chars /u0020 and below removed from either end).

I can't seem to find a method in string builder that would do this.

Here's what I'm doing now:

String trimmedStr = strBuilder.toString().trim();

This gives exactly the desired output, but it requires two Strings to be allocated instead of one. Is there a more efficient to trim the string while it's still in the StringBuilder?

Backfield answered 6/3, 2011 at 19:47 Comment(0)
M
29

You should not use the deleteCharAt approach.

As Boris pointed out, the deleteCharAt method copies the array over every time. The code in the Java 5 that does this looks like this:

public AbstractStringBuilder deleteCharAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    System.arraycopy(value, index+1, value, index, count-index-1);
    count--;
    return this;
}

Of course, speculation alone is not enough to choose one method of optimization over another, so I decided to time the 3 approaches in this thread: the original, the delete approach, and the substring approach.

Here is the code I tested for the orignal:

public static String trimOriginal(StringBuilder sb) {
    return sb.toString().trim();
}

The delete approach:

public static String trimDelete(StringBuilder sb) {
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0))) {
        sb.deleteCharAt(0);
    }
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1))) {
        sb.deleteCharAt(sb.length() - 1);
    }
    return sb.toString();
}

And the substring approach:

public static String trimSubstring(StringBuilder sb) {
    int first, last;

    for (first=0; first<sb.length(); first++)
        if (!Character.isWhitespace(sb.charAt(first)))
            break;

    for (last=sb.length(); last>first; last--)
        if (!Character.isWhitespace(sb.charAt(last-1)))
            break;

    return sb.substring(first, last);
}

I performed 100 tests, each time generating a million-character StringBuffer with ten thousand trailing and leading spaces. The testing itself is very basic, but it gives a good idea of how long the methods take.

Here is the code to time the 3 approaches:

public static void main(String[] args) {

    long originalTime = 0;
    long deleteTime = 0;
    long substringTime = 0;

    for (int i=0; i<100; i++) {

        StringBuilder sb1 = new StringBuilder();
        StringBuilder sb2 = new StringBuilder();
        StringBuilder sb3 = new StringBuilder();

        for (int j=0; j<10000; j++) {
            sb1.append(" ");
            sb2.append(" ");
            sb3.append(" ");
        }
        for (int j=0; j<980000; j++) {
            sb1.append("a");
            sb2.append("a");
            sb3.append("a");
        }
        for (int j=0; j<10000; j++) {
            sb1.append(" ");
            sb2.append(" ");
            sb3.append(" ");
        }

        long timer1 = System.currentTimeMillis();
        trimOriginal(sb1);
        originalTime += System.currentTimeMillis() - timer1;

        long timer2 = System.currentTimeMillis();
        trimDelete(sb2);
        deleteTime += System.currentTimeMillis() - timer2;

        long timer3 = System.currentTimeMillis();
        trimSubstring(sb3);
        substringTime += System.currentTimeMillis() - timer3;
    }

    System.out.println("original:  " + originalTime + " ms");
    System.out.println("delete:    " + deleteTime + " ms");
    System.out.println("substring: " + substringTime + " ms");
}

I got the following output:

original:  176 ms
delete:    179242 ms
substring: 154 ms

As we see, the substring approach provides a very slight optimization over the original "two String" approach. However, the delete approach is extremely slow and should be avoided.

So to answer your question: you are fine trimming your StringBuilder the way you suggested in the question. The very slight optimization that the substring method offers probably does not justify the excess code.

Maryammaryann answered 6/3, 2011 at 21:43 Comment(7)
Nice elaboration of my comment, I didn't had the time myself. Be careful however with those micro-benchmarks in Java. The JVM optimizes the running code in so many ways that simple micro-benchmarks can be misleading (e.g. see Anatomy of a flawed microbenchmark or this question for details). There are tools like caliper which help a bit doing them.Circumlocution
In the linked question there is also an answer that links to [some articles]( ibm.com/developerworks/java/library/j-benchmark1.html) and a tool I've used before (but didn't found for my first comment). Also a good read.Circumlocution
Hey Boris, thanks a lot for the info. I'm reading the article now, and I find it really useful so far.Maryammaryann
There's a slight speed benefit to substring, but I think the biggest improvement is in memory, since the string doesn't have to be copied twice like the original method. +1Backfield
Beware of the pitfalls in microbenchmarking. What happens if you change the order of the tests so that the substring version is tested first? I'd bet you get the opposite result. (Here, we have extensive unit tests for the methods our application uses, and whichever runs first runs slower than the others. Probably because of Hotspot kicking in.)Betray
I prefer the delete approach. For small strings the time difference will be small plus most important it does not create garbage.Hexylresorcinol
How about using the delete(int, int) method for leading spaces. Just compute the start position. And then, use the setLength method.Murtagh
T
3

I've used Zaven's analysis approach and StringBuilder's delete(start, end) method which performs far better than the deleteCharAt(index) approach, but slightly worse than the substring() approach. This method also uses the array copy, but array copy is called far fewer times (only twice in the worst case). In addition, this avoids creating multiple instances of intermediate Strings in case trim() is called repeatedly on the same StringBuilder object.

public class Main {

    public static String trimOriginal(StringBuilder sb) {
        return sb.toString().trim();
    }

    public static String trimDeleteRange(StringBuilder sb) {
        int first, last;

        for (first = 0; first < sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;

        for (last = sb.length(); last > first; last--)
            if (!Character.isWhitespace(sb.charAt(last - 1)))
                break;

        if (first == last) {
            sb.delete(0, sb.length());
        } else {
           if (last < sb.length()) {
              sb.delete(last, sb.length());
           }
           if (first > 0) {
              sb.delete(0, first);
           }
        }
        return sb.toString();
    }


    public static String trimSubstring(StringBuilder sb) {
        int first, last;

        for (first = 0; first < sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;

        for (last = sb.length(); last > first; last--)
            if (!Character.isWhitespace(sb.charAt(last - 1)))
                break;

        return sb.substring(first, last);
    }

    public static void main(String[] args) {
        runAnalysis(1000);
        runAnalysis(10000);
        runAnalysis(100000);
        runAnalysis(200000);
        runAnalysis(500000);
        runAnalysis(1000000);
    }

    private static void runAnalysis(int stringLength) {
        System.out.println("Main:runAnalysis(string-length=" + stringLength + ")");

        long originalTime = 0;
        long deleteTime = 0;
        long substringTime = 0;

        for (int i = 0; i < 200; i++) {

            StringBuilder temp = new StringBuilder();
            char[] options = {' ', ' ', ' ', ' ', 'a', 'b', 'c', 'd'};
            for (int j = 0; j < stringLength; j++) {
                temp.append(options[(int) ((Math.random() * 1000)) % options.length]);
            }
            String testStr = temp.toString();

            StringBuilder sb1 = new StringBuilder(testStr);
            StringBuilder sb2 = new StringBuilder(testStr);
            StringBuilder sb3 = new StringBuilder(testStr);

            long timer1 = System.currentTimeMillis();
            trimOriginal(sb1);
            originalTime += System.currentTimeMillis() - timer1;

            long timer2 = System.currentTimeMillis();
            trimDeleteRange(sb2);
            deleteTime += System.currentTimeMillis() - timer2;

            long timer3 = System.currentTimeMillis();
            trimSubstring(sb3);
            substringTime += System.currentTimeMillis() - timer3;
        }

        System.out.println("  original:     " + originalTime + " ms");
        System.out.println("  delete-range: " + deleteTime + " ms");
        System.out.println("  substring:    " + substringTime + " ms");
    }

}

Output:

Main:runAnalysis(string-length=1000)
  original:     0 ms
  delete-range: 4 ms
  substring:    0 ms
Main:runAnalysis(string-length=10000)
  original:     4 ms
  delete-range: 9 ms
  substring:    4 ms
Main:runAnalysis(string-length=100000)
  original:     22 ms
  delete-range: 33 ms
  substring:    43 ms
Main:runAnalysis(string-length=200000)
  original:     57 ms
  delete-range: 93 ms
  substring:    110 ms
Main:runAnalysis(string-length=500000)
  original:     266 ms
  delete-range: 220 ms
  substring:    191 ms
Main:runAnalysis(string-length=1000000)
  original:     479 ms
  delete-range: 467 ms
  substring:    426 ms
Trulatrull answered 6/3, 2011 at 23:19 Comment(3)
It would be interesting to see this benchmarked over iterations rather than string length. I wonder how much of a difference there would be.Backfield
It should be easy to modify the code to add support for iterations too. I think the results would remain the same with substring performing best, but the gain from the delete method would come from saving over fewer objects to gc which is not being measured in the current benchmark code.Trulatrull
@Trulatrull - why did you say this - "StringBuilder's delete(start, end) method which performs far better than the deleteCharAt(index) approach" ?Weatherproof
S
2

Don't worry about having two strings. It's a microoptimization.

If you really have detected a bottleneck, you can have a nearly-constant-time trimming - just iterate the first N chars, until they are Character.isWhitespace(c)

Sabbatical answered 6/3, 2011 at 19:52 Comment(3)
The reason I'm worrying about it is because I'm on Android and I'm parsing an XML file, so this operation will be repeated for thousands of tags. This means that I'll be allocating about 6000 Strings instead of 3000, which is a really nice improvement.Backfield
More important than this optimisation is not to use StringBuilder default constructor, give it a sufficient capacity to start with. In the current implementation default capacity is 16 characters, and if that's not enough, the buffer is replaced by a larger one. Especially for long texts, this will gain you far more than anything else (because the buffer would get reallocated several times for each line of input).Betray
@Betray good point. I already use a 500 char capacity in the StringBuilder ctor which is appropriate for my applicationBackfield
F
1

only one of you have taken into account that when you convert the String builder to a "string" and then "trim" that you create an immutable object twice that has to be garbage collected, so the total allocation is:

  1. Stringbuilder object
  2. immutable string of the SB object 1 immutable object of the string that has been trimmed.

So whilst it may "appear" that the trim is faster, in the real world and with a loaded memory scheme it will in fact be worse.

Forbore answered 31/5, 2012 at 13:11 Comment(0)
B
1

I had exactly your question at first, however, after 5-minute's second thought, I realized actually you never need to trim the StringBuffer! You only need to trim the string you append into the StringBuffer.

If you want to trim an initial StringBuffer, you can do this:

StringBuffer sb = new StringBuffer(initialStr.trim());

If you want to trim StringBuffer on-the-fly, you can do this during append:

Sb.append(addOnStr.trim());
Biased answered 3/11, 2017 at 3:19 Comment(0)
B
0

You get two strings, but I'd expect the data to be only allocated once. Since Strings in Java are immutable, I'd expect the trim implementation to give you an object that shares the same character data, but with different start- and end indices. At least that's what the substr method does. So, anything you try to optimise this most certainly will have the opposite effect, since you add overhead that is not needed.

Just step through the trim() method with your debugger.

Betray answered 6/3, 2011 at 21:29 Comment(2)
It returns a new string object which copies the characters via arraycopy: from source code: return new String(start, end - start + 1, value);Backfield
Where do you see arraycopy? This is the String constructor used by trim in JDK 1.6.0, the buffer is reused without copying: String(int offset, int count, char value[]) { this.value = value; this.offset = offset; this.count = count; }Betray
W
0

I made some code. It works and the test cases are there for you to see. Let me know if this is okay.

Main code -

public static StringBuilder trimStringBuilderSpaces(StringBuilder sb) {

    int len = sb.length();

    if (len > 0) {

            int start = 0;
            int end = 1;
            char space = ' ';
            int i = 0;

            // Remove spaces at start
            for (i = 0; i < len; i++) {
                if (sb.charAt(i) != space) {
                    break;
                }
            }

            end = i;
            //System.out.println("s = " + start + ", e = " + end);
            sb.delete(start, end);

            // Remove the ending spaces
            len = sb.length();

            if (len > 1) {

                for (i = len - 1; i > 0; i--) {
                    if (sb.charAt(i) != space) {
                        i = i + 1;
                        break;
                    }
                }

                start = i;
                end = len;// or len + any positive number !

                //System.out.println("s = " + start + ", e = " + end);
                sb.delete(start, end);

            }

    }

    return sb;
}

The full code with test -

package source;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;

public class StringBuilderTrim {

    public static void main(String[] args) {
        testCode();
    }

    public static void testCode() {

        StringBuilder s1 = new StringBuilder("");
        StringBuilder s2 = new StringBuilder(" ");
        StringBuilder s3 = new StringBuilder("  ");
        StringBuilder s4 = new StringBuilder(" 123");
        StringBuilder s5 = new StringBuilder("  123");
        StringBuilder s6 = new StringBuilder("1");
        StringBuilder s7 = new StringBuilder("123 ");
        StringBuilder s8 = new StringBuilder("123  ");
        StringBuilder s9 = new StringBuilder(" 123 ");
        StringBuilder s10 = new StringBuilder("  123  ");

        /*
         * Using a rough form of TDD here. Initially, one one test input
         * "test case" was added and rest were commented. Write no code for the
         * method being tested. So, the test will fail. Write just enough code
         * to make it pass. Then, enable the next test. Repeat !!!
         */
        ArrayList<StringBuilder> ins = new ArrayList<StringBuilder>();
        ins.add(s1);
        ins.add(s2);
        ins.add(s3);
        ins.add(s4);
        ins.add(s5);
        ins.add(s6);
        ins.add(s7);
        ins.add(s8);
        ins.add(s9);
        ins.add(s10);

        // Run test
        for (StringBuilder sb : ins) {
            System.out
                    .println("\n\n---------------------------------------------");
            String expected = sb.toString().trim();
            String result = trimStringBuilderSpaces(sb).toString();
            System.out.println("In [" + sb + "]" + ", Expected [" + expected
                    + "]" + ", Out [" + result + "]");
            if (result.equals(expected)) {
                System.out.println("Success!");
            } else {
                System.out.println("FAILED!");
            }
            System.out.println("---------------------------------------------");
        }

    }

    public static StringBuilder trimStringBuilderSpaces(StringBuilder inputSb) {

        StringBuilder sb = new StringBuilder(inputSb);
        int len = sb.length();

        if (len > 0) {

            try {

                int start = 0;
                int end = 1;
                char space = ' ';
                int i = 0;

                // Remove spaces at start
                for (i = 0; i < len; i++) {
                    if (sb.charAt(i) != space) {
                        break;
                    }
                }

                end = i;
                //System.out.println("s = " + start + ", e = " + end);
                sb.delete(start, end);

                // Remove the ending spaces
                len = sb.length();

                if (len > 1) {

                    for (i = len - 1; i > 0; i--) {
                        if (sb.charAt(i) != space) {
                            i = i + 1;
                            break;
                        }
                    }

                    start = i;
                    end = len;// or len + any positive number !

                    //System.out.println("s = " + start + ", e = " + end);
                    sb.delete(start, end);

                }

            } catch (Exception ex) {

                StringWriter sw = new StringWriter();
                PrintWriter pw = new PrintWriter(sw);
                ex.printStackTrace(pw);
                sw.toString(); // stack trace as a string

                sb = new StringBuilder("\nNo Out due to error:\n" + "\n" + sw);
                return sb;
            }

        }

        return sb;
    }
}
Weatherproof answered 11/12, 2014 at 8:2 Comment(0)
I
0
strBuilder.replace(0,strBuilder.length(),strBuilder.toString().trim());
Iconolatry answered 17/11, 2019 at 22:46 Comment(1)
While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations!Kelwunn
E
0

Since deleteCharAt() copies array each time, I have come with below code which copies array two times in worst case when StringBuilder have both leading and trailing whitespace. Below code will make sure that object reference remains same and we are not creating new object.

    public static void trimStringBuilder(StringBuilder builder) {
        int len = builder.length();
        int start = 0;
        // Remove whitespace from start
        while (start < len && builder.charAt(start) == ' ') {
            start++;
        }
        if (start > 0) {
            builder.delete(0, start);
        }
    
        len = builder.length();
        int end = len;
        // Remove whitespace from end
        while (end > 0 && builder.charAt(end - 1) == ' ') {
            end--;
        }
        if (end < len) {
            builder.delete(end, len);
        }
    }
Epicritic answered 20/7, 2021 at 14:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.