What's the fastest way to concatenate two Strings in Java?
Asked Answered
L

17

47

What's the fastest way to concatenate two Strings in Java?

i.e

String ccyPair = ccy1 + ccy2;

I'm using cyPair as a key in a HashMap and it's called in a very tight loop to retrieve values.

When I profile then this is the bottleneck

java.lang.StringBuilder.append(StringBuilder.java:119)  
java.lang.StringBuilder.(StringBuilder.java:93)
Lightsome answered 22/2, 2011 at 10:8 Comment(3)
bottleneck in string concatenation? That would mean all java programs are having preformance issues. Don't microoptimize.Buie
But he has profiled the code, and this is the bottleneck. This isn't micro-optimization, nor premature optimization, it's just optimization.Fontes
@Duncan, actually that's one of the issues. The real issues the ccy code generation into the loop. It contains multiple allocations+memory barriers, +not so fast hash code (14 mul+add; assuming ccy pairs are like "eur/usdusd/jpy"), and then equals. Using a holding pair w/ references to the 2 strings will be a lot better solution.Holocene
M
23

The reason why these routines show up in the benchmark is because that is how the compiler implements your "+" under the covers.

If you really need the concatenated string, you should let the compiler do its magic with the "+". If all you need is a key for map lookup, a key class holding both strings with suitable equals and hashMap implementations might be a good idea as it avoids the copying step.

2023 edit: With Java 16 came records where the compiler does the hard work automatically under the covers. Just create a record with two string fields and you're done.

Monophonic answered 22/2, 2011 at 10:13 Comment(2)
do you have any sample code to prevent the bottleneck since you might know the implementation partCurtsey
@Deepak, I don't believe this to be a bottleneck, but the easiest way to create such a class in Eclipse 3.6 is to create a new class, give it the fields ccy1, and ccy2, ask Eclipse to create an constructor based on fields, and to generate the hashCode() and equals() methods.Administrate
F
54

Lots of theory - time for some practice!

private final String s1 = new String("1234567890");
private final String s2 = new String("1234567890");

Using plain for loops of 10,000,000, on a warmed-up 64-bit Hotspot, 1.6.0_22 on Intel Mac OS.

eg

@Test public void testConcatenation() {
    for (int i = 0; i < COUNT; i++) {
        String s3 = s1 + s2;
    }
}

With the following statements in the loops

String s3 = s1 + s2; 

1.33s

String s3 = new StringBuilder(s1).append(s2).toString();

1.28s

String s3 = new StringBuffer(s1).append(s2).toString();

1.92s

String s3 = s1.concat(s2);

0.70s

String s3 = "1234567890" + "1234567890";

0.0s

So concat is the clear winner, unless you have static strings, in which case the compiler will have taken care of you already.

Fontes answered 4/3, 2011 at 0:2 Comment(4)
the code will be dead optimized, so you are effectively testing not-optimized code. This is how you dont write micro-benchmarks. Nonetheless, String.contact should be the fastest for 2 strings.Holocene
I am guilty of not further examining the results as they were exactly what I expected! But I don't understand how I am testing not optimised code. If Hotspot were removing code with no side effects all these loops would take the same time, and if it isn't, then I am testing the time to run the statements (plus the loop). The thing that we don't know is the time taken by the loops, but not having too much time on my hands I didn't account for that ;-)Fontes
@DuncanMcGregor It takes a while before the JVM optimizes the code.Administrate
StringBuilder is a way to fast with big strings ,but is slow with small.Copyreader
G
25

I believe the answer might have already been determined, but I post to share the code.

Short answer, if pure concatenation is all you are looking for, is: String.concat(...)

Output:

ITERATION_LIMIT1: 1
ITERATION_LIMIT2: 10000000
s1: STRING1-1111111111111111111111
s2: STRING2-2222222222222222222222

iteration: 1
                                          null:    1.7 nanos
                                 s1.concat(s2):  106.1 nanos
                                       s1 + s2:  251.7 nanos
   new StringBuilder(s1).append(s2).toString():  246.6 nanos
    new StringBuffer(s1).append(s2).toString():  404.7 nanos
                 String.format("%s%s", s1, s2): 3276.0 nanos

Tests complete

Sample Code:

package net.fosdal.scratch;

public class StringConcatenationPerformance {
    private static final int    ITERATION_LIMIT1    = 1;
    private static final int    ITERATION_LIMIT2    = 10000000;

    public static void main(String[] args) {
        String s1 = "STRING1-1111111111111111111111";
        String s2 = "STRING2-2222222222222222222222";
        String methodName;
        long startNanos, durationNanos;
        int iteration2;

        System.out.println("ITERATION_LIMIT1: " + ITERATION_LIMIT1);
        System.out.println("ITERATION_LIMIT2: " + ITERATION_LIMIT2);
        System.out.println("s1: " + s1);
        System.out.println("s2: " + s2);
        int iteration1 = 0;
        while (iteration1++ < ITERATION_LIMIT1) {
            System.out.println();
            System.out.println("iteration: " + iteration1);

            // method #0
            methodName = "null";
            iteration2 = 0;
            startNanos = System.nanoTime();
            while (iteration2++ < ITERATION_LIMIT2) {
                method0(s1, s2);
            }
            durationNanos = System.nanoTime() - startNanos;
            System.out.println(String.format("%50s: %6.1f nanos", methodName, ((double) durationNanos) / ITERATION_LIMIT2));

            // method #1
            methodName = "s1.concat(s2)";
            iteration2 = 0;
            startNanos = System.nanoTime();
            while (iteration2++ < ITERATION_LIMIT2) {
                method1(s1, s2);
            }
            durationNanos = System.nanoTime() - startNanos;
            System.out.println(String.format("%50s: %6.1f nanos", methodName, ((double) durationNanos) / ITERATION_LIMIT2));

            // method #2
            iteration2 = 0;
            startNanos = System.nanoTime();
            methodName = "s1 + s2";
            while (iteration2++ < ITERATION_LIMIT2) {
                method2(s1, s2);
            }
            durationNanos = System.nanoTime() - startNanos;
            System.out.println(String.format("%50s: %6.1f nanos", methodName, ((double) durationNanos) / ITERATION_LIMIT2));

            // method #3
            iteration2 = 0;
            startNanos = System.nanoTime();
            methodName = "new StringBuilder(s1).append(s2).toString()";
            while (iteration2++ < ITERATION_LIMIT2) {
                method3(s1, s2);
            }
            durationNanos = System.nanoTime() - startNanos;
            System.out.println(String.format("%50s: %6.1f nanos", methodName, ((double) durationNanos) / ITERATION_LIMIT2));

            // method #4
            iteration2 = 0;
            startNanos = System.nanoTime();
            methodName = "new StringBuffer(s1).append(s2).toString()";
            while (iteration2++ < ITERATION_LIMIT2) {
                method4(s1, s2);
            }
            durationNanos = System.nanoTime() - startNanos;
            System.out.println(String.format("%50s: %6.1f nanos", methodName, ((double) durationNanos) / ITERATION_LIMIT2));

            // method #5
            iteration2 = 0;
            startNanos = System.nanoTime();
            methodName = "String.format(\"%s%s\", s1, s2)";
            while (iteration2++ < ITERATION_LIMIT2) {
                method5(s1, s2);
            }
            durationNanos = System.nanoTime() - startNanos;
            System.out.println(String.format("%50s: %6.1f nanos", methodName, ((double) durationNanos) / ITERATION_LIMIT2));

        }
        System.out.println();
        System.out.println("Tests complete");

    }

    public static String method0(String s1, String s2) {
        return "";
    }

    public static String method1(String s1, String s2) {
        return s1.concat(s2);
    }

    public static String method2(String s1, String s2) {
        return s1 + s2;
    }

    public static String method3(String s1, String s2) {
        return new StringBuilder(s1).append(s2).toString();
    }

    public static String method4(String s1, String s2) {
        return new StringBuffer(s1).append(s2).toString();
    }

    public static String method5(String s1, String s2) {
        return String.format("%s%s", s1, s2);
    }

}
Goldston answered 17/2, 2012 at 14:56 Comment(1)
Nice comment. I've been looking for speed of string.format and now I see that it's little bit slow :-) I'll use concat instead.Rolandorolandson
M
23

The reason why these routines show up in the benchmark is because that is how the compiler implements your "+" under the covers.

If you really need the concatenated string, you should let the compiler do its magic with the "+". If all you need is a key for map lookup, a key class holding both strings with suitable equals and hashMap implementations might be a good idea as it avoids the copying step.

2023 edit: With Java 16 came records where the compiler does the hard work automatically under the covers. Just create a record with two string fields and you're done.

Monophonic answered 22/2, 2011 at 10:13 Comment(2)
do you have any sample code to prevent the bottleneck since you might know the implementation partCurtsey
@Deepak, I don't believe this to be a bottleneck, but the easiest way to create such a class in Eclipse 3.6 is to create a new class, give it the fields ccy1, and ccy2, ask Eclipse to create an constructor based on fields, and to generate the hashCode() and equals() methods.Administrate
M
7

You should test with a String generated at run time (like UUID.randomUUID().toString()) not at compile time (like "my string"). My results are

plus:     118 ns
concat:    52 ns
builder1: 102 ns
builder2:  66 ns
buffer1:  119 ns
buffer2:   87 ns

with this implementation:

private static long COUNT = 10000000;

public static void main(String[] args) throws Exception {
    String s1 = UUID.randomUUID().toString();
    String s2 = UUID.randomUUID().toString();
    for(String methodName : new String[] {
            "none", "plus", "concat", "builder1", "builder2", "buffer1", "buffer2"
    }) {
        Method method = ConcatPerformanceTest.class.getMethod(methodName, String.class, String.class);
        long time = System.nanoTime();
        for(int i = 0; i < COUNT; i++) {
            method.invoke((Object) null, s1, s2);
        }
        System.out.println(methodName + ": " + (System.nanoTime() - time)/COUNT + " ns");
    }
}

public static String none(String s1, String s2) {
    return null;
}

public static String plus(String s1, String s2) {
    return s1 + s2;
}

public static String concat(String s1, String s2) {
    return s1.concat(s2);
}

public static String builder1(String s1, String s2) {
    return new StringBuilder(s1).append(s2).toString();
}

public static String builder2(String s1, String s2) {
    return new StringBuilder(s1.length() + s2.length()).append(s1).append(s2).toString();
}

public static String buffer1(String s1, String s2) {
    return new StringBuffer(s1).append(s2).toString();
}

public static String buffer2(String s1, String s2) {
    return new StringBuffer(s1.length() + s2.length()).append(s1).append(s2).toString();
}
Medicament answered 16/9, 2014 at 9:0 Comment(0)
T
5

For the question in the title: String.concat will typically be the fastest way to concat two Strings (but do note nulls). No [oversized] intermediate buffer or other object is involved. Strangely + gets compiled into relatively inefficient code involving StringBuilder.

However, the body of you question points to other problems. String concatenation to generate keys for a map is a common "anti-idiom". It is a hack and error-prone. Are you sure that the generated key is unique? Will it remain unique after your code is maintained for some as yet unknown requirement? The best approach is to create an immutable value class for the key. Using a List and generic tuple class is a sloppy hack.

Trocki answered 22/2, 2011 at 10:48 Comment(1)
Is the StringBuilder variant really much more inefficient than concat?Leverage
S
3

For me concat3 method as below is the fastest way after doing benchmark on my windows and remote linux machine:- Though i believe concat1 performance is JVM implementation and optimization dependent and may perform better in future version

    public class StringConcat {

    public static void main(String[] args) {
        int run = 100 * 100 * 1000;
        long startTime, total = 0;

        final String a = "a";
        final String b = "assdfsaf";
        final String c = "aasfasfsaf";
        final String d = "afafafdaa";
        final String e = "afdassadf";

        startTime = System.currentTimeMillis();
        concat1(run, a, b, c, d, e);
        total = System.currentTimeMillis() - startTime;
        System.out.println(total);

        startTime = System.currentTimeMillis();
        concat2(run, a, b, c, d, e);
        total = System.currentTimeMillis() - startTime;
        System.out.println(total);

        startTime = System.currentTimeMillis();
        concat3(run, a, b, c, d, e);
        total = System.currentTimeMillis() - startTime;
        System.out.println(total);
    }

    private static void concat3(int run, String a, String b, String c, String d, String e) {
        for (int i = 0; i < run; i++) {
            String str = new StringBuilder(a.length() + b.length() + c.length() + d.length() + e.length()).append(a)
                    .append(b).append(c).append(d).append(e).toString();
        }
    }

    private static void concat2(int run, String a, String b, String c, String d, String e) {
        for (int i = 0; i < run; i++) {
            String str = new StringBuilder(a).append(b).append(c).append(d).append(e).toString();
        }
    }

    private static void concat1(int run, String a, String b, String c, String d, String e) {
        for (int i = 0; i < run; i++) {
            String str = a + b + c + d + e;
        }
    }
}
Snort answered 27/11, 2012 at 16:58 Comment(2)
Can you provide details on the JVM you tested these with?Populace
@Populace java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b05) Java HotSpot(TM) Client VM (build 20.6-b01, mixed mode, sharing)Snort
B
1

Perhaps instead of concatenation, you should create a Pair class?

public class Pair<T1, T2> {
    private T1 first;
    private T2 second;

    public static <U1,U2> Pair<U1,U2> create(U1 first, U2 second) {
        return new Pair<U1,U2>(U1,U2);
    }

    public Pair( ) {}

    public Pair( T1 first, T2 second ) {
        this.first = first;
        this.second = second;
    }

    public T1 getFirst( ) {
        return first;
    }

    public void setFirst( T1 first ) {
        this.first = first;
    }

    public T2 getSecond( ) {
        return second;
    }

    public void setSecond( T2 second ) {
        this.second = second;
    }

    @Override
    public String toString( ) {
        return "Pair [first=" + first + ", second=" + second + "]";
    }

    @Override
    public int hashCode( ) {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((first == null)?0:first.hashCode());
        result = prime * result + ((second == null)?0:second.hashCode());
        return result;
    }

    @Override
    public boolean equals( Object obj ) {
        if ( this == obj )
            return true;
        if ( obj == null )
            return false;
        if ( getClass() != obj.getClass() )
            return false;
        Pair<?, ?> other = (Pair<?, ?>) obj;
        if ( first == null ) {
            if ( other.first != null )
                return false;
        }
        else if ( !first.equals(other.first) )
            return false;
        if ( second == null ) {
            if ( other.second != null )
                return false;
        }
        else if ( !second.equals(other.second) )
            return false;
        return true;
    }

}

And use this as your key in your HashMap

Instead of HashMap<String,Whatever> use HashMap<Pair<String,String>,Whatever>

In your tight loop instead of map.get( str1 + str2 ) you'd use map.get( Pair.create(str1,str2) ).

Byzantine answered 22/2, 2011 at 10:15 Comment(6)
@KitsuneYMG,Can you post a complete working example so that it is handy for tackling such issues in futures.Curtsey
@Curtsey see edits. It you need a triple, quad, etc, it's very easy to use this as a base to add more.Byzantine
@KitsuneYMG,can you post the public static void main method for you pair class so that it can be handy for further referenceCurtsey
I'd be interested to know if this is actually quicker in use, as it doesn't cache the Pair's hashCode, whereas the concatenated string's hashCode is cached.Fontes
@Duncan you could easily cache the hashcode and discard it upon set*. This should be faster than concatenating two strings which requires two memcpy's (unless the particular JVM uses ropes).Byzantine
side notes: the eclipse generated hashcode/equals suck (esp. visually), you probably want the class serializable; hashcode caching may not be needed (due to how hashmap works). Also declaring T1 and T2 String will help the compiler to statically link equals and hashcode (now it will probably fall back to inline caches though, not so shabby but still)Holocene
Y
1

I'd recommend to try Thorbjørn Ravn Andersens suggestion.

If you need the concatenated Strings, dependent on the length of the two parts, it might perform slightly better to create the StringBuilder instance with the required size to avoid reallocation. The default StringBuilder constructor reserves 16 Characters in the current implementation - at least on my machine. So, if the concatenated String is longer than the initial buffer size, the StringBuilder has to reallocate.

Try this out and tell us what your profiler has to say about it:

StringBuilder ccyPair = new StringBuilder(ccy1.length()+ccy2.length());
ccyPair.append(ccy1); 
ccyPair.append(ccy2); 
Yablon answered 22/2, 2011 at 10:29 Comment(0)
F
1

According to the Java specification (and since the very first version of Java), in the section "String Concatenation Operator +" it is said that :

To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression

So basically, using the + operator or StringBuilder.append for variables is basically the same.


Other thing, I know that in your question you mentioned adding only 2 Strings, but have in mind that adding 3 or more Strings will lead to different results :

I used a slightly modified @Duncan McGregor example. I have 5 methods concatenating 2 to 6 strings using concat and 5 methods concatenating 2 to 6 strings using StringBuilder :

// Initialization
    private final String s1 = new String("1234567890");
    private final String s2 = new String("1234567890");
    private final String s3 = new String("1234567890");
    private final String s4 = new String("1234567890");
    private final String s5 = new String("1234567890");
    private final String s6 = new String("1234567890");

// testing the concat
    public void testConcatenation2stringsConcat(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = s1.concat(s2);
        }
    }
    public void testConcatenation3stringsConcat(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = s1.concat(s2).concat(s3);
        }
    }
    public void testConcatenation4stringsConcat(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = s1.concat(s2).concat(s3).concat(s4);
        }
    }
    public void testConcatenation5stringsConcat(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = s1.concat(s2).concat(s3).concat(s4).concat(s5);
        }
    }
    public void testConcatenation6stringsConcat(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = s1.concat(s2).concat(s3).concat(s4).concat(s5).concat(s6);
        }
    }

//testing the StringBuilder
    public void testConcatenation2stringsSB(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = new StringBuilder(s1).append(s2).toString();
        }
    }
    public void testConcatenation3stringsSB(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = new StringBuilder(s1).append(s2).append(s3).toString();
        }
    }
    public void testConcatenation4stringsSB(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = new StringBuilder(s1).append(s2).append(s3).append(s4).toString();
        }
    }
    public void testConcatenation5stringsSB(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = new StringBuilder(s1).append(s2).append(s3).append(s4).append(s5).toString();
        }
    }
    public void testConcatenation6stringsSB(int count) {
        for (int i = 0; i < count; i++) {
            String s100 = new StringBuilder(s1).append(s2).append(s3).append(s4).append(s5).append(s6).toString();
        }
    }

I obtained these results (in seconds) :

testConcatenation2stringsConcat : 0.018 |||||||||||||||| testConcatenation2stringsSB : 0.2 testConcatenation3stringsConcat : 0.35 ||||||||||||||||||| testConcatenation3stringsSB : 0.25 testConcatenation4stringsConcat : 0.5 |||||||||||||||||||||| testConcatenation4stringsSB : 0.3 testConcatenation5stringsConcat : 0.67 ||||||||||||||||||| testConcatenation5stringsSB : 0.38 testConcatenation5stringsConcat : 0.9 |||||||||||||||||||||| testConcatenation5stringsSB : 0.43

  • You can see that concat is faster than StringBuilder only when concatenating only 2 Strings
  • See that when adding more and more Strings, the StringBuilder resulting time is increasing more slowly that using the concat
  • Note that the difference will be more significant when the strings are very long
Footsie answered 22/1, 2019 at 19:17 Comment(0)
D
1

Interestingly, StringJoiner isn't mentioned here…

Usually a separator has to be inserted between the strings, eg. ", ".
The code is easier to read using StringJoiner than using StringBuilder and similarly fast.

StringJoiner joiner = new StringJoiner( ", " );
joiner.add( ccy1 ).add( ccy2 );
Digester answered 18/4, 2020 at 11:0 Comment(2)
How does this perform, I have not used it before (in comparison to string builder or simple concatenation?Kaine
@ggb667 StringBuilder is definitely faster when concatenating two strings seamlessly than StringJoiner(""). As soon as you want to have a separator between the strings, the speed should not differ noticeably anymore. A benchmark would certainly be interesting here. In the latter case, I would prefer StringJoiner for better readability.Digester
E
0

@Duncan McGregor's answer gives some benchmark numbers for one particular example (sizes of input string) and one JVM version. In this case, it looks like String.concat() is the winner by a significant factor. This result may or may not generalize.

Aside: This surprises me! I'd have thought that the compiler writers would have chosen to use String.concat in cases where it is likely to be faster. The explanation is in the evaluation of this bug report ... and is rooted in the definition of the String concatenation operator.

(If a String-typed operand of + is null, the JLS states that the String "null" is used in its place. That wouldn't work if they code generated s + s2 as s.concat(s2) and s or s2 happened to be null; you'd get NPEs. And the case of s == null means that an alternative version of concat doesn't solve the NPE problem.)


However, @unwind's answer has given me an idea for an alternative solution that avoids the need for String concatenation.

If the concatenations of ccy1 and ccy2 are just done to join two keys, then maybe you could get better performance by defining a special hash table class that takes two keys instead of one. It would have operations like:

    public Object get(String key1, String key2) ...

    public void put(String key1, String key2, Object value) ...

The effect would be like a Map<Pair<String, String>, Object> (see @KitsuneYMG's answer) except that you don't need to create Pair<String, String> objects each time you want to do a get or put. The downside is:

  • you have to implement a new hash table class from the ground up, and
  • the new class won't conform to the Map interface.

Normally, I would not recommend doing this. However if the string concatenation and map lookup are really a critical bottleneck, a custom multi-key hash table might give you a significant speedup.

Embraceor answered 22/2, 2011 at 10:14 Comment(8)
Do you have any evidence for "you cannot improve String concatenation per se"?Fontes
@Stephen, look at String.concat() impl. there is NO surprise and it has been the best method for concatenating 2 strings ONLY. It allocates exactly as needed char[] and copies via System.arrayCopy (so one char[] alloc, 2 memcpy, one string alloc, cant beat that ever), but most of all, it's the only way to create a String w/o extra copy of the char array (as of now, back in the day StringBuffer didn't copy either)Holocene
The surprise is that they can't use s.concat(s2) for s + s2. But it makes sense; see above.Embraceor
@Stephen, yes, it doesn't work if any of the strings is null. But consider this: String.valueOf(s1).contact(String.valueOf(s2)); actually I'd swear I have seen JBuilder does it (but it was at least 8 years ago, so I'd not swear for real)Holocene
@Stephen, the custom map (2 value map) is the best solution to the problem. I guess I can post one.Holocene
@Stephen C - you can't have it both ways - saying that I have too much time on my hands and then complain that I only give results for one VM and set of input strings!Fontes
@Duncan McGregor - I'm not complaining. I'm just noting that the results may not be valid across the board ... which is something that you neglected to mention.Embraceor
@Stephen C - indeed, but I did give the conditions that I did use. I'd wager good odds that you'd find the similar odds for other VMs and strings - probably better for concat with longer strings - but I don't have reliable evidence for that hunch ;-)Fontes
P
0

Perhaps you can go around the problem by computing the hashes of the two strings individually, and then combining them, perhaps with a separate hash function that works on integers?

Something like:

int h1 = ccy1.hashCode(), h2 = ccy2.hashCode(), h = h1 ^ h2;

That could well be faster, since concatenating strings only to compute the hash of the concatenation seems wasteful.

Note that the above combines the two hashes with binary-XOR (the ^ operator) which often works but you might want to investigate that further.

Pauperism answered 22/2, 2011 at 10:14 Comment(1)
That doesn't help for a regular hashmap.Embraceor
S
0

Ok, so what is your question? Nothing to do: if you have to concatenate strings just do it. It is fine that you profiled your code. Now you can see the fact that string concatenation operator + automatically uses StringBuilder's append() method, so using

StringBuilder ccyPair = new StringBuilder(ccy1)
ccyPair.append(ccy2);

does not give you serious advantages.

The only serious way to optimize your code is probably change your design to omit the concatenation at all. But do it only if you really need it, i.e. the concatenation takes significant part of CPU time.

Signe answered 22/2, 2011 at 10:17 Comment(0)
H
0

Here it is a full implementation of linear-probe map w/ double keys, single value. It should outperform java.util.HashMap nicely as well.

Warning, it's written in the very early hours of the day from scratch, so it might contain bugs. Please feel free to edit it.

The solution must beat any wrapper, concat one at any time. The no allocation on get/put also makes it quick general purpose map.

Hope this solves the issue. (The code comes w/ some simple tests that are unneeded)


package bestsss.util;


@SuppressWarnings("unchecked")
public class DoubleKeyMap<K1, K2, V> {
    private static final int MAX_CAPACITY =  1<<29; 

    private static final Object TOMBSTONE = new String("TOMBSTONE");

    Object[] kvs; 
    int[] hashes;
    int count = 0;

    final int rehashOnProbes;   

    public DoubleKeyMap(){
        this(8, 5);
    }

    public DoubleKeyMap(int capacity, int rehashOnProbes){
        capacity = nextCapacity(Math.max(2, capacity-1));
        if (rehashOnProbes>capacity){
            throw new IllegalArgumentException("rehashOnProbes too high");
        }
        hashes = new int[capacity];
        kvs = new Object[kvsIndex(capacity)];       
        count = 0;
        this.rehashOnProbes = rehashOnProbes;
    }

    private static int nextCapacity(int c) {
        int n = Integer.highestOneBit(c)<<1;
        if (n<0 || n>MAX_CAPACITY){
            throw new Error("map too large");
        }
        return n;
    }

    //alternatively this method can become non-static, protected and overriden, the perfoamnce can drop a little
    //but if better spread of the lowest bit is possible, all good and proper
    private static<K1, K2> int hash(K1 key1, K2 key2){
        //spread more, if need be
        int h1 = key1.hashCode();
        int h2 = key2.hashCode();
        return h1+ (h2<<4) + h2; //h1+h2*17
    }

    private static int kvsIndex(int baseIdx){
        int idx = baseIdx;  
        idx+=idx<<1;//idx*3
        return idx;
    }

    private int baseIdx(int hash){
        return hash & (hashes.length-1);
    }

    public V get(K1 key1, K2 key2){
        final int hash = hash(key1, key2);


        final int[] hashes = this.hashes;
        final Object[] kvs = this.kvs;
        final int mask = hashes.length-1;

        for(int base = baseIdx(hash);;base=(base+1)&mask){
            int k = kvsIndex(base);
            K1 k1 = (K1) kvs[k];
            if (k1==null)
                return null;//null met; no such value

            Object value;
            if (hashes[base]!=hash || TOMBSTONE==(value=kvs[k+2]))
                continue;//next

            K2 k2 = (K2) kvs[k+1];
            if ( (key1==k1 || key1.equals(k1)) && (key2==k2 || key2.equals(k2)) ){
                return (V) value;
            }
        }
    }
    public boolean contains(K1 key1, K2 key2){
        return get(key1, key2)!=null;
    }

    public boolean containsValue(final V value){
        final Object[] kvs = this.kvs;
        if (value==null)
            return false;

        for(int i=0;i<kvs.length;i+=3){
            Object v = kvs[2];
            if (v==null || v==TOMBSTONE)
                continue;
            if (value==v || value.equals(v))
                return true;
        }
        return false;
    }

    public V put(K1 key1, K2 key2, V value){
        int hash = hash(key1, key2);
        return doPut(key1, key2, value, hash);
    }
    public V remove(K1 key1, K2 key2){
        int hash = hash(key1, key2);
        return doPut(key1, key2, null, hash);   
    }

    //note, instead of remove a TOMBSTONE is used to mark the deletion
    //this may leak keys but deletion doesn't need to shift the array like in Knuth 6.4
    protected V doPut(final K1 key1, final K2 key2, Object value, final int hash){
        //null value -> remove
        int probes = 0;
        final int[] hashes = this.hashes;
        final Object[] kvs = this.kvs;
        final int mask = hashes.length-1;

        //conservative resize: when too many probes and the count is greater than the half of the capacity
        for(int base = baseIdx(hash);probes<rehashOnProbes || count<(mask>>1);base=(base+1)&mask, probes++){
            final int k = kvsIndex(base);
            K1 k1 = (K1) kvs[k];
            K2 k2;
            //find a gap, or resize
            Object  old  = kvs[k+2];
            final boolean emptySlot = k1==null || (value!=null && old==TOMBSTONE); 
            if (emptySlot || (
                    hashes[base] == hash &&
                    (k1==key1  || k1.equals(key1)) &&
                    ((k2=(K2) kvs[k+1])==key2 || k2.equals(key2))) 
            ){

                if (value==null){//remove()
                    if (emptySlot)
                        return null;//not found, and no value ->nothing to do

                    value = TOMBSTONE;
                    count-=2;//offset the ++later
                }

                if (emptySlot){//new entry, update keys
                    hashes[base] = hash;                
                    kvs[k] = key1;
                    kvs[k+1] = key2;
                }//else -> keys and hash are equal


                if (old==TOMBSTONE) 
                    old=null;

                kvs[k+2] = value;
                count++;

                return (V) old;
            }
        }
        resize();
        return doPut(key1, key2, value, hash);//hack w/ recursion, after the resize
    }

    //optimized version during resize, doesn't check equals which is the slowest part   
    protected void doPutForResize(K1 key1, K2 key2, V value, final int hash){
        final int[] hashes = this.hashes;
        final Object[] kvs = this.kvs;
        final int mask = hashes.length-1;

        //find the 1st gap and insert there
        for(int base = baseIdx(hash);;base=(base+1)&mask){//it's ensured, no equal keys exist, so skip equals part
            final int k = kvsIndex(base);
            K1 k1 = (K1) kvs[k];
            if (k1!=null) 
                continue;

            hashes[base] = hash;                
            kvs[k] = key1;
            kvs[k+1] = key2;
            kvs[k+2] = value;
            return;
        }
    }

    //resizes the map by doubling the capacity, 
    //the method uses altervative varian of put that doesn't check equality, or probes; just inserts at a gap
    protected void resize(){        
        final int[] hashes = this.hashes;
        final Object[] kvs = this.kvs;
        final int capacity = nextCapacity(hashes.length);

        this.hashes = new int[capacity];
        this.kvs = new Object[kvsIndex(capacity)];

        for (int i=0;i<hashes.length; i++){
            int k = kvsIndex(i);
            K1 key1 = (K1) kvs[k];
            Object value = kvs[k+2];
            if (key1!=null && TOMBSTONE!=value){
                K2 key2 = (K2) kvs[k+1];
                doPutForResize(key1, key2, (V) value, hashes[i]);
            }
        }   
    }

    public static void main(String[] args) {
        DoubleKeyMap<String, String, Integer> map = new DoubleKeyMap<String, String, Integer>(4,2);
        map.put("eur/usd", "usd/jpy", 1);
        map.put("eur/usd", "usd/jpy", 2);

        map.put("eur/jpy", "usd/jpy", 3);

        System.out.println(map.get("eur/jpy", "usd/jpy"));
        System.out.println(map.get("eur/usd", "usd/jpy"));
        System.out.println("======");

        map.remove("eur/usd", "usd/jpy");
        System.out.println(map.get("eur/jpy", "usd/jpy"));
        System.out.println(map.get("eur/usd", "usd/jpy"));
        System.out.println("======");
        testResize();
    }
    static void testResize(){
        DoubleKeyMap<String, Integer, Integer> map = new DoubleKeyMap<String, Integer, Integer>(18, 17);
        long s = 0;
        String pref="xxx";

        for (int i=0;i<14000;i++){
            map.put(pref+i, i, i);

            if ((i&1)==1)
                map.remove(pref+i, i);
            else
                s+=i;
        }
        System.out.println("sum: "+s);
        long sum = 0;

        for (int i=0;i<14000;i++){
            Integer n = map.get(pref+i, i);
            if (n!=null && n!=i){
                throw new AssertionError(); 
            }
            if (n!=null){
                System.out.println(n);
                sum+=n;
            }
        }
        System.out.println("1st sum: "+s);
        System.out.println("2nd sum: "+sum);


    }
}
Holocene answered 4/3, 2011 at 2:53 Comment(0)
J
0

Bear in mind that if you are concatenating millions of strings, then string.concat will most likely generate millions of new string object refs. This will have an increased CPU utilisation.

Jevons answered 7/1, 2019 at 14:24 Comment(0)
N
0

I decided to try to benchmark it and here is what my results are. I guess using default "+" concatenation is the easiest and fastest (or almost one of the fastest) way to go.

JMH version: 1.19
VM version: JDK 1.8.0_211, VM 25.211-b12
VM options: -Xms2G -Xmx2G
Warmup: 10 iterations, 1 s each
Measurement: 30 iterations, 1 s each
Timeout: 10 min per iteration
Threads: 1 thread, will synchronize iterations
Benchmark mode: Average time, time/op
Parameters: (N = 1000000)

Benchmark          (N)  Mode  Cnt  Score     Error  Units
concat         1000000  avgt   30  24.839  ± 0.211  ms/op
plus           1000000  avgt   30  15.072  ± 0.155  ms/op
stringBuffer   1000000  avgt   30  14.835  ± 0.118  ms/op
stringBuilder  1000000  avgt   30  14.775  ± 0.205  ms/op

Here is the bench code:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 2, jvmArgs = {"-Xms2G", "-Xmx2G"})
@Warmup(iterations = 10)
@Measurement(iterations = 30)
public class BenchmarkString {

    @Param({"1000000"})
    private int N;
    private final String s1 = new String("1234567890124567890");
    private final String s2 = new String("1234567890124567890");

    public static void main(String[] args) throws RunnerException {

        Options opt = new OptionsBuilder()
                .include(BenchmarkString.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

    @Benchmark
    public void plus() {
        for (int i = 0; i < N; i++) {
            String s = s1 + s2;
        }
    }

    @Benchmark
    public void stringBuilder() {
        for (int i = 0; i < N; i++) {
            String s = new StringBuilder(s1).append(s2).toString();
        }
    }

    @Benchmark
    public void stringBuffer() {
        for (int i = 0; i < N; i++) {
            String s = new StringBuffer(s1).append(s2).toString();
        }
    }

    @Benchmark
    public void concat() {
        for (int i = 0; i < N; i++) {
            String s = s1.concat(s2);
        }
    }
}
Nunn answered 18/4, 2020 at 14:46 Comment(0)
C
-1
StringBuffer ccyPair =  new StringBuffer();      
ccyPair.append("ccy1").append("ccy2"); 

Have you tried using a String Buffer and then use a profiler to examine where is the bottleneck.Give it a try and see what happens.

Curtsey answered 22/2, 2011 at 10:13 Comment(3)
StringBuffer will definitely not perform better here since StringBuilder is its not threadsafe counterpart, avoiding the unnecessary synchronisation overhead.Yablon
Indeed - StringBuilder is significantly fasterFontes
actually - you end up w/ "ccy1ccy2" always.Holocene

© 2022 - 2024 — McMap. All rights reserved.