I'm comparing two ways to filter lists, with and without streams. The method without streams turns out to be faster for a list of 10,000 items, and I'd like to understand why. Can anyone explain the results?
public static int countLongWordsWithoutUsingStreams(
        final List<String> words, final int longWordMinLength) {
    words.removeIf(word -> word.length() <= longWordMinLength);
    return words.size();
}

public static int countLongWordsUsingStreams(final List<String> words, final int longWordMinLength) {
    return (int) words.stream().filter(w -> w.length() > longWordMinLength).count();
}
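A side note on these two methods: beyond timing, they differ in what they do to their argument. The sketch below is a minimal check, assuming a mutable `ArrayList` as input; the class name `MutationCheck` and the shortened method names are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MutationCheck {
    // Same logic as the removeIf version above: mutates the list it is given.
    static int countWithRemoveIf(List<String> words, int minLength) {
        words.removeIf(w -> w.length() <= minLength);
        return words.size();
    }

    // Same logic as the stream version above: leaves the list untouched.
    static int countWithStream(List<String> words, int minLength) {
        return (int) words.stream().filter(w -> w.length() > minLength).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one short word, two longer ones.
        List<String> words = new ArrayList<>(Arrays.asList("a", "abcd", "abcdefgh"));

        System.out.println(countWithStream(words, 3));   // count of words longer than 3
        System.out.println(words.size());                // list is unchanged
        System.out.println(countWithRemoveIf(words, 3)); // same count...
        System.out.println(words.size());                // ...but the short word is now gone
    }
}
```

If the list is reused across calls, the `removeIf` version does its removal work only the first time, which is one reason the two are hard to compare directly.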
Microbenchmark using JMH:
@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsWithoutUsingStreams() {
    countLongWordsWithoutUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}

@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsUsingStreams() {
    countLongWordsUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}

public static void main(String[] args) throws RunnerException {
    final Options opts = new OptionsBuilder()
            .include(PracticeQuestionsCh8Benchmark.class.getSimpleName())
            .warmupIterations(5).measurementIterations(5).forks(1).build();
    new Runner(opts).run();
}
java -jar target/benchmarks.jar -wi 5 -i 5 -f 1
Benchmark                                                                  Mode  Cnt    Score    Error  Units
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsUsingStreams         thrpt    5   10.219 ±  0.408  ops/ms
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsWithoutUsingStreams  thrpt    5  910.785 ± 21.215  ops/ms
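For reference, here is a small check of how the list returned by `Collections.nCopies` behaves, since that list is the input to both benchmark methods above. It is only a sketch: the class name `NCopiesCheck` and the helper `rejectsAdd` are invented for illustration.

```java
import java.util.Collections;
import java.util.List;

class NCopiesCheck {
    // Returns true if the list refuses a structural modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = Collections.nCopies(10000, "IAmALongWord");

        System.out.println(words.size());                    // the n passed to nCopies
        System.out.println(rejectsAdd(words));               // the list is immutable
        System.out.println(words.get(0) == words.get(9999)); // every slot is the same String object
    }
}
```

So the benchmark input is an immutable view over a single string, not 10,000 distinct objects in a mutable list, which is worth keeping in mind when reading the scores.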
Edit: (as someone deleted the update posted as an answer)
public class PracticeQuestionsCh8Benchmark {
    private static final int NUM_WORDS = 10000;
    private static final int LONG_WORD_MIN_LEN = 10;

    private final List<String> words = makeUpWords();

    public List<String> makeUpWords() {
        List<String> words = new ArrayList<>();
        final Random random = new Random();
        for (int i = 0; i < NUM_WORDS; i++) {
            if (random.nextBoolean()) {
                /*
                 * Do this to avoid string interning. c.f.
                 * http://en.wikipedia.org/wiki/String_interning
                 */
                words.add(String.format("%" + LONG_WORD_MIN_LEN + "s", i));
            } else {
                words.add(String.valueOf(i));
            }
        }
        return words;
    }

    @Benchmark
    @BenchmarkMode(AverageTime)
    @OutputTimeUnit(MILLISECONDS)
    public int benchmarkCountLongWordsWithoutUsingStreams() {
        return countLongWordsWithoutUsingStreams(words, LONG_WORD_MIN_LEN);
    }

    @Benchmark
    @BenchmarkMode(AverageTime)
    @OutputTimeUnit(MILLISECONDS)
    public int benchmarkCountLongWordsUsingStreams() {
        return countLongWordsUsingStreams(words, LONG_WORD_MIN_LEN);
    }
}

public static int countLongWordsWithoutUsingStreams(
        final List<String> words, final int longWordMinLength) {
    final Predicate<String> p = s -> s.length() >= longWordMinLength;
    int count = 0;
    for (String aWord : words) {
        if (p.test(aWord)) {
            ++count;
        }
    }
    return count;
}

public static int countLongWordsUsingStreams(final List<String> words,
        final int longWordMinLength) {
    return (int) words.stream()
            .filter(w -> w.length() >= longWordMinLength).count();
}
return countLongWordxxx(); from your benchmark methods. – Schnitzler
Collections.nCopies(n, obj) for its size will simply return n! – Labial
removeIf not working on an immutable collection, and not returning a value from the benchmark. Even if these were fixed, my point about size() doing different work from counting makes the comparison invalid. Now, the edit avoids nCopies and fixes these problems, but there are still differences between the two benchmarks. – Labial
int values whereas the other counts using long values. It seems unlikely to me that these would make a significant difference, but the point is, we don't know, and they affect the validity of the comparison. Only when you have two benchmarks whose only difference is what you're measuring can you proceed with the analysis that Aleksey Shipilev suggested. – Labial
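The last comment's idea of two benchmarks that differ only in the thing being measured could look roughly like the sketch below. This is one possible reading, not code from the thread: the names `LikeForLike`, `countWithLoop` and `countWithStream` are hypothetical, and the JMH harness is left out so the snippet runs on its own. Both methods take the same `Predicate` instance and both count in a `long`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class LikeForLike {
    // Loop version: counts in a long so the return type matches Stream.count().
    static long countWithLoop(List<String> words, Predicate<String> isLong) {
        long count = 0;
        for (String word : words) {
            if (isLong.test(word)) {
                ++count;
            }
        }
        return count;
    }

    // Stream version: same predicate object, same return type, no cast.
    static long countWithStream(List<String> words, Predicate<String> isLong) {
        return words.stream().filter(isLong).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample data; a real benchmark would reuse its prepared word list.
        List<String> words = Arrays.asList("short", "exactlyten", "considerablyLonger");
        Predicate<String> isLong = w -> w.length() >= 10;

        System.out.println(countWithLoop(words, isLong));
        System.out.println(countWithStream(words, isLong));
    }
}
```

With the predicate and the counter type held equal, any remaining difference in a benchmark built on these two methods would come from the iteration mechanism itself.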