Performance difference between MRI Ruby and JRuby
S

2

5

While doing some benchmarking to answer this question about the fastest way to concatenate arrays, I was surprised that the same benchmarks ran a lot slower with JRuby.

Does this mean that the old adage about JRuby being faster than MRI Ruby no longer holds? Or is this about how arrays are treated in JRuby?

Here are the benchmark and the results in both MRI Ruby 2.3.0 and JRuby 9.1.2.0. Both ran on a 64-bit Windows 7 box, with all 4 processors 50-60% busy and about 5.5 GB of memory in use. JRuby had to be started with the parameter -J-Xmx1500M to provide enough heap space. I had to remove the test with push because of a "stack level too deep" error, and I also removed the slowest methods so the tests wouldn't take too long. Java runtime used: 1.7.0_21

require 'benchmark'
N = 100

class Array
  def concat_all 
    self.reduce([], :+)
  end
end

# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

Benchmark.bm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a

Benchmark.bm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

This question is not about the different methods used; see the original question for that. In both situations MRI is 7 times faster! Can someone explain why? I'm also curious how other implementations do, like RBX (Rubinius).

C:\Users\...>d:\jruby\bin\jruby -J-Xmx1500M concat3.rb
       user     system      total        real
plus         0.000000   0.000000   0.000000 (  0.000946)
concat       0.000000   0.000000   0.000000 (  0.001436)
splash       0.000000   0.000000   0.000000 (  0.001456)
concat_all   0.000000   0.000000   0.000000 (  0.002177)
flat_map     0.010000   0.000000   0.010000 (  0.003179)
       user     system      total        real
plus       140.166000   0.000000 140.166000 (140.158687)
concat     143.475000   0.000000 143.475000 (143.473786)
splash     139.408000   0.000000 139.408000 (139.406671)
concat_all 144.475000   0.000000 144.475000 (144.474436)
flat_map   143.519000   0.000000 143.519000 (143.517636)

C:\Users\...>ruby concat3.rb
       user     system      total        real
plus         0.000000   0.000000   0.000000 (  0.000074)
concat       0.000000   0.000000   0.000000 (  0.000065)
splash       0.000000   0.000000   0.000000 (  0.000098)
concat_all   0.000000   0.000000   0.000000 (  0.000141)
flat_map     0.000000   0.000000   0.000000 (  0.000122)
       user     system      total        real
plus        15.226000   6.723000  21.949000 ( 21.958854)
concat      11.700000   9.142000  20.842000 ( 20.928087)
splash      21.247000  12.589000  33.836000 ( 33.933170)
concat_all  14.508000   8.315000  22.823000 ( 22.871641)
flat_map    11.170000   8.923000  20.093000 ( 20.170945)
Stewart answered 10/11, 2016 at 13:48 Comment(9)
On my system (OS X, JRuby 9.1.6.0, MRI 2.3.1), the "small" arrays are faster on MRI whereas the "large" arrays are 2-4x faster in JRuby. This is due to CPU utilization: MRI only uses one core and JRuby makes my fan spin up. Not sure why the results are that different. – Decorative
I found the culprit: -J-Xmx1500M. I didn't use that option at first, so Java's default max heap size was used (4096M on my system) and it worked just fine. If I provide that option, thus lowering the value to 1500M, I get painfully slow results. – Decorative
As with almost all "Why is Java slow" benchmarking questions, I suspect that you are not properly benchmarking. For example, JRuby only compiles Ruby code to JVM bytecode after a method has been executed 20 times (I think). And HotSpot only compiles JVM bytecode to native machine code after a method has been executed several thousand times (IIRC, the default threshold for the C1 compiler is 20000). None of your methods are executed even remotely often enough to end up being compiled, they will always be interpreted. For example, concat_all is only executed N times, so N should be … – Cringe
… at least 20120. (And you need to throw the first 20020 measurements away.) Ideally more, since there is also a stabilization period after compilation. – Cringe
@JörgWMittag and Stefan Please put that in an answer so that we can better discuss in comments – Stewart
@Stewart could you please add rbx to your question? There are no recent benchmark answers that compare all three rubies. – Innoxious
@Innoxious I just did, are you gonna give an answer? I'm very curious because I have no experience at all with Rubinius – Stewart
@Stewart thank you. I was rather interested in how it performs in kares's environment. But if he isn't interested, I will set up an environment. – Innoxious
@Innoxious did a test myself, an occasion to test my docker skills, see my own answer, hope the results are better on genuine Linux – Stewart
J
4

The general rule (as mentioned in the comments) is that JRuby/JVM needs warmup.
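To make the warmup idea concrete, here is a minimal sketch of warming a block up before timing it. The helper name measure_after_warmup and the repeat counts are my own placeholders, not values taken from JRuby or HotSpot documentation; the real compile thresholds depend on the runtime and its settings.

```ruby
require 'benchmark'

# Hypothetical helper: run the block `warmup` times without measuring,
# then return the wall-clock seconds for `runs` further executions.
def measure_after_warmup(warmup:, runs:)
  warmup.times { yield }
  Benchmark.realtime { runs.times { yield } }
end

a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

# The counts below are arbitrary assumptions; tune them for your runtime.
seconds = measure_after_warmup(warmup: 50_000, runs: 50_000) { a + b + c }
puts format('plus: %.6f s for 50_000 runs after warmup', seconds)
```

Comparing this number with an unwarmed run of the same size should show how much of the difference is just warmup on a given setup.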

Usually bmbm is a good fit, although TIMES=1000 should be increased (at least for the small-array cases). Also, 1.5G might not be enough for optimal JRuby performance (I noticed a considerable change in the numbers going from -Xmx2g to -Xmx3g). Here are the results:

ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]

$ ruby concat3.rb
Rehearsal -----------------------------------------------
plus          0.000000   0.000000   0.000000 (  0.000076)
concat        0.000000   0.000000   0.000000 (  0.000070)
splash        0.000000   0.000000   0.000000 (  0.000099)
concat_all    0.000000   0.000000   0.000000 (  0.000136)
flat_map      0.000000   0.000000   0.000000 (  0.000138)
-------------------------------------- total: 0.000000sec

                  user     system      total        real
plus          0.000000   0.000000   0.000000 (  0.000051)
concat        0.000000   0.000000   0.000000 (  0.000059)
splash        0.000000   0.000000   0.000000 (  0.000083)
concat_all    0.000000   0.000000   0.000000 (  0.000120)
flat_map      0.000000   0.000000   0.000000 (  0.000173)
Rehearsal -----------------------------------------------
plus         43.040000   3.320000  46.360000 ( 46.351004)
concat       15.080000   3.870000  18.950000 ( 19.228059)
splash       49.680000   4.820000  54.500000 ( 54.587707)
concat_all   51.840000   5.260000  57.100000 ( 57.114867)
flat_map     17.380000   5.340000  22.720000 ( 22.716987)
------------------------------------ total: 199.630000sec

                  user     system      total        real
plus         42.880000   3.600000  46.480000 ( 46.506013)
concat       17.230000   5.290000  22.520000 ( 22.890809)
splash       60.300000   7.480000  67.780000 ( 67.878534)
concat_all   54.910000   6.480000  61.390000 ( 61.404383)
flat_map     17.310000   5.570000  22.880000 ( 23.223789)

...

jruby 9.1.6.0 (2.3.1) 2016-11-09 0150a76 Java HotSpot(TM) 64-Bit Server VM 25.112-b15 on 1.8.0_112-b15 +jit [linux-x86_64]

$ jruby -J-Xmx3g concat3.rb
Rehearsal -----------------------------------------------
plus          0.010000   0.000000   0.010000 (  0.001445)
concat        0.000000   0.000000   0.000000 (  0.002534)
splash        0.000000   0.000000   0.000000 (  0.001791)
concat_all    0.000000   0.000000   0.000000 (  0.002513)
flat_map      0.010000   0.000000   0.010000 (  0.007088)
-------------------------------------- total: 0.020000sec

                  user     system      total        real
plus          0.010000   0.000000   0.010000 (  0.002700)
concat        0.000000   0.000000   0.000000 (  0.001085)
splash        0.000000   0.000000   0.000000 (  0.001569)
concat_all    0.000000   0.000000   0.000000 (  0.003052)
flat_map      0.000000   0.000000   0.000000 (  0.002252)
Rehearsal -----------------------------------------------
plus         32.410000   0.670000  33.080000 ( 17.385688)
concat       18.610000   0.060000  18.670000 ( 11.206419)
splash       57.770000   0.330000  58.100000 ( 25.366032)
concat_all   19.100000   0.030000  19.130000 ( 13.747319)
flat_map     16.160000   0.040000  16.200000 ( 10.534130)
------------------------------------ total: 145.180000sec

                  user     system      total        real
plus         16.060000   0.040000  16.100000 ( 11.737483)
concat       15.950000   0.030000  15.980000 ( 10.480468)
splash       47.870000   0.130000  48.000000 ( 22.668069)
concat_all   19.150000   0.030000  19.180000 ( 13.934314)
flat_map     16.850000   0.020000  16.870000 ( 10.862716)

... so it seems to be the opposite: for the large arrays MRI 2.3 is 2-5x slower than JRuby 9.1

$ cat concat3.rb
require 'benchmark'
N = (ENV['TIMES'] || 100).to_i

class Array
  def concat_all
    self.reduce([], :+)
  end
end

# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

Benchmark.bmbm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a

Benchmark.bmbm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end
Jerrold answered 11/11, 2016 at 9:54 Comment(1)
could you please add rubinius to your benchmarks? :-) – Innoxious
S
1

What I have learned from these comments and answers and the tests I did myself afterward:

  • the OS probably makes a difference; I would have liked more answers from different setups, so here I'm just guessing
  • the fastest method differs between runtimes (MRI or JRuby, 32 or 64 bit, JRE), so claiming that one method is better than another is difficult; on my system the plus method was fastest in almost all circumstances, but I didn't use Java HotSpot like kares did
  • in 64-bit JRuby you can specify a much larger heap than in 32-bit (1.5G on my system); in 64-bit I could use more heap than I have memory (a bug somewhere?)
  • larger heaps speed up memory-intensive operations such as the huge arrays I used
  • use the latest Java runtime; it is faster
  • JRuby needs a warmup: a method needs to run a number of times before it is compiled, so use .bm and .bmbm with different repeat values to find that threshold
  • sometimes MRI is faster, but with the right parameters and warmup JRuby was 3 to 3.5 times as fast on my system for this particular test

The last point, together with the time it takes to load the JVM, makes MRI better for short ad hoc scripts and JRuby better for processor-hungry, longer-running processes with methods that are repeated often, so JRuby would be better for running servers and services.
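As a rough sketch of what trying different repeat values could look like, the snippet below times the same expression at several repeat counts and prints the average time per call. The counts are illustrative assumptions, and because everything runs in one process, each later count also benefits from the warmup done by the earlier ones.

```ruby
require 'benchmark'

a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

# Repeat counts to try; these values are assumptions, not known thresholds.
repeats = [100, 1_000, 10_000, 100_000]

# For each count, record the average wall-clock time of a single call.
per_call = repeats.map do |n|
  elapsed = Benchmark.realtime { n.times { a + b + c } }
  [n, elapsed / n]
end

per_call.each do |n, seconds|
  puts format('%8d runs: %.3f microseconds per call', n, seconds * 1e6)
end
```

If the time per call drops sharply between two counts, the compile threshold probably lies somewhere in between.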

What I saw confirmed: do your own benchmarks for long or repeated processes. Both implementations have made big improvements in speed compared to earlier versions. And let's not forget: Ruby may run slower, but it makes development faster, and if you compare the cost of some extra hardware to the cost of some extra developers...
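When publishing your own numbers, it helps to record which runtime produced them. Below is a small sketch; runtime_report is a hypothetical helper name of mine, and the JRuby branch assumes the standard Java integration (require 'java') with the usual java.lang.System and java.lang.Runtime calls.

```ruby
# Hypothetical helper: collect the details worth quoting next to benchmark numbers.
def runtime_report
  report = {
    engine: RUBY_ENGINE,
    ruby_version: RUBY_VERSION,
    platform: RUBY_PLATFORM,
    description: RUBY_DESCRIPTION
  }

  # Only on JRuby: add the JVM name, Java version and maximum heap size.
  if RUBY_ENGINE == 'jruby'
    require 'java'
    report[:jvm] = java.lang.System.getProperty('java.vm.name')
    report[:java_version] = java.lang.System.getProperty('java.version')
    report[:max_heap_mb] = java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
  end

  report
end

runtime_report.each { |key, value| puts "#{key}: #{value}" }
```

Pasting this output above the benchmark table answers the "which JVM, which heap" questions up front.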

Thanks to all the commenters and kares for their expertise.

EDIT

Out of curiosity I also ran the test with Rubinius in a Docker container (I'm on Windows): rubinius 3.69 (2.3.1 a57071c6 2016-11-17 3.8.0) [x86_64-linux-gnu]. Only concat and flat_map are on par with MRI; I wonder if these methods are in C and the rest in pure Ruby.

Rehearsal -----------------------------------------------
plus          0.000000   0.000000   0.000000 (  0.000742)
concat        0.000000   0.000000   0.000000 (  0.000093)
splash        0.000000   0.000000   0.000000 (  0.000619)
concat_all    0.000000   0.000000   0.000000 (  0.001357)
flat_map      0.000000   0.000000   0.000000 (  0.001536)
-------------------------------------- total: 0.000000sec

                  user     system      total        real
plus          0.000000   0.000000   0.000000 (  0.000589)
concat        0.000000   0.000000   0.000000 (  0.000084)
splash        0.000000   0.000000   0.000000 (  0.000596)
concat_all    0.000000   0.000000   0.000000 (  0.001679)
flat_map      0.000000   0.000000   0.000000 (  0.001568)
Rehearsal -----------------------------------------------
plus         68.770000  63.320000 132.090000 (265.589506)
concat       20.300000   2.810000  23.110000 ( 23.662007)
splash       79.310000  74.090000 153.400000 (305.013934)
concat_all   83.130000 100.580000 183.710000 (378.988638)
flat_map     20.680000   0.960000  21.640000 ( 21.769550)
------------------------------------ total: 513.950000sec

                  user     system      total        real
plus         65.310000  70.300000 135.610000 (273.799215)
concat       20.050000   0.610000  20.660000 ( 21.163930)
splash       79.360000  80.000000 159.360000 (316.366122)
concat_all   84.980000  99.880000 184.860000 (383.870653)
flat_map     20.940000   1.760000  22.700000 ( 22.760643)
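One way to probe whether a method is implemented natively or in Ruby is Method#source_location, which returns nil when there is no Ruby source behind the method. This is only a sketch: what it reports depends on the implementation and version, and newer MRI releases define some core methods in Ruby and report an internal location for them.

```ruby
class Array
  # Same helper as in the benchmark script.
  def concat_all
    reduce([], :+)
  end
end

# Look up each method on Array and report where (if anywhere) its Ruby source lives.
%i[+ concat flat_map concat_all].each do |name|
  location = Array.instance_method(name).source_location
  origin = location ? "Ruby source at #{location.join(':')}" : 'no Ruby source (native)'
  puts "Array##{name}: #{origin}"
end
```

Running this on each implementation would show which of the benchmarked methods are written in Ruby there.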
Stewart answered 11/11, 2016 at 21:39 Comment(1)
Yeah, it's important to mention which JVM you're using. People assume HotSpot when you just say Java. It's always great to know the version as well; sometimes (rare but still) there might be improvements in a particular Java version. Obviously use the latest JRuby version (9.1.2 had bugs; use a newer 9.1.x if possible). Allowing for more heap than you have memory is not a bug, since you might still have swap :) though it is kind of sub-optimal (the JVM could at least print a warning). – Jerrold
