Why doesn't this Ruby program return off heap memory to the operating system?

I am trying to understand when memory allocated off the Ruby heap gets returned to the operating system. I understand that Ruby never returns memory allocated to its heap, but I am still not sure about the behaviour of off-heap memory, i.e. the data of those objects that don't fit into a 40-byte RVALUE.

Consider the following program that allocates some large strings and then forces a major GC.

require 'objspace'

STRING_SIZE = 250

def print_stats(msg)
  puts '-------------------'
  puts msg
  puts '-------------------'
  puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1,"KB";}'`}"
  puts "HEAP SIZE: #{(GC.stat[:heap_sorted_length] * 408 * 40)/1024} KB"
  puts "SIZE OF ALL OBJECTS: #{ObjectSpace.memsize_of_all/1024} KB"
end

def run
  print_stats('START WORK')
  @data=[]
  600_000.times do
    @data <<  " "  * STRING_SIZE
  end
  print_stats('END WORK')
  @data=nil
end

run
GC.start
print_stats('AFTER FORCED MAJOR GC')

Running this program on MRI Ruby 2.2.3 produces the following output. After a forced major GC, the heap size is as expected, but RSS has not decreased significantly.

-------------------
START WORK
-------------------
RSS: 7036 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 3172 KB
-------------------
END WORK
-------------------
RSS: 205660 KB
HEAP SIZE: 35046 KB
SIZE OF ALL OBJECTS: 178423 KB
-------------------
AFTER FORCED MAJOR GC
-------------------
RSS: 164492 KB
HEAP SIZE: 35046 KB
SIZE OF ALL OBJECTS: 2484 KB

Compare these results to the following results when we allocate one large object instead of many smaller objects.

def run
  print_stats('START WORK')
  @data = " " * STRING_SIZE * 600_000
  print_stats('END WORK')
  @data=nil
end

-------------------
START WORK
-------------------
RSS: 7072 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 3170 KB
-------------------
END WORK
-------------------
RSS: 153584 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 149064 KB
-------------------
AFTER FORCED MAJOR GC
-------------------
RSS: 7096 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 2483 KB

Note the final RSS value. We seem to have freed all the memory we allocated for the big string.

I am not sure why the second example releases the memory but the first example doesn't, as both allocate memory off the Ruby heap. This is one reference that could provide an explanation, but I would be interested in explanations from others.

Releasing memory back to the kernel also has a cost. User space memory allocators may hold onto that memory (privately) in the hope it can be reused within the same process and not give it back to the kernel for use in other processes.

Randa answered 3/10, 2015 at 14:22 Comment(3)
Subscribing to this thread. I'm very interested in this as well. – Cobble
The fundamental difference is that the first example creates 600k new objects, while the second creates only one. Although the total size of the referenced data is the same, the first example requires 600 thousand times more slots for referencing objects (which are probably never returned to the OS, or only much later). – Abase
I'd suggest the following article and the linked explanation of RVALUEs. I'm not sure they are entirely correct; only Koichi (aka ko1) himself may know, or some strongly determined enthusiast analysing the Ruby sources. – Abase
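The slot-count argument in the comments above can be checked with a rough sketch like the one below. It compares how many live object slots the Ruby heap holds after allocating many small strings versus one large string. The helper name live_slots and the count of 100_000 are assumptions chosen for illustration, and exact numbers will vary by Ruby version.

```ruby
# Hypothetical helper: run a full GC, then report how many object slots are live.
def live_slots
  GC.start
  GC.stat[:heap_live_slots]
end

COUNT = 100_000   # illustrative; smaller than the 600_000 used in the question
SIZE  = 250

# Case 1: many small strings, each needing its own object slot.
before_many = live_slots
many = Array.new(COUNT) { ' ' * SIZE }
many_delta = live_slots - before_many
many = nil

# Case 2: one large string holding the same number of bytes.
before_big = live_slots
big = ' ' * (SIZE * COUNT)
big_delta = live_slots - before_big

puts "slots added by #{COUNT} small strings: #{many_delta}"
puts "slots added by one #{big.bytesize}-byte string: #{big_delta}"
```

The first delta should be on the order of COUNT, while the second should be close to zero, which is consistent with the heap growing in the first example from the question and staying flat in the second.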

@joanbm has a very good point here. His referenced article explains this pretty well:

Ruby's GC releases memory gradually, so when you do GC on 1 big chunk of memory pointed to by 1 reference it releases it all, but when there are a lot of references, the GC will release memory in smaller chunks.

Several calls to GC.start will release more and more memory in the 1st example.
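One way to try this is a small sketch that calls GC.start several times and records RSS after each call. The rss_kb helper is an assumption for illustration (it reads /proc/self/status on Linux and falls back to ps elsewhere), and the count is smaller than in the question to keep the run light. Whether RSS actually drops, and by how much, depends on the Ruby version and the system's memory allocator, so treat the output as an observation rather than a guarantee.

```ruby
# Hypothetical helper: resident set size of this process in KB.
# Assumes Linux /proc when available, otherwise a Unix-like ps.
def rss_kb
  status = '/proc/self/status'
  if File.readable?(status)
    File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

# Allocate many small strings, as in the first example of the question.
data = Array.new(200_000) { ' ' * 250 }
puts "after allocation: #{rss_kb} KB"
data = nil

# Force several major GCs and record RSS after each one.
readings = []
5.times do |i|
  GC.start
  readings << rss_kb
  puts "after GC.start ##{i + 1}: #{readings.last} KB"
end
```

Comparing the successive readings shows whether additional GC runs release more memory on a given setup.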


Here are 2 other articles to dig deeper:

Madden answered 23/2, 2016 at 15:48 Comment(0)
