How to get past 1gb memory limit of 64 bit LuaJIT on Linux?

Asked 19/11, 2014 at 10:58 Answered 19/1, 2017 at 17:33

I am prototyping code to understand my problem space, and I keep running into 'PANIC: unprotected error in call to Lua API (not enough memory)' errors. I am looking for ways to get around this limit.

The environment is Torch, a scientific computing framework that runs on LuaJIT, which is a just-in-time compiler for Lua. I need Torch because I eventually want to hammer on my problem with neural nets on a GPU, but to get there I need a good representation of the problem to feed to the nets. I am (stuck) on CentOS Linux, and I suspect that trying to rebuild all the pieces from source in 32-bit mode (this is reported to extend the LuaJIT memory limit to 4gb) will be a nightmare, if it works at all for all of the libraries.

The problem space itself is probably not particularly relevant, but in overview I have datafiles of points that I calculate distances between and then bin (i.e. make histograms of) these distances to try and work out the most useful ranges. Conveniently I can create complicated Lua tables with various sets of bins and torch.save() the mess of counts out, then pick it up later and inspect with different normalisations etc. -- so after one month of playing I am finding this to be really easy and powerful.

I can make it work looking at up to 3 distances with 15 bins each (15x15x15 plus overhead), but only by adding explicit collectgarbage() calls and using fork()/wait() for each datafile, so that the outer loop keeps running if one datafile (of several thousand) still blows the memory limit and crashes the child. This gets extra painful as each successful child process now has to read, modify and write the current set of bin counts, and my largest files for this are currently 36mb. I would like to go larger (more bins), and would really prefer to just hold the counts in the 15 gigs of RAM I can't seem to access.
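Explicit collection calls only help with memory that LuaJIT itself manages, so it can be useful to watch how large that heap actually gets. A minimal sketch, assuming only the standard collectgarbage function; heap_mb is a hypothetical helper name and the table of a million numbers is just a stand-in for real data:

```lua
-- collectgarbage("count") returns the size of the Lua-managed heap in kilobytes.
local function heap_mb()
    return collectgarbage("count") / 1024
end

local before = heap_mb()

-- Stand-in workload: a plain Lua table, which counts towards the limit.
local t = {}
for i = 1, 1e6 do
    t[i] = i
end

local after = heap_mb()
print(string.format("heap grew by about %.1f MB", after - before))

-- Drop the only reference and force a full collection cycle.
t = nil
collectgarbage("collect")
```

Logging this figure per datafile should show how close each child process gets to the limit before it fails.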

So, here are some paths I have thought of; please do comment if you can confirm/deny that any of them will/won't get me outside of the 1gb boundary, or will just improve my efficiency within it. Please do comment if you can suggest another approach that I have not thought of.

Am I missing a way to fire off a Lua process that I can read an arbitrary table back in from? No doubt I can break my problem into smaller pieces, but parsing a return table from stdio (as from a system call to another Lua script) seems error prone, and writing/reading small intermediate files will be a lot of disk i/o.
Am I missing a stash-and-access-table-in-high-memory module? This seems like what I really want, but I have not found it yet.
Can FFI C data structures be put outside the 1gb? It doesn't seem like that would be the case, but I certainly lack a full understanding of what is causing the limit in the first place. I suspect that this will just get me an efficiency improvement over generic Lua tables for the few pieces that have moved beyond prototyping (unless I do a bunch of coding for each change)?
Surely I can get out by writing an extension in C (Torch appears to support nets that should go outside of the limit), but my brief investigation there turns up references to 'lightuserdata' pointers. Does this mean that a more normal extension won't get outside 1gb either? This also seems to carry a heavy development cost for what should be a prototyping exercise.

I know C well so going the FFI or extension route doesn't bother me - but I know from experience that encapsulating algorithms in this way can be both really elegant and really painful with two places to hide bugs. Working through data structures containing tables within tables on the stack doesn't seem great either. Before I make this effort I would like to be certain that the end result really will solve my problem.

Thanks for reading the long post.

Samaniego answered 19/11, 2014 at 10:58 Comment(3)

Is this question still relevant for latest luajit version? – Fromm 23/2, 2019 at 0:6

Yes, as far as I can tell - it does not appear from the changelogs that this has changed much since 2014. Unfortunately with Torch switching to maintenance mode I am now in the process of porting my code from Lua :-( – Samaniego 24/2, 2019 at 7:52

x64/LJ_GC64: Add JIT compiler backend from here: github.com/LuaJIT/LuaJIT/blob/v2.1/doc/changes.html . Just tested, able to allocate 6 GB of memory. Don't know how stable it is. – Fromm 24/2, 2019 at 14:40

Only objects allocated by LuaJIT itself are limited to the first 2GB of memory. This means that tables, strings, full userdata (i.e. not lightuserdata), and FFI objects allocated with ffi.new count towards the limit, but memory obtained from malloc, mmap, etc. is not subject to it (regardless of whether the call comes from a C module or through the FFI).

An example for allocating a structure with malloc:

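local ffi = require("ffi")

ffi.cdef[[
    typedef struct { int bar; } foo;
    void* malloc(size_t);
    void free(void*);
]]

local foo_t = ffi.typeof("foo")
local foo_p = ffi.typeof("foo*")

-- Allocate a foo with malloc, so it lives outside the LuaJIT-managed heap.
-- The memory is uninitialised and is not garbage collected.
function alloc_foo()
    local obj = ffi.C.malloc(ffi.sizeof(foo_t))
    return ffi.cast(foo_p, obj)
end

-- Must be called explicitly for every object returned by alloc_foo.
function free_foo(obj)
    ffi.C.free(obj)
end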
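The same idea extends to the kind of bin-count array described in the question. The following is only a sketch under stated assumptions: new_counts, idx and NBINS are hypothetical names, the 15x15x15 shape is taken from the question, and it requires LuaJIT for the ffi module. It zero-fills the malloc'd block and uses ffi.gc so that free() is called automatically when the pointer is collected:

```lua
local ffi = require("ffi")

ffi.cdef[[
    void* malloc(size_t);
    void free(void*);
]]

-- Hypothetical helper: allocate n zero-initialised doubles with malloc,
-- so the storage lives outside the LuaJIT-managed heap.
local function new_counts(n)
    local bytes = n * ffi.sizeof("double")
    local raw = ffi.C.malloc(bytes)
    if raw == nil then
        error("malloc failed")
    end
    ffi.fill(raw, bytes)  -- zero the whole block
    -- Attach a finalizer so free() runs when the pointer is garbage collected.
    return ffi.gc(ffi.cast("double*", raw), ffi.C.free)
end

local NBINS = 15
local counts = new_counts(NBINS * NBINS * NBINS)

-- Map three 0-based bin indices to a flat 0-based offset.
local function idx(i, j, k)
    return (i * NBINS + j) * NBINS + k
end

-- Record one observation in bin (3, 7, 11).
counts[idx(3, 7, 11)] = counts[idx(3, 7, 11)] + 1
```

Only the small pointer object is managed by LuaJIT here, so the array itself should not count towards the limit. Note that indexing is 0-based and unchecked, as with any C pointer.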

The new GC to be implemented in LuaJIT 3.0 will, IIRC, not have this limit, but I haven't heard any news on its development recently.

Source: http://lua-users.org/lists/lua-l/2012-04/msg00729.html

Juster answered 19/11, 2014 at 14:44 Comment(2)

What if I have multiple lua_State in a single process? Does this limit apply per lua_State or globally? – Cassation 22/11, 2014 at 3:40

I think it's global, but you may want to ask on the LuaJIT mailing list to make sure. – Juster 22/11, 2014 at 14:14

Here is some follow-up information for those who find this question later:

The key information is as posted by Colonel Thirty Two: C module extensions and FFI code can easily get outside of the limit. (The referenced lua list post also reminds us that plain Lua tables that go outside the limit will be very slow to garbage collect.)

It took me some time to pull the pieces together to both access and save/load my objects, so here it is in one place:

I used lds at https://github.com/neomantra/lds as a starting point, in particular the 1-D Array code.

This broke using torch.save(), as it doesn't know how to write the new objects. For each object I added the code below (using Array as the example):

function Array:load(inp)
   for i=1,#inp do
      self._data[i-1] = tonumber(inp[i])
   end
   return self
end

function Array:serialize ()
   local siz = tonumber(self._size)
   io.write(' lds.ArrayT( ffi.typeof("double"), lds.MallocAllocator )( ', siz , "):load({")
   for i=0,siz-1 do
      io.write(string.format("%a,", self._data[i]))
   end
   io.write("})")
end

Note that my application specifically uses doubles and malloc(), so a better implementation would store the element type and allocator in self and use them here, rather than hard-coding them as above.

Then as discussed in PiL and elsewhere, I needed a serializer that would handle the object:

function serialize (o)
   if type(o) == "number" then
      io.write(o)
   elseif type(o) == "string" then
      io.write(string.format("%q", o))
   elseif type(o) == "table" then
      io.write("{\n")
      for k,v in pairs(o) do
         io.write("  ["); serialize(k); io.write("] = ")
         serialize(v)
         io.write(",\n")
      end
      io.write("}\n")
   elseif o.serialize then
      -- objects (such as the lds Array above) that know how to write themselves
      o:serialize()
   else
      error("cannot serialize a " .. type(o))
   end
end

and this needs to be wrapped with:

io.write('do local _ = ')
serialize( myWeirdTable )
io.write('; return _; end')

and then the output from that can be loaded back in (with ffi and lds available as globals, since the saved text refers to them by those names) with

local myWeirdTableReloaded = dofile('myWeirdTableSaveFile')

See PiL (Programming in Lua book) for dofile()
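For reference, the save/load round trip can be sketched end to end for plain tables. This is a simplified illustration rather than the code above: write_value and save_table are hypothetical names, it writes to an explicit file handle instead of standard output, it emits a bare "return" instead of the do/local wrapper, and it handles only numbers, strings and tables (no lds objects):

```lua
-- Hypothetical helper: write a value as Lua source text to the file handle out.
local function write_value(out, o)
    local t = type(o)
    if t == "number" then
        out:write(string.format("%.17g", o))  -- enough digits to round-trip a double
    elseif t == "string" then
        out:write(string.format("%q", o))
    elseif t == "table" then
        out:write("{\n")
        for k, v in pairs(o) do
            out:write("  ["); write_value(out, k); out:write("] = ")
            write_value(out, v)
            out:write(",\n")
        end
        out:write("}\n")
    else
        error("cannot serialize a " .. t)
    end
end

-- Hypothetical helper: save tbl as a chunk that returns the table when run.
local function save_table(path, tbl)
    local out = assert(io.open(path, "w"))
    out:write("return ")
    write_value(out, tbl)
    out:close()
end

-- Round trip through a temporary file (os.tmpname is assumed to be usable here).
local path = os.tmpname()
save_table(path, { bins = { 1, 2, 3 }, label = "run-1" })
local reloaded = dofile(path)
os.remove(path)
```

Passing the file handle around keeps the serializer independent of whatever io.output() currently points at, which is one way to avoid redirecting standard output.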

Hope that helps someone!

Samaniego answered 21/11, 2014 at 11:54 Comment(0)

You can use the torch tds module. From the README:

Data structures which do not rely on Lua memory allocator, nor being limited by Lua garbage collector.

Only C types can be stored: supported types are currently number, strings, the data structures themselves (see nesting: e.g. it is possible to have a Hash containing a Hash or a Vec), and torch tensors and storages. All data structures can store heterogeneous objects, and support torch serialization.

Leptosome answered 19/1, 2017 at 17:33 Comment(0)
