Usually, such libraries do not deallocate native memory via garbage collection. In particular, JCuda does not do this, and it has no option or "mode" where this could be enabled.
The reason is quite simple: It does not work.
You'll often have a pattern like this:
```java
void doSomethingWithJCuda()
{
    CUdeviceptr data = new CUdeviceptr();
    cuMemAlloc(data, 1000);
    workWith(data);
    // *(See notes below)
}
```
Here, native memory is allocated, and the Java object serves as a "handle" to this native memory.
At the last line, the data object goes out of scope and thus becomes eligible for garbage collection. However, there are two issues:
1. The garbage collector will only destroy the Java object; it will not free the memory that was allocated with cuMemAlloc or any other native call. So you usually have to free the native memory explicitly, by calling

```java
cuMemFree(data);
```

before leaving the method.
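To make sure that this explicit cuMemFree call also happens when workWith throws an exception, it is usually placed in a finally block. Here is a runnable sketch of that pattern. It uses a hypothetical counter-based mock allocator (mockAlloc/mockFree, invented for this example) instead of the real cuMemAlloc/cuMemFree calls, so it runs without a GPU; only the structure is what JCuda code would look like:

```java
// Illustrates the explicit alloc/free pattern with try/finally.
// mockAlloc and mockFree are hypothetical stand-ins for cuMemAlloc and
// cuMemFree, tracking "native" memory with a simple counter.
public class ExplicitFreeDemo {
    static long nativeBytesInUse = 0;

    static void mockAlloc(long size) { nativeBytesInUse += size; } // stands in for cuMemAlloc
    static void mockFree(long size)  { nativeBytesInUse -= size; } // stands in for cuMemFree

    static void doSomethingWithMockCuda() {
        long size = 1000;
        mockAlloc(size);                 // like cuMemAlloc(data, 1000)
        try {
            // workWith(data) - may throw at any point
            throw new RuntimeException("failure during work");
        } finally {
            mockFree(size);              // like cuMemFree(data): runs even on exception
        }
    }

    public static void main(String[] args) {
        try {
            doSomethingWithMockCuda();
        } catch (RuntimeException e) {
            // the exception propagated, but the finally block still freed the memory
        }
        System.out.println(nativeBytesInUse); // prints 0
    }
}
```

The point of the finally block is that the free happens on every exit path from the method, which is exactly the guarantee a finalizer cannot give you.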
2. You don't know when the Java object will be garbage collected - or whether it will be garbage collected at all.
A common misconception is that an object is garbage collected as soon as it is no longer reachable, but this is not necessarily true.
As bmargulies pointed out in his answer:
One means is to have a Java object with a finalizer that makes the necessary JNI call to free native memory.
It may look like a viable option to simply override the finalize() method of these "handle" objects and do the cuMemFree(this) call there. This has been tried, for example, by the authors of JavaCL (a library that also allows using the GPU from Java, and is thus conceptually somewhat similar to JCuda).
But it simply does not work: Even if a Java object is no longer reachable, this does not mean that it will be garbage collected immediately. You simply don't know when the finalize() method will be called.
This can easily cause nasty errors: When you have 100 MB of GPU memory, you can use 10 CUdeviceptr objects, each allocating 10 MB. Now your GPU memory is full. But for Java, these few CUdeviceptr objects only occupy a few bytes, and the finalize() method may never be called during the runtime of the application, because the JVM simply does not need to reclaim these few bytes of heap memory. (Omitting discussions about hacky workarounds here, like calling System.gc() - the bottom line is: It does not work.)
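This mismatch can be demonstrated without a GPU. The following sketch is not JCuda code: FinalizerDemo and Handle are hypothetical classes invented for this illustration, simulating native memory with a counter. Ten tiny handle objects "own" 10 MB each, and even after they all become unreachable, nothing is freed, because in a short program with no heap pressure the GC typically never runs and thus never finalizes them:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (not JCuda): each Handle "allocates" 10 MB of
// simulated native memory, and its finalizer would give it back - but
// finalizers only run if and when the GC decides to collect the object.
public class FinalizerDemo {
    static long nativeBytes = 0;

    static class Handle {
        final long size;
        Handle(long size) { this.size = size; nativeBytes += size; }
        @Override
        protected void finalize() { nativeBytes -= size; } // may never run
    }

    public static void main(String[] args) {
        List<Handle> handles = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            handles.add(new Handle(10L * 1024 * 1024)); // 10 MB each
        }
        handles.clear(); // all handles are now unreachable...
        // ...but no finalizer has run: the "GPU" is still full,
        // while the Java heap barely notices the ten small objects.
        System.out.println(nativeBytes); // typically still prints 104857600
    }
}
```

From the JVM's point of view there is nothing to reclaim here, which is precisely why tying native deallocation to finalization cannot work reliably.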
So answering your actual question: JCuda is a very low-level library. This means that you have the full power, but also the full responsibility, of manual memory management. I know that this is "inconvenient". When I started creating JCuda, I originally intended it as a low-level backend for an object-oriented wrapper library. But creating a robust, stable and universally applicable abstraction layer for a complex general-purpose library like CUDA is challenging, and I did not dare to tackle such a project - last but not least because of the complexities that are implied by ... things like garbage collection...