java.lang.OutOfMemoryError when plenty of memory left (94GB / 200GB Xmx)
I am trying to create large RDF/HDT files, which in turn means reading large files into memory, etc. Now, that is not really an issue since the server has 516GB of memory, around 510GB of which are free.

I am using the rdfhdt library to create the files, which works just fine. However, for one specific file I keep getting an OutOfMemoryError, with no obvious reason why. Here is the stack trace:

 Exception in thread "main" java.lang.OutOfMemoryError
    at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
    at org.rdfhdt.hdt.util.string.ByteStringUtil.append(ByteStringUtil.java:238)
    at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:123)
    at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:87)
    at org.rdfhdt.hdt.dictionary.impl.FourSectionDictionary.load(FourSectionDictionary.java:83)
    at org.rdfhdt.hdt.hdt.impl.HDTImpl.loadFromModifiableHDT(HDTImpl.java:441)
    at org.rdfhdt.hdt.hdt.writer.TripleWriterHDT.close(TripleWriterHDT.java:96)
    at dk.aau.cs.qweb.Main.makePredicateStores(Main.java:137)
    at dk.aau.cs.qweb.Main.main(Main.java:69)

I am running the jar file with the flag -Xmx200G. The strange thing is that, looking at 'top', VIRT is 213G (as expected), but every time RES climbs to just about 94GB, the program crashes with the error above, which I find strange since it should have more than 100GB of heap left to use. I looked at this question, as the problem seemed similar to mine, although on a different scale. However, -verbose:gc and -XX:+PrintGCDetails don't give me any indication of what is wrong, and there is about 500G of swap space available as well.
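
For reference, the invocation is roughly the following (the jar name here is just a placeholder):

java -Xmx200G -verbose:gc -XX:+PrintGCDetails -jar my-hdt-builder.jar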

Perhaps the strangest thing, however, is that the specific file I have issues with is not even the largest one. For scale, it has about 83M triples to write, while other files with up to 200M triples have not been a problem. I am using Java version 1.8.0_66 and Ubuntu 14.04.3 LTS.

So my question is: can anyone explain what I am doing wrong? It seems very strange to me that larger files are fine but this one is not. Please let me know if you need any other information.

Acceptation answered 24/11, 2018 at 20:36
Curious (and related to a previously deleted comment): if you increase to, e.g., -Xmx300G (up from 200), does the program complete? What is the maximum memory used by the process in that case? – Fellini
@Fellini Sadly, no. I've tried with up to -Xmx500G and it still doesn't run (at which point I guess it is too high anyway). In total I tried giving it 200, 250, 300, 400 and 500GB of heap space. – Acceptation
It looks like the dictionary held in memory exceeds the maximum capacity of the corresponding Java structure. You should open an issue on GitHub. What would be interesting: is there any obvious difference between this file and the files that work? Maybe larger literals, the number of distinct nodes, etc.? – Banuelos
And did you try the C++ implementation of HDT? – Banuelos

Due to Java's maximum array length, a ByteArrayOutputStream cannot hold more than about 2GB of data, regardless of how much RAM is available or how high you set -Xmx. Arrays are indexed with int, so the backing byte[] is capped at roughly Integer.MAX_VALUE bytes; when the buffer tries to grow past that, the requested capacity overflows to a negative value and the check below throws the OutOfMemoryError. Here's the code you're hitting:

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
}

You'll have to restructure the code so it does not try to keep that much data in a single in-memory array, for example by streaming it to disk instead of buffering it all in a ByteArrayOutputStream.
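
A minimal sketch of the general idea, not rdfhdt's API (the class name, buffer size, and file names below are made up for illustration): instead of accumulating bytes in a ByteArrayOutputStream, stream them to a file-backed OutputStream, which has no 2GB ceiling.

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpillToDisk {
    public static void main(String[] args) throws IOException {
        // A single byte[] (and therefore a ByteArrayOutputStream) tops out at ~2GB,
        // but a file-backed stream can grow as large as the disk allows.
        Path tmp = Files.createTempFile("dictionary-section", ".bin");
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            byte[] chunk = new byte[8 * 1024 * 1024];   // 8MB buffer, reused for every write
            long target = 3L * 1024 * 1024 * 1024;      // 3GB total: impossible in one array
            for (long written = 0; written < target; written += chunk.length) {
                out.write(chunk);
            }
        }
        System.out.println("Wrote " + Files.size(tmp) + " bytes to " + tmp);
        Files.delete(tmp); // clean up the demo file
    }
}

In your case the ByteArrayOutputStream lives inside rdfhdt's PFCDictionarySection (see the stack trace), so the fix probably has to land in the library itself, which is why opening an issue on GitHub, as suggested in the comments, makes sense.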

Drawn answered 24/11, 2018 at 21:22
