I am trying to create large RDF/HDT files, which in turn means reading large files into memory, etc. Now, that is not really an issue since the server has 516GB of memory, around 510GB of which are free.
I am using the rdfhdt library to create the files, which works just fine. However, for one specific file I keep getting an OutOfMemoryError with no obvious reason why. Here is the stack trace:
Exception in thread "main" java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.rdfhdt.hdt.util.string.ByteStringUtil.append(ByteStringUtil.java:238)
at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:123)
at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:87)
at org.rdfhdt.hdt.dictionary.impl.FourSectionDictionary.load(FourSectionDictionary.java:83)
at org.rdfhdt.hdt.hdt.impl.HDTImpl.loadFromModifiableHDT(HDTImpl.java:441)
at org.rdfhdt.hdt.hdt.writer.TripleWriterHDT.close(TripleWriterHDT.java:96)
at dk.aau.cs.qweb.Main.makePredicateStores(Main.java:137)
at dk.aau.cs.qweb.Main.main(Main.java:69)
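For context, the triples are written through rdfhdt's TripleWriter, which is the path the stack trace goes through. The snippet below is only a minimal, hypothetical sketch of that pattern and not my actual code: the HDTManager.getHDTWriter factory, the import locations, the output path, and the example triple are all assumptions, and the real program streams millions of triples from its own source before calling close().

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTSpecification;
import org.rdfhdt.hdt.rdf.TripleWriter;
import org.rdfhdt.hdt.triples.TripleString;

public class HdtWriteSketch {
    public static void main(String[] args) throws Exception {
        String baseURI = "http://example.org/";  // placeholder base URI

        // Hypothetical output file; the real code writes one store per predicate.
        try (OutputStream out = new FileOutputStream("predicate-store.hdt")) {
            // Assumed factory method for a streaming HDT writer, default options.
            TripleWriter writer = HDTManager.getHDTWriter(out, baseURI, new HDTSpecification());

            // In reality ~83M triples are fed in here, one TripleString at a time.
            writer.addTriple(new TripleString(
                    "http://example.org/s1",
                    "http://example.org/p1",
                    "\"some literal\""));

            // close() builds the dictionary sections; this is where the error surfaces.
            writer.close();
        }
    }
}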
I am running the jar file with the flag -Xmx200G. The strange thing is that 'top' shows VIRT at 213G (as expected), but every time RES climbs to just about 94GB, the program crashes with the error above, which I find strange since it should still have more than 100GB of heap left to use. I looked at this question, as the problem seems similar to mine, although on a different scale. However, running with -verbose:gc and -XX:+PrintGCDetails doesn't give me any indication of what is wrong, and there is about 500G of swap space available as well.
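As a sanity check (not part of my actual code), something as small as the snippet below, run with the same -Xmx200G flag, would confirm whether the 200G setting is really being picked up by the JVM:

public class HeapCheck {
    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use (roughly what -Xmx was set to).
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes >> 30) + " GiB");
    }
}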
Perhaps the strangest thing is that the specific file I have issues with is not even the largest one. For scale, it has about 83M triples to write, while other files with up to 200M triples have not been a problem. I am using Java version 1.8.0_66 and Ubuntu version 14.04.3 LTS.
So my question is: can anyone explain what I am doing wrong? It seems very strange to me that larger files cause no issue, but this one does. Please let me know if you need any other information.