Do objects with a lot of reference fields (except arrays) devastate Hotspot JVM's GC(s) heap traversal performance?

Let me rephrase the question. "Is it better to have class A or class B, below?"

class A {
  Target[] array;
}

class B {
  Target a, b, c, ..., z;
}

The usual maintainability issues notwithstanding... From VM side of view, given the resolved reference to class B, it requires one dereference to reach Target field. While in class A, it requires two derferences, because we also need to read through the array.

The handling of object references in two cases is subtly different: in class A, VM knows there is an contiguous array of references, and so it does not need to know anything else. In class B, VM has to know which fields are references (because there could be non-reference fields, for example), which requires maintaining the oop maps in the class metadata:

//  InstanceKlass embedded field layout (after declared fields):
...
//    [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
//      The embedded nonstatic oop-map blocks are short pairs (offset, length)
//      indicating where oops are located in instances of this

Note that while footprint overhead is there, it is unlikely to matter very much, unless you have lots of classes of this weird shape, but even then the cost would be per-class, not per-instance.

Oop-maps are built during class parsing, by the shared runtime code. The visitors that walk the "oop"-s for the particular object looks into those oop-maps to find the offsets for references, and that code is also the part of shared runtime. So, this overhead is independent of GC implementation.

Considerations for performance:

Oop-maps are chunked: the runs of adjacent reference fields would form a continuous oop-map block that would be visited pretty much like we would with continuous oop block in reference array.
The GC (marking) performance is dependent on the number of references it has to follow, and memory latency on dereferences would be the first-order effect. Note that in class A, we have to traverse more references.
The null-checks and array bounds checks would probably matter in class A case, if requested indices are not constant and array lengths are not known on critical code paths. In comparison, fields are bound statically, and their offsets are always known.

So, it probably makes little sense to ask about the difference in GC/runtime handling of separate fields vs arrays. Taking care of locality of reference quite probably gives a bigger bang for the buck. Which tips the scale to class B, with associated maintainability overheads -- as quite a few performance tricks do.

Recommended topics

Hot tags