I am looking for a way to deserialize a String from a byte[] in Java with as little garbage produced as possible. Because I am creating my own serializer and deserializer, I have complete freedom to implement any solution on the server side (i.e. when serializing data) and on the client side (i.e. when deserializing data).
I have managed to efficiently serialize a String without incurring any garbage overhead by iterating through the String's chars (String.charAt(i)) and converting each char (a 16-bit value) to two 8-bit values. There is a nice debate regarding this here. An alternative is to use Reflection to access the String's underlying char[] directly, but this is outside the scope of the problem.
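For illustration, the charAt-based serialization described above might look roughly like the sketch below. The class name CharSerializer, the method name writeChars, and the big-endian byte order are assumptions made for this example, not details of my actual serializer; the destination byte[] is assumed to be preallocated and large enough.

```java
final class CharSerializer {

    /**
     * Writes every char of s into dst as two bytes (high byte first),
     * starting at offset. Returns the offset just past the last byte written.
     * No intermediate objects are allocated.
     */
    static int writeChars(String s, byte[] dst, int offset) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            dst[offset++] = (byte) (c >>> 8); // upper 8 bits
            dst[offset++] = (byte) c;         // lower 8 bits
        }
        return offset;
    }
}
```

The caller owns the byte[] and can reuse it across calls, which is what keeps the write path allocation-free.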
However, it seems impossible for me to deserialize the byte[] without creating the char[] twice, which seems, well, weird.
The procedure:
- Create a char[]
- Iterate through the byte[] and fill in the char[]
- Create the String with the String(char[]) constructor
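The three steps above can be sketched as follows. The names CharDeserializer and readChars are made up for this example, and it assumes that each char was written as two bytes, high byte first, and that the number of chars is known to the caller.

```java
final class CharDeserializer {

    /**
     * Rebuilds a String from charCount two-byte values in src, starting at offset.
     */
    static String readChars(byte[] src, int offset, int charCount) {
        char[] chars = new char[charCount];          // step 1: temporary char[]
        for (int i = 0; i < charCount; i++) {        // step 2: fill it from the byte[]
            int hi = src[offset++] & 0xFF;
            int lo = src[offset++] & 0xFF;
            chars[i] = (char) ((hi << 8) | lo);
        }
        return new String(chars);                    // step 3: the constructor copies the contents
    }
}
```

The temporary array from step 1 becomes garbage as soon as the method returns, which is the second allocation I would like to avoid.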
Because of Java's String immutability rules, the constructor copies the char[], creating 2x GC overhead. I can always use mechanisms to circumvent this (Unsafe String allocation + Reflection to set the char[] instance), but I just wanted to ask if there are any consequences to this other than me breaking every convention on String's immutability.
Of course, the wisest response to this would be "come on, stop doing this and have trust in GC, the original char[] will be extremely short-lived and G1 will get rid of it momentarily", which actually makes sense if the char[] is smaller than 1/2 of the G1 region size. If it is larger, the char[] will be allocated directly as a humongous object (i.e. placed in its own dedicated regions rather than in a regular young-generation region). Such objects are much harder for G1 to collect efficiently. That's why each allocation matters.
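To make the threshold concrete, here is a rough back-of-the-envelope sketch. The 16-byte array header and 8-byte alignment are assumptions (typical for a 64-bit HotSpot, but dependent on JVM flags), the class and method names are invented for this example, and the exact treatment of an array that lands precisely on the half-region boundary is left to the JVM.

```java
final class HumongousEstimate {

    // Assumption: array header size on a typical 64-bit HotSpot JVM.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16;

    /** Approximate heap footprint of a char[] of the given length, rounded up to 8 bytes. */
    static long estimatedCharArrayBytes(int length) {
        long raw = ASSUMED_ARRAY_HEADER_BYTES + 2L * length; // 2 bytes per char
        return (raw + 7) & ~7L;
    }

    /** True if the estimate reaches half of the given G1 region size. */
    static boolean likelyHumongous(int length, long regionSizeBytes) {
        return estimatedCharArrayBytes(length) >= regionSizeBytes / 2;
    }
}
```

With a 1 MiB region, for example, a char[] of around 262,000 elements or more (about 512 KiB) would already be in humongous territory under these assumptions.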
Any ideas on how to tackle the issue?
Many thanks.
MutableString, and implement a lot of traditionally garbage-heavy operations over it (a fast-path String split, for instance), and then have a method toString(from, to) which creates a "view" instance which is of type String. I could do that. But this would require us to completely refactor our application and to use MutableStrings everywhere possible. It is a nice idea, but I wanted to explore alternatives first. – Coniah
CharBuffer and StringBuilder, both being a kind of mutable String (unless you've created an immutable view), there are methods for creating lightweight subsequences of them and they all implement CharSequence, the interface on which the regex package, which actually implements the split operation, works. And while it looks like character content is copied all the time when converting between Strings, CharBuffers and StringBuilders when looking at the source code, HotSpot has special optimizations for them… – Brod
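As a small illustration of the CharSequence point in the comment above, the sketch below wraps part of an existing char[] in a CharBuffer without copying it and passes that view to the regex package. The class and method names are hypothetical, and note that Pattern.split still allocates a String for every resulting piece.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

final class CharSequenceViews {

    static final Pattern COMMA = Pattern.compile(",");

    /** Returns a view over chars[from, to) that shares the array instead of copying it. */
    static CharBuffer view(char[] chars, int from, int to) {
        return CharBuffer.wrap(chars, from, to - from);
    }

    /** Splits any CharSequence (String, CharBuffer, StringBuilder) on commas. */
    static String[] splitOnComma(CharSequence input) {
        return COMMA.split(input);
    }
}
```

Because the view shares the backing array, later writes to the char[] are visible through it, so this only works if the array is not reused while the view is still live.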