Java: StringBuffer to byte[] without toString
A

6

13

The title says it all. Is there any way to convert from StringBuilder to byte[] without using a String in the middle?

The problem is that I'm working with REALLY large strings (millions of chars), and I have a loop that appends a char at the end and then obtains the byte[]. Converting the StringBuffer to a String on every iteration makes this loop very, very slow.

Is there any way to accomplish this? Thanks in advance!

Astraphobia answered 19/10, 2013 at 22:50 Comment(4)
The closest you can get is to get a char[] array. StringBuffer#getChars(int, int, char[], int)Mala
why not use CharBuffer instead? And then do "charBuffer.array()"?Hazan
Can you clarify why you need to store all these big strings in memory? Is this something a user is waiting on? Could this instead become a MapReduce or Spark job? I just wonder if maybe this question is a symptom of an architectural design smell.Celestinecelestite
Also, you say StringBuilder in the question but StringBuffer in the title. If you're going to do this, that will make a big difference. Go StringBuilder.Celestinecelestite
S
14

As many have already suggested, you can use the CharBuffer class, but allocating a new CharBuffer would only make your problem worse.

Instead, you can directly wrap your StringBuilder in a CharBuffer, since StringBuilder implements CharSequence:

Charset charset = StandardCharsets.UTF_8;
CharsetEncoder encoder = charset.newEncoder();

// No allocation performed, just wraps the StringBuilder.
CharBuffer buffer = CharBuffer.wrap(stringBuilder);

ByteBuffer bytes = encoder.encode(buffer);

EDIT: Duarte correctly points out that the CharsetEncoder.encode method may return a buffer whose backing array is larger than the actual data—meaning, its capacity is larger than its limit. It is necessary either to read from the ByteBuffer itself, or to read a byte array out of the ByteBuffer that is guaranteed to be the right size. In the latter case, there's no avoiding having two copies of the bytes in memory, albeit briefly:

ByteBuffer byteBuffer = encoder.encode(buffer);

byte[] array;
int arrayLen = byteBuffer.remaining();
if (byteBuffer.hasArray()
        && byteBuffer.arrayOffset() == 0
        && arrayLen == byteBuffer.array().length) {
    // The backing array holds exactly the encoded bytes, so reuse it.
    array = byteBuffer.array();
} else {
    // This will place two copies of the byte sequence in memory,
    // until byteBuffer gets garbage-collected (which should happen
    // pretty quickly once the reference to it is null'd).

    array = new byte[arrayLen];
    byteBuffer.get(array);
}

byteBuffer = null;
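
For reference, the two snippets above can be combined into one self-contained sketch. The class name `BuilderBytes` and the helper `toBytes` are hypothetical names chosen for illustration, and this version always copies into a right-sized array rather than trying to reuse the backing array:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class BuilderBytes {
    // Hypothetical helper: encodes the builder's contents as UTF-8
    // without calling toString() on it.
    public static byte[] toBytes(StringBuilder sb) throws CharacterCodingException {
        // wrap() creates a read-only view over the CharSequence; no chars are copied.
        CharBuffer chars = CharBuffer.wrap(sb);
        ByteBuffer bytes = StandardCharsets.UTF_8.newEncoder().encode(chars);

        // remaining() is the number of encoded bytes; the backing array may be longer.
        byte[] out = new byte[bytes.remaining()];
        bytes.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        StringBuilder sb = new StringBuilder("caf\u00e9");
        byte[] out = toBytes(sb);
        // Prints 5: three ASCII bytes plus two bytes for U+00E9.
        System.out.println(out.length);
    }
}
```

Whether this is actually faster than toString() for a given workload is something to measure; the sketch only shows that the intermediate String can be avoided.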
Sheathbill answered 19/10, 2013 at 23:30 Comment(2)
+1 for the correct answer that also correctly implements charset encoding.Kittrell
Careful: ByteBuffer.array() returns the entire backing array, which will likely contain extra bytes!Hardesty
D
2

If you're willing to replace the StringBuilder with something else, yet another possibility would be a Writer backed by a ByteArrayOutputStream:

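ByteArrayOutputStream bout = new ByteArrayOutputStream();
// Name the charset explicitly; the one-argument constructor uses the platform default.
Writer writer = new OutputStreamWriter(bout, StandardCharsets.UTF_8);
try {
    writer.write("String A");
    writer.write("String B");
    // OutputStreamWriter buffers internally; flush before reading the bytes.
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
// Print the contents; printing the array itself only shows its identity hash.
System.out.println(Arrays.toString(bout.toByteArray()));

try {
    writer.write("String C");
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(Arrays.toString(bout.toByteArray()));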

As always, your mileage may vary.

Dilley answered 22/9, 2016 at 18:6 Comment(0)
E
1

Unfortunately, the answers above that use ByteBuffer's array() method are a bit buggy. The trouble is that the backing byte[] is likely to be bigger than you'd expect, so it will contain trailing zero bytes. You can't resize an array in Java, so the only way to get rid of them is to copy the valid range into a new array.

Here is an article that explains this in more detail: http://worldmodscode.wordpress.com/2012/12/14/the-java-bytebuffer-a-crash-course/
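
A minimal sketch of copying out only the valid bytes, assuming a buffer that has been flipped for reading. The class name `TrimBackingArray` and the helper `exactBytes` are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrimBackingArray {
    // Hypothetical helper: returns only the readable bytes (position..limit),
    // never the unused tail of the backing array.
    public static byte[] exactBytes(ByteBuffer buf) {
        if (buf.hasArray()) {
            int start = buf.arrayOffset() + buf.position();
            int end = buf.arrayOffset() + buf.limit();
            return Arrays.copyOfRange(buf.array(), start, end);
        }
        // Direct or read-only buffers expose no array; read through a duplicate
        // so the caller's position is left untouched.
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put("abc".getBytes(StandardCharsets.US_ASCII));
        buf.flip();
        System.out.println(buf.array().length);     // 16, the whole backing array
        System.out.println(exactBytes(buf).length); // 3, only the written bytes
    }
}
```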

Elver answered 19/10, 2013 at 22:50 Comment(0)
E
1

For starters, you should probably be using StringBuilder, since StringBuffer has synchronization overhead that's usually unnecessary.

Unfortunately, there's no way to go directly to bytes, but you can copy the chars into an array or iterate from 0 to length() and read each charAt().
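
As a rough sketch of the copy-the-chars route (the class name `GetCharsExample` and the helper `toBytesViaChars` are made up for illustration), the chars can be pulled out with getChars and encoded from there. It still makes one copy of the characters, but no String is created:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class GetCharsExample {
    // Hypothetical helper: copies the builder's chars into a char[] and
    // encodes that array as UTF-8.
    public static byte[] toBytesViaChars(StringBuilder sb) {
        char[] chars = new char[sb.length()];
        sb.getChars(0, sb.length(), chars, 0);

        // Charset.encode replaces malformed input instead of throwing.
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));

        // Copy only the encoded bytes; the backing array may be larger.
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] out = toBytesViaChars(new StringBuilder("hello"));
        System.out.println(out.length); // 5
    }
}
```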

Eryn answered 19/10, 2013 at 22:57 Comment(1)
+1 And the Javadoc for StringBuffer has recommended StringBuilder for nearly ten years now.Panthia
H
0

What are you trying to accomplish with "millions of chars"? Are these logs that need to be parsed? Can you read the data as plain bytes and stick to a ByteBuffer? Then you can do:

buffer.array()

to get a byte[]

Depending on what you are doing, you can also use just a char[] or a CharBuffer:

CharBuffer cb = CharBuffer.allocate(4242);
cb.put("Depends on what it is you need to do");
... 

Then you can get a char[] as:

cb.array()

It's always good to REPL things out, it's fun and proves the point. Java REPL is not something we are accustomed to, but hey, there is Clojure to save the day which speaks Java fluently:

user=> (import java.nio.CharBuffer)
java.nio.CharBuffer

user=> (def cb (CharBuffer/allocate 4242))
#'user/cb

user=> (-> (.put cb "There Be") (.array))
#<char[] [C@206564e9>

user=> (-> (.put cb " Dragons") (.array) (String.))
"There Be Dragons"
Hazan answered 19/10, 2013 at 23:5 Comment(0)
P
0

If you want performance, I wouldn't use StringBuilder or create a byte[]. Instead, you can write progressively to the stream that will take the data in the first place. If you can't do that, you can copy the data from the StringBuilder to the Writer, but it is much faster not to create the StringBuilder in the first place.

Panthia answered 19/10, 2013 at 23:45 Comment(2)
How would we go about writing progressively to the stream? I have a function taking in byte[]Prompter
You need a function you can call with the byte[] you have read so far, e.g. docs.oracle.com/javase/7/docs/api/java/io/… This function allows you to use the same byte[] each time, thus making the memory consumption and garbage constant regardless of the size of the data processed.Panthia
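
One way this could look, as a sketch only: it assumes the consumer can be reached through a java.io.OutputStream and its write(byte[], int, int) method, which may not be the function the comment above links to. The class name `ChunkedEncode`, the helper `encodeTo`, and the chunk size are all hypothetical. The same small ByteBuffer is reused for every chunk, so memory use stays constant:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedEncode {
    // Hypothetical helper: encodes text as UTF-8 in fixed-size chunks and
    // hands each chunk to the sink, reusing one byte buffer throughout.
    public static void encodeTo(CharSequence text, OutputStream sink, int chunkSize)
            throws IOException {
        if (chunkSize < 16) {
            // A tiny buffer could overflow without making any progress.
            throw new IllegalArgumentException("chunkSize must be at least 16");
        }
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);           // view, no copy
        ByteBuffer out = ByteBuffer.allocate(chunkSize); // reused for every chunk

        CoderResult result;
        do {
            result = encoder.encode(in, out, true);
            if (result.isError()) {
                result.throwException(); // malformed or unmappable input
            }
            sink.write(out.array(), 0, out.position());
            out.clear();
        } while (result.isOverflow());

        // Let the encoder emit any trailing bytes (none for UTF-8, but required
        // by the CharsetEncoder contract).
        do {
            result = encoder.flush(out);
            sink.write(out.array(), 0, out.position());
            out.clear();
        } while (result.isOverflow());
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        encodeTo(sb, sink, 64);
        System.out.println(sink.size() == sb.length()); // true: the text is pure ASCII
    }
}
```

In a real program the sink would be the socket or file stream that consumes the data; the ByteArrayOutputStream here only makes the sketch runnable.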
