Is it possible to add data to a string after adding "\0" (null)?

4

10

I am building a string and need to add multiple "\0" (null) characters to it. Between each pair of null characters there is other text data (just ASCII alphanumeric characters).

My problem is that in J2SE, once I add the first null (\0), Java seems to treat it as a string terminator (similar to C++) and ignores all other data being appended. No error is raised; the trailing data is just ignored. I need to force the additional trailing data to be kept after a null in the string. I have to do this for a legacy database that I am supporting.

I have tried to encode/decode the string, hoping that something like %00 would get around this behaviour, but when I decode the string again, Java sees the null character and removes all data after the first null.

Update: Here is the relevant code snippet. Yes, I am trying to use Strings. I intend to try chars, but I still have to save it into the database as a string, so I suspect that I will end up with the same problem.

Some background. I am receiving data via HTTP post that has "\n". I need to remove the newlines and replace them with "\0". The "debug" method is just a simple method that does System.out.println.

String[] arrLines = sValue.split("\n");
for (int k = 0; k < arrLines.length; k++) {
    if (0 < k) {
        sNewValue += "\0";
    }
    sNewValue += arrLines[k];
    debug("New value =" + sNewValue);
}

sNewValue, a String, is committed to the database and has to be stored as a String. When I display the current value of sNewValue in the console after each iteration, I observe something like this:

The input is value1\nValue2\nValue3. The console output from this code is:

value1
value1
value1

I am expecting

value1
value1 value2
value1 value2 value3 

with non-printable null between value1, value2 and value3 respectively. Note that the value actually getting saved back into the database is also just "value1". So, it's not just a console display problem. The data after \0 is getting ignored.

Sepsis answered 24/12, 2011 at 9:12 Comment(9)
In Java, you can never say "null character". It's a null value, and Java doesn't use ASCII characters; rather, it uses Unicode.Cubical
What are you intending to do with this String once you have it?Surrogate
@Lion: Untrue - character 0 in Unicode is known as the null character. See unicode.org/charts/PDF/U0000.pdfUmbilicus
@Jon Skeet:) Thanks for that info. Learnt something new.Cubical
(And this is why I prefer to call it the "NUL" character, although it is also known as the "null" character)Stile
I have updated the question with code..Sepsis
@giulio: So maybe System.out.println is truncating it - or rather, whatever that's actually printing to (e.g. an IDE). That doesn't mean the data isn't there...Umbilicus
@JonSkeet I thought that too. I have checked what has been written to the DB, and it's consistent with the console. I am not using Sun's JVM. I think this is the problem. See my comment in your answer. ThanksSepsis
@giulio: Just looking at the DB isn't enough to diagnose what's going on. You should look at the length of the string, print out the Unicode value of each character etc.Umbilicus
16

I strongly suspect this is nothing to do with the text in the string itself - I suspect it's just how it's being displayed. For example, try this:

public class Test {
    public static void main(String[] args) {
        String first = "first";
        String second = "second";
        String third = "third";
        String text = first + "\0" + second + "\0" + third;
        System.out.println(text.length()); // Prints 18
    }
}

This prints 18, showing that all the characters are present. However, if you try to display text in a UI label, I wouldn't be surprised to see only first. (The same may be true in fairly weak debuggers.)

Likewise you should be able to use:

 char c = text.charAt(7);

And now c should be 'e' which is the second letter of "second".

Basically, I'd expect the core of Java not to care at all about the fact that it contains U+0000. It's just another character as far as Java is concerned. It's only at boundaries with native code (e.g. display) that it's likely to cause a problem.
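To make that point concrete, here is a minimal sketch (not part of the original answer) that runs a few ordinary String operations on a value containing U+0000. The class name NulInString and the helper build() are made up for illustration.

```java
import java.util.Arrays;

class NulInString {
    // Hypothetical helper: builds "first\0second\0third", the same value as the example above.
    static String build() {
        return "first" + "\0" + "second" + "\0" + "third";
    }

    public static void main(String[] args) {
        String text = build();

        // The length counts both U+0000 characters: 5 + 1 + 6 + 1 + 5.
        System.out.println(text.length());

        // Searching treats U+0000 like any other character.
        System.out.println(text.indexOf('\0'));
        System.out.println(text.lastIndexOf('\0'));

        // Splitting on U+0000 recovers the three original pieces.
        String[] parts = text.split("\0");
        System.out.println(Arrays.asList(parts));
    }
}
```

If the lengths and indexes come out as expected on your JVM, the string itself is intact and any truncation is happening further downstream.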

If this doesn't help, please explain exactly what you've observed - what it is that makes you think the rest of the data isn't being appended.

EDIT: Another diagnostic approach is to print out the Unicode value of each character in the string:

for (int i = 0; i < text.length(); i++) {
    System.out.println((int) text.charAt(i));
}
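A slightly more readable variant of the same diagnostic, as a sketch only: it prints the index and the Unicode value in hex for each character and substitutes a placeholder for control characters, so the console never has to render U+0000 itself. The class name CharDump and the method describe() are invented here; String.format and StringBuilder need Java 5 or later, so on JDK 1.4.2 you would use Integer.toHexString and StringBuffer instead.

```java
class CharDump {
    // Hypothetical helper: one line per char with index, hex code unit and a printable form.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Control characters (including U+0000) are replaced by a visible placeholder.
            String shown = Character.isISOControl(c) ? "<control>" : String.valueOf(c);
            out.append(i)
               .append(": U+")
               .append(String.format("%04X", (int) c))
               .append(' ')
               .append(shown)
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("value1\0value2"));
    }
}
```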
Umbilicus answered 24/12, 2011 at 9:48 Comment(11)
Likewise, if you write it to the database with PreparedStatement.setString(), I'm not sure what the database will do. Maybe it's safer with setBytes() or setBlob().Vittle
@greyfairer: Not if the database field type is varchar or something similar, IMO.Umbilicus
The core of the problem appears to be the behaviour of the String class and what it does when it sees "\0".Sepsis
@giulio did you try to debug(sValue.length())? You'll see, it's not a String issue.Vittle
@giulio: I've seen no evidence for that, and I've shown a short but complete program which appears to demonstrate the opposite. Can you create a similar short but complete program which does demonstrate the problem?Umbilicus
@giulio String doesn't treat \0 as a special character. You can have any number of \0 characters in a string. However, I have seen databases and GUI tools written in C truncate strings with a \0 in them.Vanegas
@JonSkeet. I wrote a quick little java app and you are correct. The \0 is displayed in the output, as a non-printable character, but it does display as expected. I have to use IBM's JVM <cringe>. I am starting to think it's IBM's JVM. As I still have the same problem in IBM's JVM. I am leaning towards the idea that it's IBM's crap JVM, and its treatment of Strings.. sigh!Sepsis
@giulio: Even in the IBM JVM, I strongly doubt that the behaviour of String is going to vary like that. Did you try the short example in my answer in the IBM JVM? What length does it print out? As I've said before, I'm almost certain the problem is in what you're displaying, not the data in the string itself.Umbilicus
@JonSkeet I appreciate that, I have checked the DB target, and it doesn't seem to be a console display issue. You have emboldened me to keep testing my code. I have made enough mistakes over the years to always suspect it's me, but this one is quite strange.Sepsis
@giulio: Checking just the console and the DB is a poor way of diagnosing the problem, as so many things can be interfering between your data and those debug outputs. See my edit at the bottom of the answer for a way of seeing the contents of the string accurately. I think you can rely on System.out.println working for integers :)Umbilicus
@JonSkeet I have checked the ASCII values in the String I am trying to save, and all the values are in there as expected. However, the IBM JVM treats \0 differently from Sun's JVM: it won't display in the console under the IBM VM, but it displays as a non-printable character in Sun's JVM. I also now think that the database truncates the value when it sees \0. I think this is the end of the road on this problem, as you've demonstrated how to test what's happening. With your help, I have confirmed that it is possible with Sun's JVM; it's just not possible with IBM's. +1. ThanksSepsis
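Several comments above suspect the database or a C-based tool of doing the truncation. One way to check what actually leaves Java is to encode the string to bytes and inspect them. The sketch below assumes a single-byte encoding (US-ASCII, since the question says the data is ASCII alphanumeric); the class name NulBytes and the method encode() are hypothetical. StandardCharsets needs Java 7 or later; on older JVMs, getBytes("US-ASCII") is the equivalent call.

```java
import java.nio.charset.StandardCharsets;

class NulBytes {
    // Hypothetical helper: encodes the string the way a byte-oriented consumer would receive it.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("value1\0value2");

        // All 13 characters are encoded, including the separator.
        System.out.println(bytes.length);

        // The separator becomes a single byte with value 0.
        System.out.println(bytes[6]);
    }
}
```

If the byte array is complete, the data survives up to the JDBC boundary, and a consumer that treats byte 0 as a terminator is the likelier place for the loss.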
2

I suggest you use a char[] or List<Character> instead, since it sounds like you're not really using a String as such (a real String doesn't normally contain nulls or other unprintable characters).

Dotson answered 24/12, 2011 at 9:17 Comment(9)
-1: A "real string" can contain null characters with no problems. It's just another character as far as Java is concerned.Umbilicus
@Jon technically yes, but it's not a great ideaDotson
Why not? If the OP wants to represent such a string for a valid reason, and there's no technical reason it shouldn't work, why avoid it? This sounds like a display issue more than anything else.Umbilicus
It's not really a sequence of characters. He's (ab)using String to hold a null-delimited sequence. He should probably create his own wrapper class.Dotson
Yes it is a sequence of characters - just not printable characters. Put it this way: if it were a comma-separated sequence, you wouldn't blink, right? So what's the big deal about using the perfectly-valid-but-unprintable U+0000 character as the separator?Umbilicus
@Jon I would use a comma if I had to write it out into a file for data interchange purposes; but I'm assuming this code is for some legacy API that reads null-separated byte streams. The rest of the app should be shielded from this (by using a List<char[]> or something similar), and the part of the app that actually interfaces should probably be using an OutputStream. I'm also guessing the app would break if it used an encoding that mapped the null character to something other than a single byte with value 0x00.Dotson
The rest of the app should almost certainly be using a List<String> to store a collection of strings, and I see no indication that that's not the case from the question. I don't see how char[] is any better here... this is more a collection of strings with a delimiter than an array of characters, isn't it?Umbilicus
I have to use JDK 1.4.2 (cringe)... generics are not supported.Sepsis
@giulio: Then just List - the generic part is just a nicety really. You should only be using the concatenated string part where you have to - internally, you should keep collections as collections.Umbilicus
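A minimal sketch of the approach discussed in these comments: keep the lines in a List and build the null-delimited string only at the point where it is handed to the legacy database. The class name NulJoin and the method joinWithNul() are made up for illustration. Generics are used here for readability; on JDK 1.4.2 you would drop the type parameters and cast the result of get(i) to String. The indexed loop and StringBuffer are chosen because they exist on that JDK.

```java
import java.util.ArrayList;
import java.util.List;

class NulJoin {
    // Hypothetical helper: joins the lines with U+0000 between them, with no trailing separator.
    static String joinWithNul(List<String> lines) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                sb.append('\0');
            }
            sb.append(lines.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("value1");
        lines.add("value2");
        lines.add("value3");

        String joined = joinWithNul(lines);
        // Print the length rather than the string, since consoles may not render U+0000.
        System.out.println(joined.length());
    }
}
```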
1

Do you see the same behavior with the StringBuffer class?

Since "\0" causes trouble, I would recommend not using it internally. I would use some better delimiter in the code and replace it with "\0" only when actually writing the string to your DB.

Heikeheil answered 24/12, 2011 at 9:31 Comment(0)
0

This is because \ is an escape character in Java (as in many C-related languages), and you need to escape it using an additional \ as follows.

String str="\\0Java language";
System.out.println(str);

and you should be able to display \0Java language on the console.

Cubical answered 24/12, 2011 at 9:32 Comment(2)
'\' isn't a regular expression. It's just an escape character in string literals.Umbilicus
The OP doesn't want backslash followed by a zero in the string - he wants the Unicode null character, U+0000.Umbilicus
