For a same String, will SQLite's length will ever return a different value than Java's length method?
Asked Answered
M

3

5

Give a same String data

  1. SQLite perform length calculation on its TEXT column.
  2. The TEXT column is read into (Using Android Room database) Java String, then Java performs String.length()

Is there any chance that these yields 2 different value?

I have do a rough test using English and non-English characters. Both yields the same value.

But, I am not sure whether there is any edge cases I have missed out?

Meletius answered 27/10, 2020 at 16:2 Comment(0)
L
5

There could be some cases where the length differ, Java uses UTF-16 for internal string representation, so some kind of characters will need a surrogate pair to be stored in memory. Java's String.length() does not take into account this.

A simple example using the πŸ’© emoji character

    class HelloWorld {
    public static void main(String[] args) {
        System.out.println("πŸ’©".length());
    }}

This will print 2.

On the other hand the documentation of sqlite states:

For a string value X, the length(X) function returns the number of characters (not bytes) in X prior to the first NUL character.

It specifies that it counts the characters

sqlite> select length('πŸ’©'); 

this will return 1.

This is not exclusive to "emojis" it will be the same also for some languages that have characters with "high" codepoints like some Asian characters

tested with sqlite 3.28.0 and openjdk version "1.8.0_252". I think it should hold true for your stack.

Loki answered 27/11, 2020 at 15:35 Comment(1)
Since ur mentioned case will more likely to happen than forpas's, I will award the points to you. Thank you. – Meletius
J
7

Since you are looking for edge cases...

From SQLite's Built-In Scalar SQL Functions:

length(X)
For a string value X,
the length(X) function returns the number of characters (not bytes) in X
prior to the first NUL character. (emphasis mine)
Since SQLite strings do not normally contain NUL characters,
the length(X) function will usually return the total number of characters in the string X....

So, SQLite, for:

SELECT LENGTH('a' || CHAR(0) || 'b')

will return 1,

but Java, for:

String s = "a" + Character.toString('\0') + "b";
System.out.println("" + s.length());

will return 3.

Jonis answered 23/11, 2020 at 15:13 Comment(1)
Thanks for the info. I notice SUBSTR will have the same side effect SELECT substr('abc' || CHAR(0) || 'def', 1, 5);. Seem like I need to rethink again... Thanks for the example. – Meletius
L
5

There could be some cases where the length differ, Java uses UTF-16 for internal string representation, so some kind of characters will need a surrogate pair to be stored in memory. Java's String.length() does not take into account this.

A simple example using the πŸ’© emoji character

    class HelloWorld {
    public static void main(String[] args) {
        System.out.println("πŸ’©".length());
    }}

This will print 2.

On the other hand the documentation of sqlite states:

For a string value X, the length(X) function returns the number of characters (not bytes) in X prior to the first NUL character.

It specifies that it counts the characters

sqlite> select length('πŸ’©'); 

this will return 1.

This is not exclusive to "emojis" it will be the same also for some languages that have characters with "high" codepoints like some Asian characters

tested with sqlite 3.28.0 and openjdk version "1.8.0_252". I think it should hold true for your stack.

Loki answered 27/11, 2020 at 15:35 Comment(1)
Since ur mentioned case will more likely to happen than forpas's, I will award the points to you. Thank you. – Meletius
K
2

Acording to the documentation of Sqlite NUL characters (ASCII 0x00, Unicode \u0000) in the TEXT field can lead to different length values.

Taking for example the text Hello\u0000World

Sqlite will return a length of 16

Java will return a length of 11

Java will count the NUL character as 1 while Sqlite will count 6. The same text will have different values.

Klug answered 26/11, 2020 at 8:46 Comment(6)
Is there anything in your answer that hasn't been already mentioned in my answer which was posted 2 days ago? – Jonis
Thats the most strange thing I have seen on SO, at the moment I wrote the answer, I didn't see ANY answer on this post, the OP had 1 upvote and any answer. @Jonis So, reading your answer, no, my answer don't add nothing new. – Klug
my answer don't add nothing new so is there any reason to keep it there? – Jonis
I took my time to perform a research and to write the answer, maybe someone would benefit of my simple example in the future, I don't think I overlap your answer and in any case copied it so, I will let the answer. Anyway, if you think my answer break some StackOverflow rule you can flag my answer and someone will review it. Regards @Jonis – Klug
I don't know if your answer breaks ant SO rule but this: ...reading your answer, no, my answer don't add nothing new... contradicts this: maybe someone would benefit of my simple example in the future – Jonis
This answer gives a bit different information. CHAR(0) and \u0000 have different results, which seems strange and interesting. Perhaps worthy of another question. – Adieu

© 2022 - 2024 β€” McMap. All rights reserved.