Special characters in Android sms
C

2

13

I've observed this issue for years now, not knowing where it came from. I am concerned that this bug is still observable in the new versions of Android, in 2011, and I hope you can finally help me to fully understand it, if not solve it.

Let's consider the following (real) situation. Mister A is using a custom SMS/MMS app from Sony on his Xperia Arc (official 2.3.3). Mister B is using the stock Android SMS/MMS app on his Milestone (Cyanogen 6.12, unofficial 2.2). Both of them use Android in French (if that matters).

When A sends B an SMS containing special characters like "ç" or "ê", B receives a message with these characters replaced by a space. Characters like "é" work fine, though. When B sends the same SMS to A, everything works fine. When A sends this SMS to himself, everything works fine.

Conclusion: this is not the mobile provider's fault, since it works in one direction and not the other.

So, I guessed at first that something was wrong with A's custom app. Replaced it with the apk from B's phone. Everything remained the same. I decompiled the app and I didn't find where the encoding of the sms string was done. I concluded the bug is not coming from the app, but from the way Android encodes the strings...

I ran another test: I wrote an SMS with only standard characters, something like 250 characters in 1.5 SMS. Then I appended a "ç" to it. On A's phone, the counter says it consumed 10 characters. On B's phone, the counter says the message now takes 3 SMS: the string size doubled!

Conclusion: on A's phone, the default charset includes "ç". On B's phone, when "ç" appears, the charset changes and each character then needs twice the original space. (Or am I missing something?)

Questions: Why don't different versions of Android use the same default charset? On Android, do these default charsets depend on the ROM, for example? Can we configure or change these charsets somewhere (in the menu or directly on a rooted phone)? Is there another easy way to fix this?

Any help, explanation or experience is welcome :)

Coordination answered 1/8, 2011 at 20:9 Comment(0)
A
4

You are suffering from encoding problems. From the description it looks like 'A' is sending data in one charset without including information about which charset that is. The root cause is that, to pass extended (non-ASCII) characters between two systems, both have to agree on the encoding to use. If you are restricted to 8-bit values, the systems have to agree on the same code page. SMS offers the GSM 7-bit default alphabet, an 8-bit data encoding, and UCS-2 (in practice UTF-16), which uses 2 octets per character.

What you see when you enter 250 characters followed by a single extended character shows what is happening in the application. An SMS message is restricted to 140 octets of payload. With the 7-bit alphabet that is 160 characters, so your 250 characters fit into 2 messages. Once you added the "ç", the app switched to UTF-16, so suddenly every character takes 2 octets and only 70 characters fit into a message. The same text now needs about 3.5 messages' worth of space (250 / 70), which in practice means a fourth message.
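To make the counting concrete, here is a rough Python sketch of how a messaging app might pick the encoding and count segments. The function name `sms_segments` is made up for illustration, and the per-segment limits for multi-part messages (153 septets, 67 UCS-2 units) assume the usual 6-octet concatenation header. It also ignores the rule that an escape pair must not be split across segments, so treat it as an estimate rather than what any particular phone does.

```python
import math

# GSM 03.38 default alphabet (basic table, ESC omitted); one septet each.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: sent as ESC + character, so two septets each.
GSM7_EXT = "^{}\\[~]|€\f"


def sms_segments(text):
    """Return (encoding, segment_count) for a message body (hypothetical helper)."""
    if all(c in GSM7_BASIC or c in GSM7_EXT for c in text):
        septets = sum(2 if c in GSM7_EXT else 1 for c in text)
        if septets <= 160:
            return "GSM-7", 1
        # Assumption: 153 septets per part once a concatenation header is added.
        return "GSM-7", math.ceil(septets / 153)
    # Anything outside the 7-bit tables forces 16-bit code units for the whole text.
    units = len(text.encode("utf-16-be")) // 2
    if units <= 70:
        return "UCS-2", 1
    # Assumption: 67 code units per part once a concatenation header is added.
    return "UCS-2", math.ceil(units / 67)


print(sms_segments("a" * 250))        # ('GSM-7', 2)
print(sms_segments("a" * 250 + "ç"))  # ('UCS-2', 4)
print(sms_segments("é"))              # ('GSM-7', 1)
```

Note that "é" and uppercase "Ç" are in the 7-bit table, while lowercase "ç" and "ê" are not, which matches the characters that survive and the ones that get lost in the question.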

On Android the decoding of the SMS message is part of the framework telephony code in SmsCbMessage.java. It works out the language code and encoding of the message body. If this is incorrect (the message was encoded with an English code page but uses French extended characters) then you can get odd characters appearing.

You are right that this is not the mobile network at fault. I suspect it is phone A's messaging application, although it is possible that Android is failing to correctly identify the encoding of a valid SMS. I wonder how it works between A and an iPhone or some other manufacturer's device.

Aboriginal answered 11/8, 2011 at 11:33 Comment(4)
Thank you for your answer. We ran the test with other phones, same result when A sends a message with special characters. So I guess there's a problem when A is encoding the SMS. Are you sure this is done in the SMS/MMS app and not internally in the Android framework? How could we then explain the fact that replacing the app on A's phone with the app from B's phone didn't fix the issue? Coordination
As these are both Android devices, you can actually examine the SMS message as it passes through the radio layer, which is after it has been encoded. If you run 'adb logcat -b radio | tee radio.log' and then send your test message, you should see something like the following: Aboriginal
E/RIL ( 133): smsc : E/RIL ( 133): strlen(pdu) = 114(0x39), pdu : 01000c91449732832356000b2c005400650073007400200065006e0063006f00640069006e0067002000e700200061006e0064002000e9002e This is the UTF-16 encoded version of 'Test encoding ç and é.' with some header bytes at the front. We can check this in Python 2, with x holding the pdu hex string: >>> x[26:].decode('hex').decode('utf-16be') returns u'Test encoding \xe7 and \xe9.' So with this you can check the output to the radio layer, which should show whether the message was mis-encoded before it left the phone. Aboriginal
Oh, I didn't know about this radio log, that could be helpful. Unfortunately, adb logcat -b radio doesn't display what you describe. I get SMS SC Address and new message received messages... Any idea what I'm doing wrong? Thanks again for your help! Coordination
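The check in the comment above relies on the Python 2 'hex' codec, which no longer exists in Python 3. A minimal Python 3 sketch of the same check follows. It assumes the header occupies the first 26 hex characters, as in that log line, and the names `pdu` and `HEADER_HEX_CHARS` are only illustrative.

```python
# PDU hex string copied from the radio log above (whitespace removed).
pdu = (
    "01000c91449732832356000b2c"
    "005400650073007400200065006e0063006f00640069006e0067"
    "002000e700200061006e0064002000e9002e"
)

# Assumption: the header is 13 octets (26 hex characters), as in the log above.
HEADER_HEX_CHARS = 26

# The last header octet is the user data length in octets (0x2c = 44).
declared_length = int(pdu[HEADER_HEX_CHARS - 2:HEADER_HEX_CHARS], 16)

user_data = bytes.fromhex(pdu[HEADER_HEX_CHARS:])
text = user_data.decode("utf-16-be")

print(declared_length == len(user_data))  # True
print(text)                               # Test encoding ç and é.
```

If the "ç" had been lost before the message left the phone, the 00e7 code unit would be missing or replaced (for example by 0020, a space) in the hex dump.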
K
0

I encountered the same problem when I had to show a few special characters in a Unicode SMS app. The method I used was to take the string that needs to be sent, loop over it character by character, look up each character's numeric code (its ASCII code, or the Unicode code point for non-ASCII characters), and join those integers into a new string using a delimiter. This string can be sent as an SMS. On the receiving side, split it on the same delimiter, convert each code back to its (language-specific) character, and append the characters to rebuild the string. The resulting text will be the same as the one that was sent.
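A minimal sketch of that scheme in Python, assuming a comma as the delimiter and Unicode code points as the numeric codes; the function names are made up for illustration and this is not the code from my app:

```python
# Hypothetical delimiter; any character that is not a digit would do.
DELIMITER = ","


def encode_for_sms(text, delimiter=DELIMITER):
    """Replace each character with its code point, joined by the delimiter."""
    return delimiter.join(str(ord(ch)) for ch in text)


def decode_from_sms(payload, delimiter=DELIMITER):
    """Rebuild the original text from the delimited code points."""
    if not payload:
        return ""
    return "".join(chr(int(code)) for code in payload.split(delimiter))


encoded = encode_for_sms("ça va")
print(encoded)                   # 231,97,32,118,97
print(decode_from_sms(encoded))  # ça va
```

The payload contains only digits and the delimiter, so it stays within the 7-bit alphabet, but it is several times longer than the original text and only readable if both sides run an app that applies the same scheme.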

Regards

Kkt answered 28/12, 2011 at 10:19 Comment(0)
