SMPP, SMS, GSM, Data Encodings, and Locking Shift Tables [closed]
I'm working with a corporation which is attempting to send SMS messages to people in countries all over the world using various languages.

The corporation has a custom-written application which communicates using the SMPP protocol with the SMSCs of various telcos.

We have been told by different telcos which data_coding value to use for submitting SMPP PDUs to the SMSC.

Currently we are using 7-bit GSM, Latin-1, and UCS-2 encodings. We are using the encoding that each telco has told us to use. The payload of the SMPP PDU is submitted encoded, and the data_coding parameter is set accordingly (0x00 for GSM, 0x03 for Latin-1, and 0x08 for UCS-2).
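For illustration, here is a minimal Ruby sketch of how a UTF-8 string could be turned into payload bytes for two of these data_coding values. The constant and method names are hypothetical. Treating UCS-2 as big-endian UTF-16 is an assumption that should be confirmed with each telco. The GSM case is left out because it needs a custom mapping.

```ruby
# Hypothetical names; the data_coding values are the ones quoted above.
DATA_CODING = { gsm: 0x00, latin1: 0x03, ucs2: 0x08 }.freeze

# Assumption: UCS-2 is sent as big-endian UTF-16.
RUBY_ENCODING = { latin1: "ISO-8859-1", ucs2: "UTF-16BE" }.freeze

# Returns [data_coding, payload_bytes] for a UTF-8 input string.
def encode_payload(text, scheme)
  target = RUBY_ENCODING.fetch(scheme) # raises KeyError for :gsm (not handled here)
  # Characters with no equivalent in the target encoding become "?".
  encoded = text.encode(target, undef: :replace, replace: "?")
  [DATA_CODING.fetch(scheme), encoded.b]
end

data_coding, payload = encode_payload("São Paulo", :ucs2)
puts format("data_coding=0x%02X, %d payload octets", data_coding, payload.bytesize)
```

Note that the same text costs two octets per character in UCS-2 but one in Latin-1, which is one practical reason the choice of encoding matters to the sender.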

Question 1: Should it really matter which encoding we use for submitting SMPP PDUs to the SMSC? Shouldn't the SMSC be able to convert from the submitted SMPP encoding to the appropriate encoding based upon the contents of the data_coding parameter? Shouldn't we be able to submit all messages via SMPP as UCS-2, set the data_coding parameter to 0x08, and have the telco take care of the conversion to the SMS PDU for us?

Currently, we want to send Portuguese-language SMS messages. The telco has told us to use the "SMSC Default Alphabet" for SMPP to submit the messages. Pressed further, they said this was the same as the GSM default alphabet. This is concerning, as the Portuguese alphabet isn't fully represented by the GSM Default Alphabet. It seems that the telco is simply transliterating the Portuguese letters to English equivalents. The telco informed us that "if you send a SMS with a special character that the SMSC does not recognize (á,ó,ã for instance) the SMSC will encode those characters to the closest character possible." I find this hard to believe, since the GSM Default Alphabet doesn't support such characters in the first place.

Question 2: How can special characters be submitted, and then not be recognized, if one uses the GSM Default Alphabet? Shouldn't all characters submitted as the GSM Default Alphabet conform to the 7-bit, 128-character alphabet which is defined in the GSM 03.38 standard?

Question 3: Since the telco has requested that we use the "GSM Default Alphabet", we should submit our SMPP payload encoded as 7-bit packed octets, correct?
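To make "7-bit packed octets" concrete, here is a minimal Ruby sketch of GSM 03.38 septet packing. The method name is hypothetical. Whether a given SMSC expects packed septets or one septet per octet over SMPP is exactly what this question asks, so the sketch only shows what the packing itself does.

```ruby
# Packs an array of 7-bit values (septets) into octets, least significant
# bits first, as in the GSM 03.38 packing scheme. Hypothetical helper name.
def pack_septets(septets)
  acc = 0    # bit accumulator
  nbits = 0  # number of valid bits currently in the accumulator
  out = []
  septets.each do |septet|
    acc |= (septet & 0x7F) << nbits
    nbits += 7
    while nbits >= 8
      out << (acc & 0xFF)
      acc >>= 8
      nbits -= 8
    end
  end
  out << (acc & 0xFF) if nbits > 0 # flush the remaining bits, zero-padded
  out.pack("C*")
end

# "hello" is plain ASCII, so its GSM septets equal its byte values.
puts pack_septets("hello".bytes).unpack("H*").first # prints e8329bfd06
```

Five characters fit in five octets here, but eight septets pack into seven octets, which is where the 160-character limit for a 140-octet SMS comes from.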

Our application stores text as UTF-8. Since the Portuguese telco is requesting that we submit SMPP with a payload containing the GSM Default Alphabet, I presume that we will need to convert from UTF-8 to the 7-bit GSM default alphabet. My current strategy involves mapping each UTF-8 character which has a GSM default equivalent (128 characters total) by value, and then transliterating other UTF-8 characters to the closest GSM default alphabet equivalent, and a question mark otherwise.
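The strategy described above could be sketched in Ruby roughly as follows. All names are hypothetical, and the tables should be checked against GSM 03.38 before use. The transliteration step is an assumption: it keeps only the base letter after Unicode decomposition, so "ã" becomes "a", and anything still unmapped becomes "?".

```ruby
# GSM 03.38 default alphabet; a character's index in this string is its septet value.
GSM_BASIC = [
  "@£$¥èéùìòÇ\nØø\rÅå",
  "Δ_ΦΓΛΩΠΨΣΘΞ\eÆæßÉ",
  " !\"#¤%&'()*+,-./",
  "0123456789:;<=>?",
  "¡ABCDEFGHIJKLMNO",
  "PQRSTUVWXYZÄÖÑÜ§",
  "¿abcdefghijklmno",
  "pqrstuvwxyzäöñüà"
].join.freeze

GSM_INDEX = GSM_BASIC.each_char.each_with_index.to_h.freeze

# Default extension table: each character is sent as ESC (0x1B) plus one septet.
GSM_EXT = {
  "\f" => 0x0A, "^" => 0x14, "{" => 0x28, "}" => 0x29, "\\" => 0x2F,
  "[" => 0x3C, "~" => 0x3D, "]" => 0x3E, "|" => 0x40, "€" => 0x65
}.freeze

# Converts a UTF-8 string to an array of GSM septet values (unpacked).
def utf8_to_gsm_septets(text)
  text.unicode_normalize(:nfc).each_char.flat_map do |ch|
    if GSM_INDEX.key?(ch)
      [GSM_INDEX[ch]]
    elsif GSM_EXT.key?(ch)
      [0x1B, GSM_EXT[ch]]
    else
      # Assumed transliteration: strip diacritics by keeping the base letter.
      base = ch.unicode_normalize(:nfd)[0]
      [GSM_INDEX.fetch(base, GSM_INDEX["?"])]
    end
  end
end

p utf8_to_gsm_septets("ação") # => [97, 99, 97, 111], i.e. "acao"
```

Characters such as "é" or "Ç" that do exist in the default alphabet survive unchanged, while "ç" and "ã" lose their diacritics. That matches the kind of "closest character" behaviour the telco described.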

Question 4: Is this the appropriate way to handle conversion from UTF-8 to the GSM default alphabet? There don't seem to be many other approaches. The application in question uses Ruby in a Unix environment. No existing libraries supporting GSM seem to be available, so a custom library seems to be the only approach.

My research has uncovered details of the GSM locking shift tables, which support other languages using only 7 bits. The locking shift tables are specified in the UDH portion of the SMS PDU.
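As a rough illustration of what such a UDH could look like, here is a hedged Ruby sketch. The information element identifiers (0x24 for single shift, 0x25 for locking shift) and the Portuguese language identifier (0x03) are my reading of 3GPP TS 23.040 and TS 23.038 and should be verified against the specifications. The method and constant names are hypothetical. Whether a particular SMSC honours this header when it is submitted over SMPP is not established here.

```ruby
# Values below are assumptions to verify against 3GPP TS 23.040 / 23.038.
IEI_NATIONAL_SINGLE_SHIFT  = 0x24
IEI_NATIONAL_LOCKING_SHIFT = 0x25
LANG_PORTUGUESE            = 0x03

# In SMPP 3.4 the presence of a UDH is usually signalled via this esm_class bit.
ESM_CLASS_UDHI = 0x40

# Builds a UDH selecting national language shift tables. Hypothetical helper.
def national_language_udh(lang_id, locking: true, single: false)
  elements = []
  # Each information element is: identifier, data length (1), language id.
  elements += [IEI_NATIONAL_SINGLE_SHIFT, 0x01, lang_id] if single
  elements += [IEI_NATIONAL_LOCKING_SHIFT, 0x01, lang_id] if locking
  # The first octet is the UDH length, excluding itself.
  ([elements.length] + elements).pack("C*")
end

udh = national_language_udh(LANG_PORTUGUESE)
puts udh.unpack("H*").first # prints 03250103
```

In this sketch the UDH would be prepended to the message payload and esm_class would have ESM_CLASS_UDHI set. The correct data_coding value for that case remains the open part of Question 5.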

Question 5: How would one send SMS messages using the locking shift tables via SMPP? Does the SMPP PDU payload need to be modified to contain a UDH which specifies the locking shift table? What should the data_coding parameter be set to?

I'd be thrilled if anyone could answer any of these questions authoritatively.

Imprecision answered 6/7, 2011 at 21:0 Comment(4)
There are too many questions here, and they're really off-topic for SO. - Selfish
Yes, I realize that there are lots of questions, and that this is somewhat off-topic for SO. However, I figured I would try my luck, as there are a good many programmatic concepts and issues involved. Would you have a helpful suggestion as to where I could find help with this? - Imprecision
I disagree; I don't think this is off-topic, and if you figured out any of these answers I'd love to hear from you. Why is this telephony stuff so opaque? Anyway, the SMPP 3.4 spec does not actually say anything about shift tables; in fact it doesn't even say data_coding 0 needs to be GSM 03.38, so it is hard to imagine how the protocol would make room for specifying the shift table. As for GSM -> string encoding/decoding, I've written a Python one I could open source, and Twitter has an open source one for Java. - Denishadenison
I can answer Q1: the reason it can show a fallback character. Look at the shift table: the Euro sign is encoded as <escape>+e, so when the receiver doesn't understand the escape, it simply shows the e instead of the euro symbol, which is a sensible fallback. The same applies for all languages. The shift tables are just superimposed on top of similar-looking characters, which I think is the whole point of this GSM encoding. - Seitz
