Python library for converting plain text (ASCII) into GSM 7-bit character set?
P

5

9

Is there a Python library for encoding ASCII data into the GSM 7-bit character set (for sending SMS)?

Pyemia answered 16/3, 2010 at 8:13 Comment(0)
M
18

There is now :)

Thanks to Chad for pointing out that this wasn't quite right

Python 2 version

# -*- coding: utf8 -*- 
gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>"
       u"?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
       u"|````````````````````````````````````€``````````````````````````")

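def gsm_encode(plaintext):
    res = ""
    for c in plaintext:
        # Basic alphabet: the index in gsm is the septet value
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
            continue
        # Extension table: ESC (27) followed by the index in ext;
        # "`" is only a filler in ext, so it must not match
        idx = ext.find(c)
        if idx != -1 and c != u"`":
            res += chr(27) + chr(idx)
        # Characters found in neither table are silently dropped
    return res.encode('hex')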

print gsm_encode(u"Hello World")

The output is hex, with one septet per byte (the septets are not packed into 7-bit groups). You can skip the hex step if you want the raw bytes.

Python 3 version

# -*- coding: utf8 -*- 
import binascii
gsm = ("@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
ext = ("````````````````````^```````````````````{}`````\\````````````[~]`"
       "|````````````````````````````````````€``````````````````````````")

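def gsm_encode(plaintext):
    res = bytearray()
    for c in plaintext:
        # Basic alphabet: the index in gsm is the septet value
        idx = gsm.find(c)
        if idx != -1:
            res.append(idx)
            continue
        # Extension table: ESC (27) followed by the index in ext;
        # "`" is only a filler in ext, so it must not match
        idx = ext.find(c)
        if idx != -1 and c != "`":
            res.extend((27, idx))
        # Characters found in neither table are silently dropped
    # One septet per byte (not packed), returned as hex-encoded bytes
    return binascii.b2a_hex(bytes(res))

print(gsm_encode("Hello World"))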
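
On the question of packing: the function above emits one septet per byte. SMS user data normally packs the septets back to back, least significant bit first, so eight characters fit in seven octets. Here is a minimal sketch of that extra step, assuming Python 3; `pack_septets` is a hypothetical helper name, not part of any library, and the example relies on the fact that letters and the space have the same value in ASCII and in the GSM basic alphabet.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, least significant bits first."""
    bits = 0
    for i, septet in enumerate(septets):
        bits |= (septet & 0x7F) << (7 * i)   # septet i starts at bit 7*i
    n_octets = (7 * len(septets) + 7) // 8   # round up to whole octets
    return bits.to_bytes(n_octets, "little")

# For these characters the GSM value equals the ASCII code
septets = list("Hello World".encode("ascii"))
print(bytes(septets).hex())          # 48656c6c6f20576f726c64 (unpacked)
print(pack_septets(septets).hex())   # c8329bfd065ddf723619 (packed)
```

Treating the whole message as one little-endian integer keeps the sketch short; a streaming accumulator would do the same job for long inputs.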
Mongrel answered 16/3, 2010 at 8:51 Comment(4)
An explanation and C# port can be found at https://mcmap.net/q/1171239/-convert-string-to-gsm-7-bit-using-c. - Polyphemus
Wikipedia seems to think that the character between Ü and ¿ should be § (en.wikipedia.org/wiki/…). - Pistol
I used this code to verify my gsm-7 encoding, which saved me a lot of work :-) (#37775709). - Pistol
I have tried the Python 3 version of this and it did not seem to work. Is it really packing the 7-bit values correctly? - Kristykristyn
C
3

I got tips from gnibbler's answer. Here is a script (Python 2) that I put together after looking at an online converter: http://smstools3.kekekasvi.com/topic.php?id=288. It works correctly for me, for both encoding and decoding.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>"
   u"?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
   u"|````````````````````````````````````€``````````````````````````")

def get_encode(currentByte, index, bitRightCount, position, nextPosition, leftShiftCount, bytesLength, bytes):
    if index < 8:
        byte = currentByte >> bitRightCount
        if nextPosition < bytesLength:
            idx2 = bytes[nextPosition]
            byte = byte | ((idx2) << leftShiftCount)
            byte = byte & 0x000000FF
        else:
            byte = byte & 0x000000FF
        return chr(byte).encode('hex').upper()
    return ''

def getBytes(plaintext):
    if type(plaintext) != str:
         plaintext = str(plaintext)
    bytes = []
    for c in plaintext.decode('utf-8'):
        idx = gsm.find(c)
        if idx != -1:
            bytes.append(idx)
        else:
            idx = ext.find(c)
            if idx != -1:
                bytes.append(27)
                bytes.append(idx)
    return bytes

def gsm_encode(plaintext):
    res = ""
    f = -1
    t = 0
    bytes = getBytes(plaintext)
    bytesLength = len(bytes)
    for b in bytes:
        f = f+1
        t = (f%8)+1
        res += get_encode(b, t, t-1, f, f+1, 8-t, bytesLength, bytes)

    return res


def chunks(l, n):
    if n < 1:
        n = 1
    return [l[i:i + n] for i in range(0, len(l), n)]

def gsm_decode(codedtext):
    hexparts = chunks(codedtext, 2)
    number   = 0
    bitcount = 0
    output   = ''
    found_external = False
    for byte in hexparts:
        byte = int(byte, 16)
        # add data on to the end
        number = number + (byte << bitcount)
        # increase the counter
        bitcount = bitcount + 1
        # output the first 7 bits
        if number % 128 == 27:
             '''skip'''
             found_external = True
        else:
            if found_external == True:                
                 character = ext[number % 128]
                 found_external = False
            else:
                 character = gsm[number % 128]
            output = output + character

        # then throw them away
        number = number >> 7
        # every 7th letter you have an extra one in the buffer
        if bitcount == 7:
            if number % 128 == 27:
                '''skip'''
                found_external = True
            else:
                if found_external == True:                
                    character = ext[number % 128]
                    found_external = False
                else:
                    character = gsm[number % 128]
                output = output + character

            bitcount = 0
            number = 0
    return output
Christmas answered 13/12, 2014 at 2:18 Comment(0)
H
2

The solutions above are not correct. GSM 03.38 uses only 7 bits per character, but the solutions above produce byte-aligned output, which in most cases is identical to ASCII. Here is a proper solution that packs the septets into a bit string.

I'm using an additional Python module:

pip3 install gsm0338

gsmencode.py:

import sys

import gsm0338


def __create_septets__(octets: bytes) -> (bytes, int):
    num_bits = 0
    data = 0
    septets = bytearray()
    for i in range(len(octets)):
        gsm_char = octets[i]
        data |= (gsm_char << num_bits)
        num_bits += 7
        while num_bits >= 8:
            septets.append(data & 0xff)
            data >>= 8
            num_bits -= 8
    if num_bits > 0:
        septets.append(data & 0xff)
    return bytes(septets), len(octets) % 8


if __name__ == '__main__':
    octets = sys.argv[1].encode('gsm03.38')
    septets, sparse = __create_septets__(octets)
    print("sparse bits: %d" % sparse)
    print("encoded (hex): %s" % septets.hex())

Usage:

python3 gsmencode.py Sample

Output:

sparse bits: 6
encoded (hex): d3701bce2e03
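
Going the other way needs the septet count as well as the packed bytes, because seven packed octets can hold either seven or eight characters. The sketch below reverses the packing shown above; `unpack_septets` is a hypothetical name and the code uses only the standard library, so mapping septets back to text through the `gsm03.38` codec is left out. The final `decode("ascii")` works only because the letters in "Sample" have the same values in ASCII and in the GSM basic alphabet.

```python
def unpack_septets(packed, count):
    """Recover `count` 7-bit values from packed octets (LSB first)."""
    septets = []
    data = 0        # bit accumulator
    num_bits = 0    # number of valid bits currently held in `data`
    for octet in packed:
        data |= octet << num_bits
        num_bits += 8
        # Emit septets while enough bits are buffered and more are expected
        while num_bits >= 7 and len(septets) < count:
            septets.append(data & 0x7F)
            data >>= 7
            num_bits -= 7
    return septets

septets = unpack_septets(bytes.fromhex("d3701bce2e03"), 6)
print(bytes(septets).decode("ascii"))   # Sample
```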
Haye answered 6/10, 2022 at 12:47 Comment(0)
F
0

I could not find any library, but I don't think this needs one. It's fairly easy to do.

Here is Jon Skeet himself on the same topic.

Example:

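s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def ascii_to_gsm(ch):
    # For A-Z the GSM 7-bit value equals the ASCII code (65 for 'A')
    return bin(65 + s.index(ch))

print(ascii_to_gsm('A'))
print('--')

# Strip the '0b' prefix and concatenate the 7-bit groups
binary_stream = ''.join([str(ascii_to_gsm(ch))[2:] for ch in s])
print(binary_stream)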

You can also use a dict to store the mapping between ASCII and the GSM 7-bit character set.

Franny answered 16/3, 2010 at 8:38 Comment(0)
S
0

I faced a similar issue recently: we were receiving GSM 7-bit encoded text messages from the aggregator, mostly for the Verizon carrier and with Spanish characters, and we were not able to decode them. Here is the function I wrote with the help of other answers in the forum. This is for Python 2.7.x.

def gsm7bitdecode(text):
    gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>"
           u"?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
    ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
           u"|````````````````````````````````````€``````````````````````````")

    text = ''.join(["{0:08b}".format(int(text[i:i+2], 16)) for i in range(0, len(text), 2)][::-1])

    text = [(int(text[::-1][i:i+7][::-1], 2)) for i in range(0, len(text), 7)]
    text = text[:len(text)-1] if text[-1] == 0 else text
    text = iter(text)

    result = []
    for i in text:
        if i == 27:
            i = next(text)
            result.append(ext[i])
        else:
            result.append(gsm[i])

    return "".join(result).rstrip()

Sunlight answered 23/8, 2019 at 5:59 Comment(0)
