How to convert an array of Tamil Unicode values into a Tamil string in Python with whitespace?

Here is a list of Tamil Unicode code points:

    [u'\u0b9a', u'\u0b9f', u'\u0bcd', u'\u0b9f', u'\u0b9a', u'\u0baa', u'\u0bc8', u'\u0baf', u'\u0bbf', u'\u0bb2', u'\u0bcd', u'\u0ba8', u'\u0bc7', u'\u0bb1', u'\u0bcd', u'\u0bb1', u'\u0bc1']

How can I convert it to a readable string?

Parr answered 17/3, 2012 at 5:44 Comment(6)
Those are already Tamil letters. Try again. - Dyanna
I see that you've changed your question; you now want to display your characters "with whitespaces" -- which whitespace character(s)? How many? Positioned where? Try giving an example. - Uzia
Sir, I want the Tamil Unicode values to be printed as they are in the array, with whitespace. I do not want the contents of the array to be joined and displayed as Tamil characters. - Parr
Actually, I want to tokenize Tamil words. To tokenize, the text has to be converted to UTF-8 Unicode in order to read the file. After reading, it is tokenized, and the result is in Unicode. I want that Unicode result converted to Tamil letters, but I didn't get whitespace when tokenizing Tamil words. - Parr
I need spaces between words, not between individual characters, when tokenizing Tamil words. - Parr
@siva: You should really ask your REAL question the FIRST time up ... Edit your question. You will need to show your input and your tokenising code -- we are not mind-readers. - Uzia

No conversion needed.

    >>> alist = [
            u'\u0b9a', u'\u0b9f', u'\u0bcd', u'\u0b9f', u'\u0b9a',
            u'\u0baa', u'\u0bc8', u'\u0baf', u'\u0bbf', u'\u0bb2',
            u'\u0bcd', u'\u0ba8', u'\u0bc7', u'\u0bb1', u'\u0bcd',
            u'\u0bb1', u'\u0bc1',
            ]
    >>> print u''.join(alist)
    சட்டசபையில்நேற்று
    >>> 
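
The session above uses Python 2 syntax (the `print` statement and `u''` literals). As a minimal sketch, assuming Python 3, the same join looks like the following. It also shows one way to get spaces between words but not between characters, which is what the comments on the question ask for. The `tokens` list is a hypothetical stand-in for whatever the tokenizer returns, since the tokenizing code was never posted.

```python
# Python 3: every str is Unicode, so the u'' prefix is not required.
alist = [
    '\u0b9a', '\u0b9f', '\u0bcd', '\u0b9f', '\u0b9a',
    '\u0baa', '\u0bc8', '\u0baf', '\u0bbf', '\u0bb2',
    '\u0bcd', '\u0ba8', '\u0bc7', '\u0bb1', '\u0bcd',
    '\u0bb1', '\u0bc1',
]

# Join with an empty separator to get one readable string.
word = ''.join(alist)
print(word)

# Assumption: the tokenizer yields one list of code points per word.
# Here the sample list is split by hand after the 11th code point.
tokens = [alist[:11], alist[11:]]

# Empty separator inside each word, a single space between words.
sentence = ' '.join(''.join(token) for token in tokens)
print(sentence)
```

The inner `''.join` keeps each word intact, and the outer `' '.join` is the only place where whitespace is introduced.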

Update: Perhaps you want this:

    >>> print u' '.join(alist)
    ச ட ் ட ச ப ை ய ி ல ் ந ே ற ் ற ு

or this:

    >>> import unicodedata
    >>> for c in alist:
            print repr(c), c, unicodedata.category(c)


    u'\u0b9a' ச Lo
    u'\u0b9f' ட Lo
    u'\u0bcd' ் Mn
    u'\u0b9f' ட Lo
    u'\u0b9a' ச Lo
    u'\u0baa' ப Lo
    u'\u0bc8' ை Mc
    u'\u0baf' ய Lo
    u'\u0bbf' ி Mc
    u'\u0bb2' ல Lo
    u'\u0bcd' ் Mn
    u'\u0ba8' ந Lo
    u'\u0bc7' ே Mc
    u'\u0bb1' ற Lo
    u'\u0bcd' ் Mn
    u'\u0bb1' ற Lo
    u'\u0bc1' ு Mc
    >>> 
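
The category listing shows that the vowel signs and the pulli are marks (`Mc`, `Mn`) that attach to the preceding letter (`Lo`). If the goal is a space between visible letters and not between raw code points, one possible approach is to glue each mark onto the letter before it. The sketch below assumes Python 3; `clusters` is a hypothetical helper name. It is a simplification that only looks at the general category, not a full implementation of Unicode grapheme cluster segmentation.

```python
import unicodedata


def clusters(text):
    """Group each base letter with the combining marks that follow it."""
    groups = []
    for ch in text:
        # Categories Mn, Mc and Me all start with 'M' (marks).
        if groups and unicodedata.category(ch).startswith('M'):
            groups[-1] += ch      # attach the mark to the previous letter
        else:
            groups.append(ch)     # start a new cluster
    return groups


word = ('\u0b9a\u0b9f\u0bcd\u0b9f\u0b9a\u0baa\u0bc8\u0baf\u0bbf'
        '\u0bb2\u0bcd\u0ba8\u0bc7\u0bb1\u0bcd\u0bb1\u0bc1')

# One space between letter clusters instead of between code points.
spaced = ' '.join(clusters(word))
print(spaced)
```

For the sample word this yields 10 clusters from 17 code points, so a mark is never separated from its base letter.
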
Uzia answered 17/3, 2012 at 5:58 Comment(1)
Thank you, sir, for the valuable reply. - Parr
