Python 2 maketrans() function doesn't work with Unicode: "the arguments are different lengths" when they actually are - McMap

Asked 7/5, 2015 at 18:23 Answered 7/5, 2015 at 18:25

Solved python string unicode python-2.x translate

import string  # Python 2

SUB = string.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

This code produces the error:

ValueError: maketrans arguments must have same length

I am unsure why this occurs, because the two strings appear to be the same length. My only idea is that the subscript characters are somehow a different size than standard characters, but I don't know how to get around this.

Deckard answered 7/5, 2015 at 18:23 Comment(3)

Works fine in Python 3 (which does have much nicer unicode support anyway), is that an option for you? – Assignat 7/5, 2015 at 18:51

Currently I'm running Python 2.7, but I will be sure to take a look at Python 3 – Deckard 7/5, 2015 at 19:40

That Python 3 code is from @ZeroPiraeus' neat answer to "Printing subscript in python" – Depicture 10/9, 2018 at 5:58

No, the arguments are not the same length:

>>> len("0123456789")
10
>>> len("₀₁₂₃₄₅₆₇₈₉")
30

You are trying to pass in encoded data; I used UTF-8 here, where each subscript digit is encoded as 3 bytes.
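To make the byte count concrete, here is a minimal sketch that compares the length of the subscript digits as text with their length once encoded as UTF-8. The variable names are made up for illustration, and the characters are written as \u escapes so the snippet does not depend on the source file's encoding:

```python
# Subscript digits U+2080..U+2089 as a Unicode (text) string.
subscripts = u"\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

# The same characters as UTF-8 encoded bytes; this is what a plain
# "..." literal holds in Python 2 when the terminal or file is UTF-8.
encoded = subscripts.encode("utf-8")

print(len(subscripts))  # 10 characters
print(len(encoded))     # 30 bytes, 3 per subscript digit
```

string.maketrans() in Python 2 sees the 30-byte form, which is why it reports a length mismatch against the 10 ASCII digits.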

You cannot use str.translate() to map ASCII bytes to UTF-8 byte sequences. Decode your string to unicode and use the slightly different unicode.translate() method; it takes a dictionary instead:

nummap = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}

This creates a dictionary mapping Unicode codepoints (integers), which you can then use on a Unicode string:

>>> nummap = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}
>>> u'99 bottles of beer on the wall'.translate(nummap)
u'\u2089\u2089 bottles of beer on the wall'
>>> print u'99 bottles of beer on the wall'.translate(nummap)
₉₉ bottles of beer on the wall

You can then encode the output to UTF-8 again if you so wish.
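If your input arrives as UTF-8 bytes, the whole round trip might look roughly like the sketch below. The sample input and the names raw, text and out are assumptions for illustration; only decode(), translate() and encode() come from the answer above:

```python
digits = u"0123456789"
subscripts = u"\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

# Map each digit's code point to the matching subscript code point.
nummap = {ord(c): ord(t) for c, t in zip(digits, subscripts)}

raw = b"H2O and CO2"                       # hypothetical UTF-8 encoded input
text = raw.decode("utf-8")                 # bytes -> Unicode text
out = text.translate(nummap).encode("utf-8")  # translate, then text -> bytes
```

The translation itself always happens on the decoded text; the bytes are only handled at the edges.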

From the method documentation:

For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
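As a small illustration of the three kinds of table values the documentation mentions, the sketch below maps one character to another ordinal, one to a replacement string, and one to None. The table contents and the sample text are invented for the example:

```python
table = {
    ord(u"2"): ord(u"\u2082"),  # ordinal -> ordinal: "2" becomes subscript two
    ord(u"&"): u"and",          # ordinal -> Unicode string: "&" is expanded
    ord(u"!"): None,            # ordinal -> None: "!" is deleted
}

# Characters without an entry in the table are left untouched.
result = u"H2O & CO2!".translate(table)
```

Here result holds the text "H₂O and CO₂", with every unmapped character copied through unchanged.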

Discrepancy answered 7/5, 2015 at 18:25 Comment(6)

Is there any other way to get subscript characters in Python? Or even a way to overcome this length difference? – Deckard 7/5, 2015 at 18:27

Aaron: this is not a limitation of Python, but rather a consequence of the differences between ASCII and Unicode. There are no "subscript characters" in ASCII. The implication of using Unicode characters is that Python cannot treat such characters as if they were ASCII; any attempt to do so may work in some cases but will break in others. – Bombazine 7/5, 2015 at 18:51

@Martijn Where did you get 30? I get either 10 or "Unsupported characters in input", depending on where I try it. – Assignat 7/5, 2015 at 18:55

@StefanPochmann: using the interactive interpreter in a terminal configured for UTF-8 use. – Discrepancy 7/5, 2015 at 19:48

Only in Python 2. The length is 30 in Python 2 and 10 in Python 3. OP's code works fine in Python 3. – Depicture 10/9, 2018 at 5:55

@Depicture exactly; you’ll only see this specific error in Python 2 because these are byte strings. That's why the question is tagged with the python-2.x tag. – Discrepancy 10/9, 2018 at 8:34
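For comparison, a minimal Python 3 sketch of the approach mentioned in the comments: there, string literals are already text, and str.maketrans() accepts two equal-length strings directly. This does not run on Python 2, and the sample input is made up for illustration:

```python
# Python 3 only: str.maketrans() builds a code point mapping from two
# equal-length text strings.
SUB = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)

formula = "H2SO4".translate(SUB)
```

Both arguments are 10 characters long here, so the length check passes.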
