Non-ASCII Python identifiers and reflectivity [duplicate]


Asked 2/1, 2018 at 14:51 Answered 2/1, 2018 at 15:10

python unicode reflection identifier variable-names

I have learnt from PEP 3131 that non-ASCII identifiers are supported in Python, though using them is not considered best practice.

However, I get this strange behaviour, where my 𝜏 identifier (U+1D70F) seems to be automatically converted to τ (U+03C4).

class Base(object):
    def __init__(self):
        self.𝜏 = 5 # defined with U+1D70F

a = Base()
print(a.𝜏)     # 5             # (U+1D70F)
print(a.τ)     # 5 as well     # (U+03C4) ? another way to access it?
d = a.__dict__ # {'τ':  5}     # (U+03C4) ? seems converted
print(d['τ'])  # 5             # (U+03C4) ? consistent with the conversion
print(d['𝜏'])  # KeyError: '𝜏' # (U+1D70F) ?! unexpected!

Is that expected behaviour? Why does this silent conversion occur? Does it have anything to do with NFKC normalization? I thought that was only about canonically ordering Unicode character sequences...

Ortego asked 2/1, 2018 at 14:51

Does defining an encoding make a difference? U+03C4 is definitely the compatibility decomposition of U+1D70F, and it looks from the reference like some normalization happens. – Aware 2/1, 2018 at 15:03

Your theory seems to be correct. It seems the Python interpreter normalises your Unicode identifier as soon as it parses it. If you put print(dir(a)) after a has been assigned, you can see there is no trace of the U+1D70F character on the object. Your second print statement works for the same reason (the identifier gets normalised), while your dictionary access fails because dictionaries accept any string as a key, and there is no reason to normalise or otherwise alter a string literal in quotes. – Limbic 2/1, 2018 at 15:06
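The dir(a) check suggested in the comment can be sketched like this, reusing the question's Base class (run on Python 3; the expected outputs assume CPython's documented NFKC handling of identifiers):

```python
class Base:
    def __init__(self):
        self.𝜏 = 5  # written with U+1D70F in the source file

a = Base()

# Keep only the non-dunder names visible on the instance.
names = [n for n in dir(a) if not n.startswith('__')]
print(names)                             # ['τ']
print([hex(ord(c)) for c in names[0]])   # ['0x3c4'], i.e. U+03C4, not U+1D70F
```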

@Aware Nope, defining # -*- coding: utf-8 -*- makes no difference. Maybe NFKC is responsible... but I thought canonicalisation was just about reordering, not changing the actual character. 8) – Ortego 2/1, 2018 at 15:13

@Limbic I guess you're right as well.. but it leads to a quite unexpected behaviour when it comes to indexing __dict__, don't you find? – Ortego 2/1, 2018 at 15:14

Not at all. As the answer explains, there is no automatic normalisation of string literals, and it would be completely inappropriate to do so anyway. – Limbic 2/1, 2018 at 15:32

Per the documentation on identifiers:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can see that U+03C4 is the appropriate result using unicodedata:

>>> import unicodedata
>>> unicodedata.normalize('NFKC', '𝜏')
'τ'
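This also answers the "only for canonical ordering" doubt in the question: U+1D70F has a *compatibility* decomposition (tagged <font>), not a canonical one. NFC/NFD only apply canonical mappings and leave it alone, while NFKC/NFKD also apply compatibility mappings. A small sketch using only the standard unicodedata module:

```python
import unicodedata

tau_math = '\U0001D70F'   # MATHEMATICAL ITALIC SMALL TAU
tau_greek = '\u03c4'      # GREEK SMALL LETTER TAU

print(unicodedata.name(tau_math))           # MATHEMATICAL ITALIC SMALL TAU
print(unicodedata.decomposition(tau_math))  # <font> 03C4

# Canonical normalization keeps the mathematical italic tau as-is.
print(unicodedata.normalize('NFC', tau_math) == tau_math)    # True
# Compatibility normalization (what identifiers get) maps it to the Greek letter.
print(unicodedata.normalize('NFKC', tau_math) == tau_greek)  # True
```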

However, this conversion doesn't apply to string literals, such as the one you're using as a dictionary key, so the lookup searches for the unconverted character in a dictionary that only contains the converted one.

self.𝜏 = 5  # implicitly converted to "self.τ = 5"
a.𝜏  # implicitly converted to "a.τ"
d['𝜏']  # not converted
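On CPython, the same split between identifiers and string literals is visible in the syntax tree, since ast.parse goes through the interpreter's own parser. A minimal sketch (the source snippet is parsed, not executed, so self and d need not exist):

```python
import ast

# Both the attribute name and the string key are written with U+1D70F.
source = "self.\U0001D70F = 5\nd['\U0001D70F']"
tree = ast.parse(source)

# Attribute names are identifiers, so the parser NFKC-normalizes them.
attrs = [n.attr for n in ast.walk(tree) if isinstance(n, ast.Attribute)]
# String constants are literals and keep their original code points.
strings = [n.value for n in ast.walk(tree)
           if isinstance(n, ast.Constant) and isinstance(n.value, str)]

print(hex(ord(attrs[0])))    # 0x3c4
print(hex(ord(strings[0])))  # 0x1d70f
```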

You can see similar problems with e.g. string literals used with getattr:

>>> getattr(a, '𝜏')
Traceback (most recent call last):
  File "python", line 1, in <module>
AttributeError: 'Base' object has no attribute '𝜏'
>>> getattr(a, unicodedata.normalize('NFKC', '𝜏'))
5
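If dynamic access has to follow the same rules as written identifiers, one option is to normalize the name yourself before calling getattr or indexing __dict__. The helpers below (nfkc and getattr_nfkc) are hypothetical names for illustration, not part of the standard library; this is a sketch, not a drop-in replacement for getattr:

```python
import unicodedata

def nfkc(name: str) -> str:
    """Normalize a name the way identifiers are normalized (NFKC)."""
    return unicodedata.normalize('NFKC', name)

def getattr_nfkc(obj, name, *default):
    """Hypothetical helper: getattr() with identifier-style normalization."""
    return getattr(obj, nfkc(name), *default)

class Base:
    def __init__(self):
        self.𝜏 = 5  # stored on the instance under 'τ' (U+03C4)

a = Base()
print(getattr_nfkc(a, '\U0001D70F'))    # 5
print(a.__dict__[nfkc('\U0001D70F')])   # 5
```

ASCII names pass through NFKC unchanged, so the helper behaves like plain getattr for ordinary attribute names.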

Aware answered 2/1, 2018 at 15:10

Well, that's interesting. Cheers :) I still think it's weird behaviour, though. If 𝜏 were the only character I could type on my keyboard, I couldn't use Python's reflective __dict__ or getattr features like anybody else. Should I file this as a bug with Python? – Ortego 2/1, 2018 at 15:17

@Ortego I'm not sure they'd consider it a bug, given that this is the documented behaviour. It certainly surprised me, though! And it makes dynamic attribute access (see the getattr example) a little more complex than initially expected. I guess this is why ASCII identifiers are still recommended; no more from math import pi as π for me! – Aware 2/1, 2018 at 15:18

I'll inform them anyway :) What's the best place to do so? – Ortego 2/1, 2018 at 15:20

@Ortego anything like that should go through bugs.python.org; have a look around, there may be a similar issue logged already. – Aware 2/1, 2018 at 15:20

Great. Here it is. Thanks again :) – Ortego 2/1, 2018 at 16:09

@Ortego I'd guess it'll get closed against e.g. bugs.python.org/issue13793 – Aware 2/1, 2018 at 16:13

Crab! Missed that one :\ You're right. – Ortego 2/1, 2018 at 16:16
