Why can't class attributes be named as reserved words in python?
It seems reserved words cannot be used as attribute names in Python:

$ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     global = 3
  File "<stdin>", line 2
    global = 3
           ^
SyntaxError: invalid syntax

This seems sensible, since it is ambiguous: am I using the global keyword here? Difficult to say.

But this is not sensible imho:

>>> class A: pass
>>> a = A()
>>> a.global = 3
  File "<stdin>", line 1
    a.global = 3
           ^
SyntaxError: invalid syntax
>>> a.def = 4
  File "<stdin>", line 1
    a.def = 4
        ^
SyntaxError: invalid syntax
>>> a.super = 5
>>> a.abs = 3
>>> a.set = 5
>>> a.False = 5
  File "<stdin>", line 1
    a.False = 5
          ^
SyntaxError: invalid syntax
>>> a.break = 5
  File "<stdin>", line 1
    a.break = 5
          ^
SyntaxError: invalid syntax

Why this limitation? I am not using the reserved words in isolation, but as class attributes: there is no ambiguity at all. Why would Python care about that?
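(Side note: the split between the names that fail and the ones that succeed in the session above can be checked with the standard keyword module. The failing names are keywords; the succeeding ones are ordinary builtins.)

```python
import keyword

# "global", "break" and (in Python 3) "False" are keywords: the grammar
# reserves them, so they can never appear as bare identifiers.
print(keyword.iskeyword("global"))  # True
print(keyword.iskeyword("break"))   # True
print(keyword.iskeyword("False"))   # True

# "super", "abs" and "set" are merely builtins, not keywords, which is
# why a.super = 5, a.abs = 3 and a.set = 5 all parse fine.
print(keyword.iskeyword("super"))   # False
print(keyword.iskeyword("abs"))     # False
print(keyword.iskeyword("set"))     # False
```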

Rosenblum answered 25/10, 2017 at 6:43 Comment(13)
Because that's the point of "reserved" words, i.e. the ones you can't use.Neddy
Because they're reserved words.Oona
Allowing this complicates parsing for both the computer and a human reader. It's not worth it.Veldaveleda
BTW. This is not specific to Python at all. Virtually every language has reserved words.Neddy
a.global does not use any reserved word: it is an attribute which happens to have the same name as a reserved word. The attribute name will never be used in isolation, eliminating any cause of conflict: it will either be used as a.global or as self.global. There is no possible confusion with the global keyword at all. I understand that they cannot be declared as class attributes, because that requires special syntax support, and that's where a conflict arises. But why not as object attributes?Rosenblum
@el.pescado: Eh, it's language-dependent. For example, Javascript is perfectly fine with letting you say x.for = 3. Just how reserved your reserved words are is a matter of language design; it's frequently possible to allow keywords as ordinary identifiers in all sorts of specific contexts, but whether it's actually a good idea is another story.Veldaveleda
BTW, this was brought to my attention while processing yaml (or json) data to be accessible as regular object attributes (with a nested object hierarchy). The person writing the yaml (who is completely unrelated to Python) is constrained in that case by some Python rules.Rosenblum
@DanielGonzalez But that file (yaml or json) has to have some defined structure. Anyway, object attributes can have arbitrary names, try setattr/getattr. setattr(a, "global", 42) is perfectly allowed.Neddy
@el.pescado defeating the purpose of my yaml-to-python-object parser, which is to simplify access to the complex yaml structure: instead of data['something']['another'] I want to do data.something.anotherRosenblum
Just add _ prefix or suffix to the parameter, if you really want to use that word.Vaticinate
@Vinny sure, and I can also call it gl0bal (as has been suggested to me) or glbal (as I am doing at the moment). Not nice. And anyway I wanted to know the reason behind it. I thought there was a technical reason which escaped me, but it seems to be a matter of taste.Rosenblum
@DanielGonzalez but still, that yaml file has to conform to some standard. And even if that standard defines global as an attribute, you can resort to getattr.Neddy
BTW. Answer to question "why?" can be found here.Neddy
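(A minimal sketch of the dict-to-attribute wrapper discussed in the comments above. The class name AttrDict and the fallback behaviour are assumptions, not part of the original discussion.)

```python
# Hypothetical wrapper giving data.something.another access to nested
# dicts loaded from yaml/json, instead of data['something']['another'].
class AttrDict:
    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so _data itself
        # is still reachable the ordinary way.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name)
        return AttrDict(value) if isinstance(value, dict) else value


data = AttrDict({"something": {"another": 42}})
print(data.something.another)  # 42

# Keys that collide with keywords stay reachable via getattr():
print(getattr(AttrDict({"global": 3}), "global"))  # 3
```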

It's nowhere near worth it.

Sure, you could allow it. Hack up the tokenizer and the parser so the tokenizer is aware of the parse context and emits NAME tokens instead of keyword tokens when the parser is expecting an attribute access, or just have it always emit NAME tokens instead of keywords after a DOT. But what would that get you?

You'd make the parser and tokenizer more complicated, and thus more bug-prone. You'd make things harder to read for a human reader. You'd restrict future syntax possibilities. You'd cause confusion when

Foo.for = 3

parses and

class Foo:
    for = 3

throws a SyntaxError. You'd make Python less consistent, harder to learn, and harder to understand.

And for all that, you'd gain... the ability to write x.for = 3. The best I can say for this is that it'd prevent something like x.fibble = 3 from breaking upon addition of a fibble keyword, but even then, all other uses of fibble would still break. Not worth it. If you want to use crazy attribute names, you have setattr and getattr.
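For completeness, the setattr/getattr escape hatch looks like this:

```python
class A:
    pass

a = A()
# setattr/getattr take the attribute name as a plain string, so the
# parser never sees the keyword and there is nothing to reject.
setattr(a, "global", 3)
setattr(a, "break", 5)
print(getattr(a, "global"))  # 3
print(vars(a))               # {'global': 3, 'break': 5}
```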


Python does its best to make the syntax simple. Its parser is LL(1), and the restrictions of an LL(1) parser are considered beneficial specifically because they prevent going overboard with crazy grammar rules:

Simple is better than complex. This idea extends to the parser. Restricting Python's grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, such as Perl.

Something like x.for = 3 is not in keeping with that design philosophy.

Veldaveleda answered 25/10, 2017 at 7:4 Comment(1)
OK, this explanation is quite satisfactory. I was under the impression that Python was actively enforcing this, rather than it being a direct consequence of the simple grammar rules. In other words, I thought that Python was going overboard to prevent the use of reserved words in special contexts. It being the other way around (not going overboard to allow for them) seems sensible to me.Rosenblum

To understand the reasons behind this limitation, you need to understand how computer languages work.

Initially, you have a text file. You feed this text to a string tokenizer (called a lexer), which recognizes lexical elements such as words, operators, comments, numbers, strings and so on. The lexer is not aware of anything except characters: it converts a text file into a stream of typed tokens.

This stream of tokens is then fed into a parser. The parser deals with higher-level constructs, such as method definitions, class definitions, import statements etc. For example, the parser knows that a function definition starts with "def", followed by a name (a token of type identifier), then a colon, and a bunch of indented lines. This means some words such as "def", "return" and "if" are reserved for the parser, because they are part of the language grammar.
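You can watch this two-stage split in the standard library: the pure-Python tokenize module emits "global" as an ordinary NAME token, and it is only the parser, one stage later, that rejects a.global = 3.

```python
import io
import tokenize

# The tokenizer has no special token type for keywords: "global" comes
# out as a plain NAME, just like "a". Rejecting "a.global = 3" is the
# parser's job, not the lexer's.
tokens = tokenize.generate_tokens(io.StringIO("a.global = 3").readline)
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```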

The result of parsing is a data structure called an abstract syntax tree (AST). The AST corresponds directly to the contents and structure of the text file. In the AST there are no keywords, because they have already served their purpose. On the other hand, identifiers (names of variables, functions, etc.) are retained, because they are needed later by the compiler/interpreter.
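The ast module makes both halves of this visible: a legal attribute assignment survives into the tree as an Attribute node whose name is stored as a plain string, while a keyword in that position never gets that far.

```python
import ast

# A legal attribute assignment becomes an Attribute node; the attribute
# name is just a string field on that node.
tree = ast.parse("a.x = 3")
target = tree.body[0].targets[0]
print(type(target).__name__, target.attr)  # Attribute x

# A keyword in the same position is rejected by the parser, so no AST
# is ever built.
try:
    ast.parse("a.global = 3")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```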

In short, keywords exist to give the text its structure. Without structure, it is impossible for a program to deterministically analyze the text. If you try to use a keyword for something else, it breaks the structure. After the structure is analyzed, keywords are no longer needed. Inherently, this means the author of a language has to draw a line and reserve some words for structure, while leaving all others free for the programmer to use.

This is not Python-specific; it's the same for every language. If you didn't have text files, you wouldn't need keywords. Technically, it would be possible for a language to overcome this limitation, but it would complicate things a lot without any real benefit. Having a parser separate from the rest of the language makes so much sense that you just wouldn't want it any other way.

Rigamarole answered 25/10, 2017 at 7:8 Comment(0)
