In Python, why does 0xbin() return False? [duplicate]

Asked 25/7, 2018 at 14:14 Answered 26/7, 2018 at 2:24

Inputting the command 0xbin() returns False:

>>> 0xbin()
False

Why does that happen? This syntax should have no meaning whatsoever. Function names cannot start with 0, there is no "i" or "n" in hexadecimal, and the bin function must be called with an argument.

2023 update: this is expected to become a syntax error soon.
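To check what your own interpreter does, here is a minimal sketch that compiles the expression while recording warnings. As far as I know, CPython 3.10 started warning about a numeric literal immediately followed by a keyword such as in. The warning category (DeprecationWarning or SyntaxWarning) may depend on the version, which is an assumption here, so the sketch simply lists whatever is raised.

```python
import sys
import warnings

def compile_recording_warnings(source):
    """Compile `source` as an expression; return the code object and any warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        code = compile(source, "<example>", "eval")
    return code, caught

code, caught = compile_recording_warnings("0xbin()")
print(eval(code))  # False: it is still parsed as 0xb in ()
print(sys.version_info[:2], [w.category.__name__ for w in caught])
```

On older interpreters the list of warnings is simply empty.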

Paleogeography asked 25/7, 2018 at 14:14 Comment(7)

It takes arguments! 0xbin(013,37) – Demilitarize 25/7, 2018 at 15:34

@Demilitarize and if you want to get True you can try 0xbin(11,) with a single argument – Paleogeography 25/7, 2018 at 16:9

0xbin(013,37) will also give you True ;) (in Python 2.7) – Demilitarize 26/7, 2018 at 2:25

This is simply because the implementor of the lexer and parser is only concerned with obtaining the desired behaviors over nicely formatted code. The juxtaposition of 0xb and in should be treated as an invalid token. – Grievous 26/7, 2018 at 17:20

Compare and contrast with 0xband(). The tokenizer is greedy and takes 0xba as a token. – Fluent 26/7, 2018 at 22:40
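A quick sketch of the contrast: 0xba on its own is just the number 186, while 0xband() does not compile at all. The exact error message differs between Python versions (newer CPython reports an invalid hexadecimal literal; older ones fail on the leftover nd), so the helper below only checks for SyntaxError. The helper name compiles is made up for this illustration.

```python
def compiles(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

print(0xba)                    # 186: the greedy tokenizer takes "0xba" as the number
print(compiles("0xband()"))    # False: nothing valid can follow the greedy 0xba
print(compiles("0xb and ()"))  # True: with spaces it is simply 0xb and ()
```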

@people who are voting to reopen: Please explain why this is not a dupe. If you convince me, I'll dupehammer reopen it. – Bary 27/7, 2018 at 16:42

@Bary Maybe the dupe target is like "Why isn't whitespace required sometimes between tokens?" and this question is like "What even are the tokens?" (I could re-open myself, but since I answered and there were many votes I won't, in case it is viewed as a COI) – Samaria 7/8, 2018 at 9:42

Python seems to interpret 0xbin() as 0xb in (), meaning "is eleven in an empty tuple?". The answer is no, therefore False.
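One way to check this reading, as a sketch: parse both spellings and compare their ASTs, then evaluate the spaced-out form directly. Warnings are silenced because recent CPython versions warn about the unspaced form; the helper name same_tree is only for illustration.

```python
import ast
import warnings

def same_tree(a, b):
    """Return True if two expressions parse to identical ASTs."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # newer CPythons warn about "0xbin"
        tree_a = ast.parse(a, mode="eval")
        tree_b = ast.parse(b, mode="eval")
    return ast.dump(tree_a) == ast.dump(tree_b)

print(same_tree("0xbin()", "0xb in ()"))  # True
print(0xb)                                 # 11
print(0xb in ())                           # False: () has no elements
print(0xb in (11,))                        # True, as noted in the comments
```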

Americaamerican answered 25/7, 2018 at 14:18 Comment(10)

So apparently "in", "is" etc don't require spaces? First time I encountered this, but it makes sense as "<" and "==" don't require them as well. – Paleogeography 25/7, 2018 at 14:22

Apparently yes. The Python Reference says whitespace between tokens is only needed "if their concatenation could be interpreted as a different token". But I have only ever seen such code in Code Golf. – Americaamerican 25/7, 2018 at 14:29
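To illustrate the rule quoted above, a small sketch using the standard tokenize module: x in y needs the space after x, because xin would be read as a single name, while 'b'in'abc' needs no spaces at all, since a string literal cannot run into a keyword. The helper token_strings is hypothetical, written just for this example.

```python
import io
import tokenize

def token_strings(source):
    """Return the token strings of `source`, skipping bookkeeping tokens."""
    skip = {tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [tok.string for tok in tokenize.tokenize(readline) if tok.type not in skip]

print(token_strings("x in y"))      # ['x', 'in', 'y']
print(token_strings("xin y"))       # ['xin', 'y']: one name, not x followed by in
print(token_strings("'b'in'abc'"))  # ["'b'", 'in', "'abc'"]
```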

@Paleogeography This is why valid identifiers in Python (and in many other languages) may only start with a letter or an underscore, with digits allowed only after the first character. The actual implementation is fairly complicated because of full Unicode support, but the pure ASCII regex for an identifier would be: r'[_a-zA-Z][_a-zA-Z0-9]*' – Bluma 25/7, 2018 at 14:30
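A sketch comparing the ASCII-only regex from this comment with Python's own str.isidentifier(), which also accepts Unicode letters. Note that isidentifier() does not reject keywords; keyword.iskeyword() covers that case. The name is_ascii_identifier is just for illustration.

```python
import keyword
import re

ASCII_IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

def is_ascii_identifier(name):
    """True if the whole of `name` matches the ASCII identifier pattern."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None

for name in ["bin", "_x1", "0xbin", "café"]:
    # regex result, then the interpreter's own rule
    print(name, is_ascii_identifier(name), name.isidentifier())

print(keyword.iskeyword("in"))  # True: looks like a name, but is reserved
```

Only "café" differs between the two checks, because real identifiers allow non-ASCII letters.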

@Aaron: [_[:alpha:]][_[:alnum:]]* in regular expression languages that allow (Unicode) character classes, i.e. not Python’s. ;-] – Clariceclarie 25/7, 2018 at 16:20

Wow, I thought this kind of parsing was only done in Fortran and BASIC. I can't believe a modern language does it. – Outlying 25/7, 2018 at 18:28

One of the reasons for the peculiar definition of a "preprocessing number" in the C language family is to prevent things like this; in those languages 0xbin would be treated as a single token, even though it cannot be interpreted as a valid numeric literal. – Glidebomb 25/7, 2018 at 23:44

@DavidFoerster You can just use the third-party regex module instead of the built-in re. It provides the same API but adds a lot of features, including matching Unicode properties. I hope that in a few releases it will replace the standard re module (and I believe the author has this hope too, hence the high level of compatibility between the two). – Laissezfaire 26/7, 2018 at 7:57

@Outlying python is pretty old. – Hydrolysis 26/7, 2018 at 13:46

@Hydrolysis I've been programming for 40 years, Python is less than 30 years old. As far as I'm concerned, it's a young whipper-snapper. – Outlying 26/7, 2018 at 15:6

@Barmar: Python wants to remain an LL(1) language. That doesn't really have anything to do with this example in particular, but it illustrates their ongoing desire for a "dumb" parser (and also provides a little in the way of explanation. TL;DR: They don't want Python to end up like Perl.). – Bary 26/7, 2018 at 22:20

If you disassemble the code, you'll see that Yself's answer, which mentions that 0xbin() is interpreted as 0xb in (), is confirmed:

>>> import dis
>>> dis.dis('0xbin()')
  1           0 LOAD_CONST               0 (11)
              2 BUILD_TUPLE              0
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
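The listing above comes from an older CPython; opcode names and offsets change between versions (for example, CPython 3.9 and later compile in to CONTAINS_OP instead of COMPARE_OP). A version-independent sketch is to compare the instruction sequences of the two spellings rather than reading a single listing. The helper opcodes is hypothetical.

```python
import dis
import warnings

def opcodes(source):
    """Return (opname, argval) pairs for `source` compiled as an expression."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # newer CPythons warn about "0xbin"
        code = compile(source, "<example>", "eval")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

print(opcodes("0xbin()") == opcodes("0xb in ()"))  # True
dis.dis("0xb in ()")  # listing for whichever interpreter version you run
```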

Samaria answered 25/7, 2018 at 14:19 Comment(1)

list(tokenize.tokenize(io.BytesIO(b"0xbin()").readline)) might be more appropriate – Surbase 25/7, 2018 at 16:38

You can use Python's own tokenizer to check!

import io
import tokenize
line = b'0xbin()'
# tokenize.tokenize() reads bytes through a readline-style callable
tokens = tokenize.tokenize(io.BytesIO(line).readline)
# Skip the leading ENCODING token and the empty NEWLINE/ENDMARKER tokens at the end
print(' '.join(tok.string for tok in tokens if tok.type != tokenize.ENCODING and tok.string))

This prints the tokens in your string, separated by spaces. In this case, the result will be:

0xb in ( )

In other words, it returns False because the number 11 (0xb) is not in the empty tuple (()).

(Thanks to Roman Odaisky for suggesting the use of tokenize in the comments!)

EDIT: To explain the code a bit more thoroughly: tokenize.tokenize() does not take a string directly; it expects a readline-style callable that returns bytes, so io.BytesIO(line).readline wraps the byte string in a file-like object and passes along its readline method. tokenize then yields a series of TokenInfo named tuples; we take the string of each one and join them with spaces. The tok.type != tokenize.ENCODING check drops the encoding token ('utf-8') that would otherwise show up at the beginning, and the tok.string check drops the empty end-of-input tokens. Comparing against tokenize.ENCODING rather than a hard-coded number matters because the numeric token type values change between Python versions.
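If you also want to see what kind of token each piece is, a variation of the same idea pairs each string with its type name from tokenize.tok_name. Keywords such as in come out as ordinary NAME tokens at this stage. The helper typed_tokens is a made-up name for this sketch.

```python
import io
import tokenize
import warnings

def typed_tokens(source):
    """Return (token type name, token string) pairs for a bytes source."""
    skip = {tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # in case the tokenizer warns about "0xbin"
        tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
    return [(tokenize.tok_name[t.type], t.string) for t in tokens if t.type not in skip]

print(typed_tokens(b"0xbin()"))
# [('NUMBER', '0xb'), ('NAME', 'in'), ('OP', '('), ('OP', ')')]
```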

Durazzo answered 26/7, 2018 at 2:24 Comment(1)

This is the best answer yet; the "dis" and "ast" answers obscure what is going on behind uncommon notations, while this one shows it clearly in normal Python. – Candelaria 26/7, 2018 at 18:3

You can use the AST module to get the abstract syntax tree of the expression:

>>> import ast
>>> m = ast.parse('0xbin()')
>>> ast.dump(m)
'Module(
    body=[Expr(
               value=Compare(left=Num(n=11),
                             ops=[In()],
                             comparators=[Tuple(elts=[],
                                                ctx=Load())
                                         ]
                            ))])'

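See the abstract grammar for how to interpret the expression, but tl;dr: Num(n=11) is the 0xb part, In() is the membership test, and Tuple(elts=[], ...) is an empty tuple rather than a function call. (On Python 3.8 and later, the literal is shown as a Constant node with value=11 instead of Num(n=11).)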
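A sketch that checks the same structure in code instead of reading the dump. It assumes Python 3.8 or later, since it looks for ast.Constant rather than ast.Num.

```python
import ast
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer CPythons warn about "0xbin"
    expr = ast.parse("0xbin()", mode="eval").body

print(type(expr).__name__)                      # Compare, not Call
print(type(expr.left).__name__, expr.left.value)  # Constant 11
print(type(expr.ops[0]).__name__)               # In
comparator = expr.comparators[0]
print(type(comparator).__name__, len(comparator.elts))  # Tuple 0
```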

Administrative answered 25/7, 2018 at 17:57 Comment(0)
