In Python, why does 0xbin() return False? [duplicate]

Entering 0xbin() at the interactive prompt returns False:

>>> 0xbin()
False

Why does that happen? This syntax should have no meaning whatsoever. Function names cannot start with 0, there are no "i" and "n" in hex, and the bin function must have some arguments.

2023 update: this is expected to become a syntax error soon.

Paleogeography asked 25/7, 2018 at 14:14 Comment(7)
It takes arguments! 0xbin(013,37) - Demilitarize
@Demilitarize and if you want to get True you can try 0xbin(11,) with a single argument - Paleogeography
0xbin(013,37) will also give you True ;) (in Python 2.7) - Demilitarize
This is simply because the implementers of the lexer and parser were only concerned with getting the desired behavior for nicely formatted code. The juxtaposition of 0xb and in should be treated as an invalid token. - Grievous
Compare and contrast with 0xband(). The tokenizer is greedy and takes 0xba as a token. - Fluent
@people who are voting to reopen: Please explain why this is not a dupe. If you convince me, I'll dupehammer reopen it. - Bary
@Bary Maybe the dupe target is like "Why isn't whitespace required sometimes between tokens?" and this question is like "What even are the tokens?" (I could re-open myself, but since I answered and there were many votes I won't, in case it is viewed as a COI) - Samaria
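
The greedy behaviour mentioned in the 0xband() comment can be illustrated with a deliberately tiny tokenizer. This is a hypothetical sketch: toy_tokenize and its regular expression are made up for illustration, only know about integer literals, names and parentheses, and are not CPython's real tokenizer. Each pattern is greedy, so at every position it consumes as many characters as that token's pattern allows, which is why the hex literal swallows "b" (and "ba") but stops at the first non-hex letter.

```python
import re

# Hypothetical, simplified token patterns (NOT CPython's real tokenizer).
TOKEN_RE = re.compile(
    r"""
      (?P<NUMBER>0[xX][0-9a-fA-F]+|\d+)  # hex or decimal integer literal
    | (?P<NAME>[A-Za-z_]\w*)             # identifier or keyword
    | (?P<OP>[()])                       # parentheses only
    | (?P<SKIP>\s+)                      # whitespace between tokens
    """,
    re.VERBOSE,
)


def toy_tokenize(src):
    """Split src into token strings, always taking the longest match."""
    pos = 0
    tokens = []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append(m.group())
        pos = m.end()
    return tokens


print(toy_tokenize("0xbin()"))   # ['0xb', 'in', '(', ')']
print(toy_tokenize("0xband()"))  # ['0xba', 'nd', '(', ')']
```

In the second case the result is a number followed directly by the name nd, which the real parser then rejects, whereas 0xb followed by the keyword in happens to form a valid expression.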

Python seems to interpret 0xbin() as 0xb in (), meaning "is eleven in an empty tuple?". The answer is no, therefore False.
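
Written out with explicit spaces, the expression the answer describes can be checked directly. This is an illustrative sketch (the variable name value is just for the example); it also shows why the 0xbin(11,) trick from the question's comments yields True.

```python
# The spaced-out spelling of what Python reads when it sees 0xbin():
value = 0xb              # hexadecimal literal for eleven
print(value)             # 11
print(value in ())       # False: an empty tuple contains nothing
print(value in (11,))    # True: this is what 0xbin(11,) computes
```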

Americaamerican answered 25/7, 2018 at 14:18 Comment(10)
So apparently "in", "is" etc. don't require spaces? This is the first time I've encountered this, but it makes sense, as "<" and "==" don't require them either. - Paleogeography
Apparently yes. The Python Language Reference says whitespace between tokens is only needed "if their concatenation could be interpreted as a different token". But I have only ever seen such code in code golf. - Americaamerican
@Paleogeography This is why valid Python identifiers (as in many other languages) may only start with a letter or underscore, with digits allowed afterwards. The actual implementation is fairly complicated because of full Unicode support, but the pure-ASCII regex for an identifier would be: r'[_a-zA-Z][_a-zA-Z0-9]*' - Bluma
@Aaron: [_[:alpha:]][_[:alnum:]]* in regular-expression dialects that support (Unicode) character classes, i.e. not Python's. ;-] - Clariceclarie
Wow, I thought this kind of parsing was only done in Fortran and BASIC. I can't believe a modern language does it. - Outlying
One of the reasons for the peculiar definition of a "preprocessing number" in the C language family is to prevent things like this; in those languages 0xbin would be treated as a single token, even though it cannot be interpreted as a valid numeric literal. - Glidebomb
@DavidFoerster You can just use the third-party regex module instead of the built-in re. It provides the same API but adds a lot of features, including matching Unicode properties. I hope that in a few releases it will replace the standard re module... (and I believe the author hopes so too, hence the high level of compatibility between the two). - Laissezfaire
@Outlying Python is pretty old. - Hydrolysis
@Hydrolysis I've been programming for 40 years, Python is less than 30 years old. As far as I'm concerned, it's a young whipper-snapper. - Outlying
@Barmar: Python wants to remain an LL(1) language. That doesn't really have anything to do with this example in particular, but it illustrates their ongoing desire for a "dumb" parser (and also provides a little in the way of explanation. TL;DR: They don't want Python to end up like Perl.). - Bary
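
The identifier rule from the comments can be checked against Python's own str.isidentifier(). This sketch assumes the ASCII-only regex quoted above as an approximation; str.isidentifier() applies Python's full Unicode-aware rules, so the two agree on ASCII input but differ on names such as été. Note that isidentifier() also returns True for keywords like in, since keywords have the same lexical shape as names.

```python
import re

# ASCII-only approximation of a Python identifier, as quoted in the comments.
ASCII_IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

for candidate in ["bin", "_x1", "0xbin", "été"]:
    ascii_ok = ASCII_IDENTIFIER.fullmatch(candidate) is not None
    # str.isidentifier() uses Python's real (Unicode-aware) identifier rules.
    print(f"{candidate!r:8} ascii_regex={ascii_ok!s:5} "
          f"isidentifier={candidate.isidentifier()}")
```

Neither check accepts 0xbin, which is why the tokenizer cannot read it as a single name and instead starts a numeric literal at the leading 0.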

Disassembling the code confirms Yself's answer that 0xbin() is interpreted as 0xb in () (the exact bytecode varies between Python versions):

>>> import dis
>>> dis.dis('0xbin()')
  1           0 LOAD_CONST               0 (11)
              2 BUILD_TUPLE              0
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
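
Because the bytecode itself differs between Python versions, a version-independent way to make the same point is to compare the instructions for 0xbin() with those for the explicitly spaced 0xb in (). This is a sketch, not part of the original answer; the helper name instructions is made up for illustration, and the SyntaxError fallback assumes a future Python may reject the unspaced spelling entirely.

```python
import dis
import warnings


def instructions(source):
    """Compile an expression and list its bytecode as (opname, argrepr) pairs."""
    code = compile(source, "<demo>", "eval")
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]


with warnings.catch_warnings():
    # Recent Pythons warn when a number is glued to a keyword; silence that here.
    warnings.simplefilter("ignore")
    try:
        squashed = instructions("0xbin()")
    except SyntaxError:
        squashed = None  # a future Python may reject this spelling outright

spaced = instructions("0xb in ()")
print(squashed == spaced)  # True on versions that still accept 0xbin()
```

Identical instruction lists mean the compiler saw exactly the same expression in both cases.
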
Samaria answered 25/7, 2018 at 14:19 Comment(1)
list(tokenize.tokenize(io.BytesIO(b"0xbin()").readline)) might be more appropriate - Surbase

You can use Python's own tokenizer to check!

import io
import tokenize

line = b'0xbin()'
# Skip the ENCODING token that tokenize.tokenize() always emits first.
print(' '.join(token.string for token in tokenize.tokenize(io.BytesIO(line).readline) if token.type != tokenize.ENCODING))

This prints the tokens in your string, separated by spaces. In this case, the result will be:

0xb in ( ) 

In other words, it returns False because the number 11 (0xb) is not in the empty tuple (()).

(Thanks to Roman Odaisky for suggesting the use of tokenize in the comments!)

EDIT: To explain the code a bit more thoroughly: tokenize.tokenize() expects a readline-style callable that returns bytes, so io.BytesIO(line).readline wraps the byte string in something tokenize can read. tokenize then yields a series of TokenInfo named tuples; we take the string of each one and join them together with spaces. The token.type != tokenize.ENCODING check skips the encoding token that would otherwise show up at the beginning. Comparing against the named constant rather than a hard-coded number such as 59 matters, because the numeric token types differ between Python versions.
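
A small variation on the code above also prints each token's type name, which makes it explicit that 0xb is read as a NUMBER and in as a NAME (the tokenize module reports keywords as NAME tokens). This is an illustrative sketch, not part of the original answer; the helper significant_tokens and the choice of which bookkeeping tokens to skip are assumptions made for readability.

```python
import io
import token
import tokenize

# Bookkeeping tokens that carry no interesting source text.
SKIP = {tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def significant_tokens(source: bytes):
    """Return (type name, text) pairs for the meaningful tokens in source."""
    readline = io.BytesIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.tokenize(readline)
        if tok.type not in SKIP
    ]


print(significant_tokens(b"0xbin()"))
# [('NUMBER', '0xb'), ('NAME', 'in'), ('OP', '('), ('OP', ')')]
print(significant_tokens(b"0xb in ()") == significant_tokens(b"0xbin()"))
# True
```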

Durazzo answered 26/7, 2018 at 2:24 Comment(1)
This is the best answer yet: the "dis" and "ast" answers hide what is going on behind uncommon notation, while this one shows it clearly in plain Python. - Candelaria

You can use the AST module to get the abstract syntax tree of the expression:

>>> import ast
>>> m = ast.parse('0xbin()')
>>> ast.dump(m)
'Module(
    body=[Expr(
               value=Compare(left=Num(n=11),
                             ops=[In()],
                             comparators=[Tuple(elts=[],
                                                ctx=Load())
                                         ]
                            ))])'

See the abstract grammar for how to interpret the expression, but tl;dr: Num(n=11) is the 0xb part, and Tuple(elts=[], ...) is an empty tuple rather than a function call. (On Python 3.8 and later, the number is shown as Constant(value=11) instead of Num(n=11).)
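
Since the printed form of ast.dump() changes between Python versions, inspecting the nodes directly may be more robust. This is an illustrative sketch, not part of the original answer; the fallback to the spaced spelling assumes a future Python might reject 0xbin() outright.

```python
import ast
import warnings

with warnings.catch_warnings():
    # Recent Pythons warn when a numeric literal is glued to a keyword.
    warnings.simplefilter("ignore")
    try:
        tree = ast.parse("0xbin()", mode="eval")
    except SyntaxError:
        # Fall back to the explicitly spaced spelling, which parses the same way.
        tree = ast.parse("0xb in ()", mode="eval")

node = tree.body
print(type(node).__name__)         # Compare
print(node.left.value)             # 11 (a Constant node on Python 3.8+)
print(type(node.ops[0]).__name__)  # In
print(type(node.comparators[0]).__name__, node.comparators[0].elts)  # Tuple []
```

No Call node appears anywhere in the tree, which confirms that bin is never called.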

Administrative answered 25/7, 2018 at 17:57 Comment(0)
