Can you make Python 3 give an error when comparing strings to bytes?

When converting code from Python 2 to Python 3, one issue is that the behaviour of equality tests between strings and bytes has changed. For example:

foo = b'foo'
if foo == 'foo':
    print("They match!")

prints nothing on Python 3 and "They match!" on Python 2. In this case it is easy to spot, but in many cases the check is performed on variables that may have been defined elsewhere, so there is no obvious type information.

I would like to make the Python 3 interpreter give an error whenever there is an equality test between string and bytes rather than silently conclude that they are different. Is there any way to accomplish this?

Citarella answered 30/5, 2020 at 9:57 Comment(3)
Note that 1 == "one" also just gives False, while 1 == 1.0 gives True. Python doesn't consider comparing values of different types an error, and the semantics depend on the types involved. - Nicolette
True, but crucially these examples give the same answer on both Python 2 and Python 3. If you have comparisons like these, changing from a v2 interpreter to a v3 interpreter should not silently change the control flow of your code. The same isn't true for the bytes vs str comparisons. - Citarella
I guess what I was hoping was that there were some interpreter flags designed to help with conversion from 2 to 3, which modify the behaviour of the interpreter (presumably at some performance cost) to help identify these potential hidden problems that can easily be missed when updating. - Citarella

There is an option, -b, that you can pass to the Python interpreter to make it emit a warning when comparing bytes with str. Passing it twice (-bb) turns the warning into an error.

> python --help
usage: /bin/python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-b     : issue warnings about str(bytes_instance), str(bytearray_instance)
         and comparing bytes/bytearray with str. (-bb: issue errors)

With -bb, the comparison raises a BytesWarning as an exception, as seen here:

> python -bb -i
Python 3.8.0
Type "help", "copyright", "credits" or "license" for more information.
>>> v1 = b'foo'
>>> v2 = 'foo'
>>> v1 == v2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string
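
The same check can be scripted, for example in a test suite, by starting a child interpreter with -bb. The following is only a minimal sketch: the helper name `run_with_bytes_errors` is made up for illustration, and it assumes a CPython interpreter that supports the -b option shown in the help text above.

```python
import subprocess
import sys


def run_with_bytes_errors(snippet):
    """Run `snippet` in a child interpreter started with -bb.

    Hypothetical helper: returns the CompletedProcess so the caller can
    inspect the exit code and the captured stderr.
    """
    return subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        capture_output=True,
        text=True,
    )


# A bytes/str comparison should make the child exit with an error ...
mixed = run_with_bytes_errors("b'foo' == 'foo'")
# ... while a plain str/str comparison should run cleanly.
same = run_with_bytes_errors("'foo' == 'foo'")

print("mixed comparison failed:", mixed.returncode != 0)
print("str comparison failed:", same.returncode != 0)
```

A single -b in place of -bb would only print the warning to stderr and let the child continue.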
Citarella answered 1/6, 2020 at 9:47 Comment(0)

(EDITED, as suggested by @user2357112supportsMonica: I had incorrectly suggested that modifying __eq__ on the instance would affect the == evaluation.)

Normally, you would do this by overriding the __eq__ method of the type(s) you would like to guard. Unfortunately for you, this cannot be done for built-in types, notably str and bytes, so code like this:

foo = b'foo'
bytes.__eq__ = ...  # a custom equal function
# str.__eq__ = ...  # if it were 'foo' == foo (or `type(foo)`)
if foo == 'foo':
    print("They match!")

would just raise (the exact wording depends on the Python version; this one is from Python 3.8):

TypeError: can't set attributes of built-in/extension type 'bytes'

You may need to manually guard the comparison with something like:

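def str_eq_bytes(x, y):
    """Raise TypeError on a str/bytes comparison; otherwise return False."""
    if isinstance(x, str) and isinstance(y, bytes):
        raise TypeError("Comparison between `str` and `bytes` detected.")
    elif isinstance(x, bytes) and isinstance(y, str):
        raise TypeError("Comparison between `bytes` and `str` detected.")
    # Always falsy, so `str_eq_bytes(x, y) or x == y` falls through to ==.
    return False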

to be used as follows:

foo = 'foo'
if str_eq_bytes(foo, 'foo') or foo == 'foo':
    print("They match!")
# They match!

foo = 'bar'
if str_eq_bytes(foo, 'foo') or foo == 'foo':
    print("They match!")
# <nothing gets printed>

foo = b'foo'
if str_eq_bytes(foo, 'foo') or foo == 'foo':
    print("They match!")
# TypeError: Comparison between `bytes` and `str` detected.
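
A variant of the same idea folds the guard and the comparison into one call, so the call site does not need the `or` trick. This is only a sketch: the name `strict_eq` is hypothetical, and treating bytearray like bytes is an assumption made here, not something the guard above does.

```python
# Assumption: bytearray is guarded the same way as bytes.
_BYTES_TYPES = (bytes, bytearray)


def strict_eq(x, y):
    """Return x == y, but raise TypeError for a str vs bytes-like comparison."""
    str_vs_bytes = isinstance(x, str) and isinstance(y, _BYTES_TYPES)
    bytes_vs_str = isinstance(x, _BYTES_TYPES) and isinstance(y, str)
    if str_vs_bytes or bytes_vs_str:
        raise TypeError(
            f"Comparison between `{type(x).__name__}` and "
            f"`{type(y).__name__}` detected."
        )
    # Same types, or an unrelated mix: defer to the normal == semantics.
    return x == y


if strict_eq('foo', 'foo'):
    print("They match!")
```

Calling `strict_eq(b'foo', 'foo')` raises the TypeError instead of quietly returning False.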

The other option would be to hack on your own fork of Python and override __eq__ there. Note that PyPy does not allow you to override methods of built-in types either.

Milieu answered 30/5, 2020 at 10:15 Comment(7)
That str_eq_bytes thing doesn't seem to be written to be used the way you're using it: it doesn't make sense to put it in an and. - Ician
Also, assigning __eq__ on a bytes instance wouldn't help even if you could do it: you'd need to assign it on the type (which you also can't do). - Ician
@user2357112supportsMonica The and should really have been an or, thanks for spotting! As for the second comment, you could (if it were allowed) override it either on the class or on the instance, the difference being that you would "guard" all instances or only a specific one. - Milieu
No, assigning __eq__ on an instance has no effect on == checks. - Ician
@user2357112supportsMonica Yes, you are right, I have fixed it, thanks for spotting that too :-) I am quite surprised by this behavior, though. Do you also know why it is so? - Milieu
Most magic methods only work when defined on an object's type. This is partly for speed, and partly to avoid problems like having repr(Dog) call Dog.__repr__ (which is meant to handle Dog instances). - Ician
When I started modifying the interpreter to add the code you suggested, I found that there was already code in bytearrayobject to detect and warn about bytes/str tests, and that led me to the option I've described in my answer. I would not have found it without your suggestion, thanks. - Citarella
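
The point made in the comments, that == looks up __eq__ on the type and not on the instance, can be checked with an ordinary user-defined class. This is a small illustrative sketch; the class `Token` and its attribute are invented for the demonstration.

```python
class Token:
    """Hypothetical class used only to demonstrate the lookup rule."""

    def __init__(self, text):
        self.text = text


a = Token("foo")
b = Token("foo")

# Assigning __eq__ on the instance is allowed here, but == ignores it and
# falls back to the default identity comparison.
a.__eq__ = lambda other: True
instance_level = (a == b)

# Assigning __eq__ on the type is what == actually consults.
Token.__eq__ = lambda self, other: (
    isinstance(other, Token) and self.text == other.text
)
type_level = (a == b)

print(instance_level, type_level)
```

For built-in types such as bytes and str, neither assignment is permitted, which is why the manual guard or the interpreter option is needed.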
