Can you make Python 3 give an error when comparing strings to bytes?

Asked 30/5, 2020 at 9:57 Answered 1/6, 2020 at 9:47

When converting code from Python 2 to Python 3 one issue is that the behaviour when testing strings and bytes for equality has changed. For example:

foo = b'foo'
if foo == 'foo':
    print("They match!")

prints nothing on Python 3 and "They match!" on Python 2. In this case it is easy to spot but in many cases the check is performed on variables which may have been defined elsewhere so there is no obvious type information.

I would like to make the Python 3 interpreter give an error whenever there is an equality test between string and bytes rather than silently conclude that they are different. Is there any way to accomplish this?

Citarella answered 30/5, 2020 at 9:57 Comment(3)

Note that 1 == "one" also just gives False, while 1 == 1.0 gives True. Python doesn't consider comparing values of different types an error and the semantics of it depend on the types involved. – Nicolette 30/5, 2020 at 10:15

True, but crucially these examples give the same answer on both Python 2 and Python 3. If you have comparisons like these, changing from a v2 interpreter to a v3 interpreter should not silently change the control flow of your code. The same isn't true for bytes vs str comparisons. – Citarella 31/5, 2020 at 10:4

I guess what I was hoping was that there were some interpreter flags designed to help with conversion from 2 to 3 which modified the behaviour of the interpreter, presumably with some performance cost, to help identify these potential hidden problems that can easily be missed when updating. – Citarella 31/5, 2020 at 10:5

There is an option, -b, that you can pass to the Python interpreter to make it emit a warning when comparing bytes with str. Passing it twice (-bb) turns the warning into an error.

> python --help
usage: /bin/python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-b     : issue warnings about str(bytes_instance), str(bytearray_instance)
         and comparing bytes/bytearray with str. (-bb: issue errors)

With -bb, the comparison raises a BytesWarning, as seen here:

> python -bb -i
Python 3.8.0
Type "help", "copyright", "credits" or "license" for more information.
>>> v1 = b'foo'
>>> v2 = 'foo'
>>> v1 == v2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string
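The same check can be scripted, for example to confirm in a test run that the flag is actually in effect. The following is a minimal sketch, assuming a CPython 3.7+ interpreter is available as sys.executable; the helper name compare_under_flag is made up for illustration. It runs the comparison in a child interpreter with and without -bb, and also reads sys.flags.bytes_warning, which reports how many times -b was passed to the current process.

```python
import subprocess
import sys


def compare_under_flag(flag):
    """Run a bytes == str comparison in a child interpreter.

    `flag` is an optional interpreter option such as "-b" or "-bb".
    (Hypothetical helper, not part of any library.)
    """
    cmd = [sys.executable]
    if flag:
        cmd.append(flag)
    cmd += ["-c", "v1 = b'foo'; v2 = 'foo'; print(v1 == v2)"]
    return subprocess.run(cmd, capture_output=True, text=True)


plain = compare_under_flag(None)    # default behaviour: silently False
strict = compare_under_flag("-bb")  # comparison raises BytesWarning

print("without flag:", plain.stdout.strip(), "exit code", plain.returncode)
print("with -bb: exit code", strict.returncode)
print("bytes_warning level in this process:", sys.flags.bytes_warning)
```

A test suite could assert that sys.flags.bytes_warning is at least 1 so that nobody runs the ported code without the flag by accident.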

Citarella answered 1/6, 2020 at 9:47 Comment(0)

(EDITED: to fix an issue pointed out by @user2357112supportsMonica, where I was incorrectly suggesting that modifying __eq__ on the instance would affect the == evaluation.)

Normally, you would do this by overriding the __eq__ method of the type(s) you would like to guard. Unfortunately for you, this cannot be done for built-in types, notably str and bytes, so code like this:

foo = b'foo'
bytes.__eq__ = ...  # a custom equal function
# str.__eq__ = ...  # if it were 'foo' == foo (or `type(foo)`)
if foo == 'foo':
    print("They match!")

would just raise (the exact wording varies between Python versions):

TypeError: can't set attributes of built-in/extension type 'bytes'
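Both ways of patching can be checked quickly. This is a small sketch, assuming CPython 3; the helper names attempt, patch_type and patch_instance are made up for illustration. On CPython, assigning on the bytes type is rejected with a TypeError, while assigning on a bytes instance is rejected with the read-only AttributeError.

```python
def attempt(action):
    """Call `action` and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


def patch_type():
    # Try to replace __eq__ on the built-in type itself.
    bytes.__eq__ = lambda self, other: True


def patch_instance():
    # Try to replace __eq__ on a single bytes object.
    foo = b'foo'
    foo.__eq__ = lambda other: True


type_result = attempt(patch_type)
instance_result = attempt(patch_instance)
print("patching the type:", type_result)
print("patching an instance:", instance_result)
```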

You may need to manually guard the comparison with something like:

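def str_eq_bytes(x, y):
    """Raise TypeError if one argument is str and the other is bytes."""
    if isinstance(x, str) and isinstance(y, bytes):
        raise TypeError("Comparison between `str` and `bytes` detected.")
    elif isinstance(x, bytes) and isinstance(y, str):
        raise TypeError("Comparison between `bytes` and `str` detected.")
    # Falsy result so that `str_eq_bytes(a, b) or a == b` falls through to ==.
    return False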

to be used as follows:

foo = 'foo'
if str_eq_bytes(foo, 'foo') or foo == 'foo':
    print("They match!")
# They match!

foo = 'bar'
if str_eq_bytes(foo, 'foo') or foo == 'foo':
    print("They match!")
# <nothing gets printed>

foo = b'foo'
if str_eq_bytes(foo, 'foo') or foo == 'foo':
    print("They match!")

TypeError: Comparison between `bytes` and `str` detected.
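A possible variation, sketched here under the assumption that you are free to change the call sites, is to fold the guard and the comparison into a single function so that the `or` idiom is not needed. The name strict_eq is hypothetical, and treating bytearray like bytes is an extra assumption that goes beyond the guard above.

```python
def strict_eq(x, y):
    """Return x == y, but refuse to compare text with bytes-like values."""
    binary = (bytes, bytearray)  # assumption: bytearray is guarded as well
    mixed = (isinstance(x, str) and isinstance(y, binary)) or (
        isinstance(x, binary) and isinstance(y, str)
    )
    if mixed:
        raise TypeError(
            f"Comparison between `{type(x).__name__}` and "
            f"`{type(y).__name__}` detected."
        )
    return x == y


foo = 'foo'
if strict_eq(foo, 'foo'):
    print("They match!")
```

Calling strict_eq(b'foo', 'foo') raises the TypeError instead of quietly returning False.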

The other option would be to hack on your own fork of Python and override __eq__ there. Note that PyPy does not allow you to override methods of built-in types either.

Milieu answered 30/5, 2020 at 10:15 Comment(7)

That str_eq_bytes thing doesn't seem to be written to be used the way you're using it - it doesn't make sense to put it in an and. – Ician 30/5, 2020 at 10:21

Also assigning __eq__ on a bytes instance wouldn't help even if you could do it - you'd need to assign it on the type (which you also can't do). – Ician 30/5, 2020 at 10:23

@user2357112supportsMonica The and should really have been an or, thanks for spotting! As for the second comment, you could (if it were allowed) override it either on the class or on the instance, the difference being that you would "guard" all instances or only a specific one. – Milieu 30/5, 2020 at 10:28

No, assigning __eq__ on an instance has no effect on == checks. – Ician 30/5, 2020 at 11:22

@user2357112supportsMonica Yes, you are right, I have fixed it, thanks for spotting that too :-) I am quite surprised by this behavior, though. Do you also know why it is so? – Milieu 30/5, 2020 at 11:35

Most magic methods only work when defined on an object's type. This is partly for speed, and partly to avoid problems like having repr(Dog) call Dog.__repr__ (which is meant to handle Dog instances). – Ician 30/5, 2020 at 11:58
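The lookup rule described in the comment above can be seen with an ordinary class, where assignment is allowed. This is a small sketch using a made-up Dog class: an __eq__ assigned on an instance is ignored by ==, while one assigned on the class takes effect.

```python
class Dog:
    def __eq__(self, other):
        return False


rex = Dog()

# Stored in the instance's __dict__; the == operator never looks there.
rex.__eq__ = lambda other: True
instance_patch_result = (rex == rex)  # still uses Dog.__eq__

# Replacing the method on the class changes what == does for every Dog.
Dog.__eq__ = lambda self, other: True
class_patch_result = (rex == rex)

print("after patching the instance:", instance_patch_result)
print("after patching the class:", class_patch_result)
```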

When I started modifying the interpreter to add the code you suggested I found that there was already code in bytearrayobject to detect and warn about byte/str tests and that led me to the option I've described in my answer. I would not have found it without your suggestion, thanks. – Citarella 1/6, 2020 at 9:49
