Python3 Error: TypeError: Can't convert 'bytes' object to str implicitly

I am working on exercise 41 in learnpythonthehardway and keep getting the error:

  Traceback (most recent call last):
  File ".\url.py", line 72, in <module>
    question, answer = convert(snippet, phrase)
  File ".\url.py", line 50, in convert
    result = result.replace("###", word, 1)
TypeError: Can't convert 'bytes' object to str implicitly

I am using Python 3 while the book uses Python 2, so I have made some changes. Here is the script:

#!/usr/bin/python
# Filename: urllib.py

import random
from random import shuffle
from urllib.request import urlopen
import sys

WORD_URL = "http://learncodethehardway.org/words.txt"
WORDS = []

PHRASES = {
            "class ###(###):":
                "Make a class named ### that is-a ###.",
            "class ###(object):\n\tdef __init__(self, ***)" :
                "class ### has-a __init__ that takes self and *** parameters.",
            "class ###(object):\n\tdef ***(self, @@@)":
                "class ### has-a function named *** that takes self and @@@ parameters.",
            "*** = ###()":
                "Set *** to an instance of class ###.",
            "***.*** = '***'":
                "From *** get the *** attribute and set it to '***'."
}

# do they want to drill phrases first
PHRASE_FIRST = False
if len(sys.argv) == 2 and sys.argv[1] == "english":
    PHRASE_FIRST = True

# load up the words from the website
for word in urlopen(WORD_URL).readlines():
    WORDS.append(word.strip())

def convert(snippet, phrase):
    class_names = [w.capitalize() for w in
                    random.sample(WORDS, snippet.count("###"))]
    other_names = random.sample(WORDS, snippet.count("***"))
    results = []
    param_names = []

    for i in range(0, snippet.count("@@@")):
        param_count = random.randint(1,3)
        param_names.append(', '.join(random.sample(WORDS, param_count)))

    for sentence in snippet, phrase:
        result = sentence[:]

        # fake class names
        for word in class_names:
            result = result.replace("###", word, 1)

        # fake other names
        for word in other_names:
            result = result.replace("***", word, 1)

        # fake parameter lists
        for word in param_names:
            result = result.replace("@@@", word, 1)

        results.append(result)

    return results

# keep going until they hit CTRL-D
try:
    while True:
        snippets = list(PHRASES.keys())
        random.shuffle(snippets)

        for snippet in snippets:
            phrase = PHRASES[snippet]
            question, answer = convert(snippet, phrase)
            if PHRASE_FIRST:
                question, answer = answer, question

            print(question)

            input("> ")
            print("ANSWER: {}\n\n".format(answer))
except EOFError:
    print("\nBye")

What exactly am I doing wrong here? Thanks!

Neveda answered 22/5, 2013 at 18:43 Comment(3)
As a side note, it's a really bad idea to name a file urllib.py when you're importing from the urllib package. But that's not your problem here. - Whitewash
As another side note, for word in urlopen(WORD_URL).readlines(): is silly; just do for word in urlopen(WORD_URL):. I'm assuming you got that from the tutorial you're following, which implies that the tutorial isn't just written for Python 2, but written for very old Python 2 (or at least by someone who's used to very old Python 2), so you may want to find a newer tutorial if you want to learn how to write modern, idiomatic Python. - Whitewash
I found the explanation (and the solution provided) at this link very helpful: mkyong.com/python/… - Rappee

urlopen() returns a response object whose lines are bytes objects; to perform string operations on them, you should decode them to str first.

for word in urlopen(WORD_URL).readlines():
    WORDS.append(word.strip().decode('utf-8')) # utf-8 works in your case

To get the correct charset, see: How to download any(!) webpage with correct charset in python?

Winzler answered 22/5, 2013 at 18:53 Comment(1)
Except that the data isn't actually UTF-8. You get lucky here because it happens to be ASCII, which is a strict subset of UTF-8, but it's not good to assume you'll get so lucky everywhere. - Whitewash

In Python 3, the urlopen function returns an HTTPResponse object, which acts like a binary file. So, when you do this:

for word in urlopen(WORD_URL).readlines():
    WORDS.append(word.strip())

… you end up with a bunch of bytes objects instead of str objects. So when you do this:

result = result.replace("###", word, 1)

… you end up trying to replace the string "###" within the string result with a bytes object, instead of a str. Hence the error:

TypeError: Can't convert 'bytes' object to str implicitly
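
A minimal reproduction of that failure, with no network involved, might look like the sketch below. The names template and word are illustrative, and the exact wording of the TypeError message varies between Python 3 versions.

```python
# str.replace() only accepts str arguments, so passing bytes fails.
template = "class ###(object):"
word = b"apple"  # the kind of object you get when iterating over the response

error = None
try:
    template.replace("###", word, 1)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

print(type(word).__name__)  # bytes
```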

The answer is to explicitly decode the words as soon as you get them. To do that, you have to figure out the right encoding from the HTTP headers. How do you do that?

In this case, having read the headers, I can tell that it's ASCII, and it's obviously a static page, so:

for word in urlopen(WORD_URL).readlines():
    WORDS.append(word.strip().decode('ascii'))

But in real life, you usually need to write code that reads the headers and dynamically figures it out. Or, better, install a higher-level library like requests, which does that for you automatically.
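
As a rough sketch of reading the charset from the headers using only the standard library: the headers attribute of the response is an email.message.Message subclass, so get_content_charset() is available on it. The helper name read_words, its fallback parameter, and the FakeResponse stand-in (used only so the example runs offline) are assumptions for illustration, not part of the original script.

```python
import io
from email.message import Message
from urllib.request import urlopen


def read_words(response, fallback="ascii"):
    """Decode each line of a binary response using the charset from its headers."""
    # get_content_charset() returns the charset parameter of the
    # Content-Type header, or None when the server did not send one.
    charset = response.headers.get_content_charset() or fallback
    return [line.decode(charset).strip() for line in response]


class FakeResponse(io.BytesIO):
    """Hypothetical stand-in for HTTPResponse so the sketch runs offline."""

    def __init__(self, body, content_type):
        super().__init__(body)
        self.headers = Message()
        self.headers["Content-Type"] = content_type


demo = FakeResponse(b"account\nachiever\n", "text/plain; charset=utf-8")
words = read_words(demo)
print(words)

# Against the real server (requires network access):
# with urlopen(WORD_URL) as response:
#     WORDS = read_words(response)
```

The fallback is only a guess for servers that omit the charset; ASCII is used here because that is what this particular word list contains.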

Whitewash answered 22/5, 2013 at 18:57 Comment(0)

Explicitly convert the bytes object word into a string:

result = result.replace("###", str(word, "ascii"), 1)

That should work.
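
One caveat when converting with str(): called on a bytes object without an encoding argument, it does not decode anything and instead returns the repr-style text, including the b'' wrapper. A small sketch, with word as an illustrative value:

```python
word = b"apple"

as_repr = str(word)              # no encoding given: repr-style text
as_text = str(word, "ascii")     # decodes the bytes
also_text = word.decode("ascii") # equivalent, and more common

print(as_repr)    # b'apple'
print(as_text)    # apple
print(also_text)  # apple
```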

Milagrosmilam answered 14/8, 2016 at 6:32 Comment(0)
