Textblob - HTTPError: HTTP Error 429: Too Many Requests

I have a dataframe in which one column holds a list of strings in each row.

On average, each list has 150 words of about 6 characters each.

Each of the 700 rows of the dataframe corresponds to a document, and each string is a word from that document; in other words, I have tokenised each document.

I want to detect the language of each of these documents, and to do this I first detect the language of each word in the document.
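Because each word of 4 or more characters triggers its own detect_language() call, the figures above give a rough upper bound on the number of requests for the whole dataframe, assuming every word passes the length filter:

$$
N_{\text{requests}} \le 700 \times 150 = 105{,}000
$$

Even at one request per second, that is about 105,000 / 3,600 ≈ 29 hours of back-to-back requests.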

To do this, I use the following code:

from textblob import TextBlob

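def lang_detect(document):
    """Count how many words of a tokenised document are detected as each language."""
    lang_count = {}
    for word in document:
        # Only words with at least 4 characters are classified
        if len(word) >= 4:
            # Each call sends a separate request to the Google Translate service
            lang_result = TextBlob(word).detect_language()
            # Increment the counter for this language, starting from 0
            lang_count[lang_result] = lang_count.get(lang_result, 0) + 1

    return lang_count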

df_per_doc['languages_count'] = df_per_doc['complete_text'].apply(lambda x: lang_detect(x))

When I run this, I get the following error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-42-772df3809bcb> in <module>
     25 
---> 27 df_per_doc['languages_count'] = df_per_doc['complete_text'].apply(lambda x: lang_detect(x))
     28 
     29 
.
.
.

    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 429: Too Many Requests

The full traceback is much longer; I have omitted the middle part of it.

Now I get the same error even when I run this on only two documents/rows.

Is there any way to get responses from TextBlob for more words and documents?

Zaria answered 17/5, 2019 at 15:0
I'm confused as to where you are making HTTP requests. – Endocranium
@SuperStew, I do not make any HTTP requests explicitly, but TextBlob must be doing so at the line lang_result = word_textblob.detect_language(). See also its docs here: textblob.readthedocs.io/en/dev/_modules/textblob/…. – Zaria
detect_language() uses the Google Translate API. You're being rate-limited for calling it too often in quick succession: textblob.readthedocs.io/en/dev/_modules/textblob/blob.html. It looks like you're making a separate call for each word in the document, which is extremely resource-intensive. Maybe you can reduce that? – Obovoid
@ChristophBurschka, thank you for your reply. Yes, I know that it uses the Google Translate API, but I did not know that TextBlob was subject to any limit there, because its docs do not mention one. I do call it for each word separately, but for my application I think that is the best way to go. – Zaria
You might save some requests by collecting the unique words and reusing the results, but it still seems like a lot of load. Another workaround would be a throttling mechanism: sleep for a short time after sending a batch of requests. – Obovoid
@ChristophBurschka, thank you again for your reply. The unique-words idea is not bad, but even then the set of words will be large, and it is a bit trickier to determine which words are unique. I tried the sleep approach, with a 5-second sleep per document, but I still received the same error after 4 documents (out of 700). It seems that I will simply pay to use the Google Translate API to build a large training set of language-labelled words and documents. – Zaria
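Finding the unique words is mostly a single set comprehension over all documents. The sketch below combines the two suggestions from the comments: detect each distinct word once, pause after every batch of requests, then rebuild the per-document counts from the cached results without any further requests. It is a minimal sketch under stated assumptions: count_languages_deduplicated, batch_size and pause are hypothetical names, the detector is passed in as a function (in the question it would be TextBlob(word).detect_language()), and the default batch size and pause are arbitrary values, not Google's documented limits.

```python
import time
from collections import Counter
from typing import Callable, Dict, Iterable, List


def count_languages_deduplicated(
    documents: Iterable[List[str]],
    detect: Callable[[str], str],
    min_len: int = 4,
    batch_size: int = 50,   # hypothetical: requests sent before each pause
    pause: float = 5.0,     # hypothetical: seconds to wait between batches
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, int]]:
    """Detect each distinct word once, then build per-document language counts."""
    docs = [list(doc) for doc in documents]

    # Every distinct word that passes the length filter, across ALL documents.
    unique_words = sorted({w for doc in docs for w in doc if len(w) >= min_len})

    word_lang: Dict[str, str] = {}
    for i, word in enumerate(unique_words, start=1):
        word_lang[word] = detect(word)  # exactly one request per distinct word
        # Throttle: pause after each full batch, except after the very last request.
        if i % batch_size == 0 and i < len(unique_words):
            sleep(pause)

    # Reuse the cached results; no further requests are made here.
    return [
        dict(Counter(word_lang[w] for w in doc if len(w) >= min_len))
        for doc in docs
    ]
```

With the dataframe from the question, usage would look like df_per_doc['languages_count'] = count_languages_deduplicated(df_per_doc['complete_text'], lambda w: TextBlob(w).detect_language()). This reduces the number of requests but does not remove the rate limit itself, so with a large vocabulary it may only postpone the 429 error.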
I had the same issue when I was trying to translate tweets. Once I exceeded the rate limit, it started returning an HTTP 429 Too Many Requests error.

So anyone who wants to work with TextBlob should check the rate limits first. Google documents them here: https://cloud.google.com/translate/quotas?hl=en

If you exceed the rate limits, you have to wait until the quotas reset at midnight Pacific Time, which can take up to 24 hours.

Alternatively, you can introduce a delay between your requests so as not to overload the API server.

For example, when translating a list of TextBlob sentences:

import time
...
for sentence in list_of_sentences:
    translated = sentence.translate()  # one request per sentence
    time.sleep(1)  # wait 1 second before the next request
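A fixed one-second delay can still hit the limit, as the next answer reports. A common refinement is exponential backoff: retry only when the server answers 429, and wait longer after each consecutive failure. A minimal sketch, assuming the 429 reaches your code as urllib.error.HTTPError (as in the traceback in the question); call_with_backoff, max_retries and base_delay are hypothetical names.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on HTTP 429, wait exponentially longer and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            # Re-raise errors other than 429, and give up after the last retry.
            if err.code != 429 or attempt == max_retries:
                raise
            # Wait 1s, 2s, 4s, ... (times base_delay) plus a little random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Each request would then be wrapped, e.g. lang = call_with_backoff(lambda: TextBlob(word).detect_language()). Backoff helps with short bursts, but if the daily quota is exhausted, no amount of retrying helps until the quota resets.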
Koroseal answered 22/5, 2020 at 16:35
You can try Googletrans.

"Googletrans is a free and unlimited Python library that implemented Google Translate API. This uses the Google Translate Ajax API to make calls to such methods as detect and translate."

Similarly to TextBlob, Googletrans offers language detection and translation. It worked well for me when I was detecting the language of a large number of emails and translating them.

(With TextBlob, I tried time.sleep(1), but I still ended up reaching the API limit.)

Passe answered 21/6, 2020 at 14:16
