I have a dataframe in which one column holds a list of strings in each row.
On average, each list has 150 words of about 6 characters each.
Each of the 700 rows of the dataframe corresponds to a document, and each string is a word of that document; in other words, I have tokenised each document into words.
I want to detect the language of each of these documents, and to do this I first try to detect the language of each word of the document.
To do so, I run the following:
from textblob import TextBlob

def lang_detect(document):
    lang_count = {}
    for word in document:
        if len(word) >= 4:
            word_textblob = TextBlob(word)
            lang_result = word_textblob.detect_language()
            response = lang_count.get(lang_result)
            if response is None:
                lang_count[f"{lang_result}"] = 1
            else:
                lang_count[f"{lang_result}"] += 1
    return lang_count

df_per_doc['languages_count'] = df_per_doc['complete_text'].apply(lambda x: lang_detect(x))
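The tallying part of the function above could probably be written more compactly with collections.Counter, and caching repeated words would reduce the number of detection calls. Below is a minimal sketch of that idea. It assumes a hypothetical detect callable that maps a string to a language code (for example, a small wrapper around the TextBlob call above). The names count_languages, min_len and fake_detect are made up for illustration.

```python
from collections import Counter
from functools import lru_cache
from typing import Callable, Dict, Iterable


def count_languages(words: Iterable[str],
                    detect: Callable[[str], str],
                    min_len: int = 4) -> Dict[str, int]:
    """Tally detected languages, calling `detect` once per distinct word."""
    # Wrap the detector so that repeated words are answered from the cache.
    cached_detect = lru_cache(maxsize=None)(detect)
    return dict(Counter(cached_detect(w) for w in words if len(w) >= min_len))


# Stand-in detector for illustration only; it is NOT a real language detector.
def fake_detect(word: str) -> str:
    return "de" if word.endswith("ung") else "en"


print(count_languages(["language", "Zeitung", "the", "language"], fake_detect))
```

Passing the detector in as an argument keeps the counting logic independent of any particular library, so it can be tried out without making network requests.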
When I run this, I get the following error:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-42-772df3809bcb> in <module>
25
---> 27 df_per_doc['languages_count'] = df_per_doc['complete_text'].apply(lambda x: lang_detect(x))
28
29
.
.
.
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 429: Too Many Requests
The traceback is much longer; I have omitted the middle part of it.
I now get the same error even if I try this for only two documents/rows.
Is there any way that I can get a response from textblob for more words and documents?
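HTTP 429 means that the remote service is throttling the client, and the loop above appears to trigger one request per word, which is roughly 150 per document. One possible way to reduce the load is to send each document as a single string and to wait and retry when a 429 comes back. Below is a minimal sketch of that idea. It assumes a hypothetical detect callable that wraps the network call and raises urllib.error.HTTPError, as in the traceback. The names detect_document, retries, base_delay and sleep are made up for illustration, and whether the service tolerates this request rate is not confirmed.

```python
import time
from typing import Callable, Iterable
from urllib.error import HTTPError


def detect_document(words: Iterable[str],
                    detect: Callable[[str], str],
                    retries: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Detect the language of a whole document with one call, retrying on HTTP 429."""
    text = " ".join(words)  # one request per document instead of one per word
    for attempt in range(retries):
        try:
            return detect(text)
        except HTTPError as err:
            # Only retry on throttling; re-raise other errors or the final failure.
            if err.code != 429 or attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise ValueError("retries must be at least 1")
```

With the dataframe above, this could be applied as df_per_doc['complete_text'].apply(lambda words: detect_document(words, detect)), where detect might be something like lambda text: TextBlob(text).detect_language(). Detecting on the whole text gives one label per document rather than per-word counts, so it is a change of approach and not a drop-in replacement.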
textblob must do this at the line lang_result = word_textblob.detect_language(). See also its docs here: textblob.readthedocs.io/en/dev/_modules/textblob/…. – Zaria
… textblob had any limitation with regard to that, because it does not explicitly mention anything like that. I do call it for each word separately, but for my application I think that this is the best way to go. – Zaria