Google Translate API does not return apostrophes as apostrophes in Python

I am trying to use the Google Translate API as shown below. The translation looks fine except for apostrophes, which are returned as the HTML entity &#39; instead.

Is it possible to fix this? I could post-process the output, but I don't know whether other special characters have the same problem.
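If you do go the post-processing route, Python's standard library already decodes every named and numeric HTML entity, so you don't have to enumerate the special characters yourself. A minimal sketch, assuming the response text is HTML-escaped (the API's default HTML mode); `unescape_translation` is a hypothetical helper name:

```python
import html


def unescape_translation(translated_text: str) -> str:
    """Decode HTML entities (e.g. &#39;, &quot;, &amp;) in a translated string."""
    # html.unescape handles named (&amp;), decimal (&#39;) and hex (&#x27;) references.
    return html.unescape(translated_text)


print(unescape_translation("Simon&#39;s advice &amp; &quot;help&quot;"))
# → Simon's advice & "help"
```

This only makes sense for HTML-mode output; if the text is requested as plain text, unescaping again could wrongly decode an entity that was literally part of the source text.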

This is how I perform translation right now:

import pandas as pd
# In google-cloud-translate >= 2.0 the v2 client lives in translate_v2;
# older releases exposed it directly as google.cloud.translate.Client.
from google.cloud import translate_v2 as translate

# Target must be an ISO 639-1 language code. See
# https://g.co/cloud/translate/v2/translate-reference#supported_languages
translate_client_en_de = translate.Client(target_language="de")
translate_client_de_en = translate.Client(target_language="en")

target1 = "de"
target2 = "en"

fname = 'fname.tsv'
df = pd.read_csv(fname, sep='\t')

for i, row in df.iterrows():
    text = row['Text']
    # English -> German
    de1 = translate_client_en_de.translate(text, target_language=target1)
    text2 = de1['translatedText']
    # German -> English (round trip)
    en2 = translate_client_de_en.translate(text2, target_language=target2)
    text3 = en2['translatedText']
    print(text)
    print(text2)
    print(text3)
    print('----------')
    break  # only check the first row while debugging

Sample output:

Simon's advice after he wouldn't

Simon's advice after

Savanna answered 13/6, 2019 at 7:56
This might help: #2087870 (Grocer)
Possible duplicate of Google Translate API outputs HTML entities (Toxophilite)

I solved it as follows:

Problem:

The problem is that you need to tell the API that you are sending plain text, not HTML. By default the input is treated as HTML, so characters such as apostrophes come back as HTML entities. See the documentation at https://googleapis.dev/python/translation/latest/client.html and look for the 'translate' method and its 'format_' parameter.

Solution:

Just add the parameter format_='text'. In my case I wrote it like this:

result = translate_client.translate(text, target_language=target, format_='text')

It works well; the API now returns the apostrophe correctly:

Before I got: 'Hello, we haven&#39;t seen each other in a long time'

Now I get: 'Hello, we haven't seen each other in a long time'
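Applied to the round trip from the question, a minimal sketch might look like this. It assumes the v2 client, whose translate() accepts either a single string or a list of strings (returning a dict or a list of dicts with a 'translatedText' key), so each direction can be one request for the whole column instead of one per row. back_translate and EchoClient are hypothetical names; EchoClient is only an offline stand-in so the sketch runs without credentials.

```python
from typing import Dict, List


def back_translate(client, texts: List[str], pivot: str = "de", target: str = "en") -> List[str]:
    """Translate texts to `pivot` and back to `target`, requesting plain text.

    `client` is assumed to be a google-cloud-translate v2 Client (or any object
    with the same translate() method). format_='text' marks the input as plain
    text, so apostrophes are not escaped as &#39;.
    """
    if not texts:
        return []
    # One request per direction for the whole list.
    forward = client.translate(texts, target_language=pivot, format_="text")
    pivot_texts = [item["translatedText"] for item in forward]
    backward = client.translate(pivot_texts, target_language=target, format_="text")
    return [item["translatedText"] for item in backward]


class EchoClient:
    """Hypothetical offline stand-in that mimics the shape of Client.translate()."""

    def translate(self, values: List[str], target_language: str, format_: str = "html") -> List[Dict[str, str]]:
        # Prefix each string with the target language instead of translating it.
        return [{"translatedText": f"{target_language}:{v}"} for v in values]


print(back_translate(EchoClient(), ["Simon's advice"]))
# → ["en:de:Simon's advice"]
```

With a real client you would pass translate.Client() and df['Text'].tolist(); for large files you may need to send the list in chunks, since the API limits how much text a single request may carry.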

Coroner answered 5/8, 2020 at 19:45
