Wrong encoding with Google Cloud Translate and Java

I'm trying to use Google Cloud Translate. I think the problem is that Google Cloud Translate uses UTF-8 while the JVM uses UTF-16, so I get stray characters in the translations. For instance:

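    import com.google.cloud.translate.Translate;
    import com.google.cloud.translate.Translate.TranslateOption;
    import com.google.cloud.translate.TranslateOptions;
    import com.google.cloud.translate.Translation;
    // Assumption: unescapeHtml comes from Commons Lang 2.x
    import org.apache.commons.lang.StringEscapeUtils;

    public class TranslateExample {
      public static void main(String... args) throws Exception {
        // Instantiates a client
        Translate translate = TranslateOptions.getDefaultInstance().getService();

        // The text to translate
        String text = "Bonjour, à qui dois-je répondre? Non, C'est l'inverse...";

        // Translates the French text into English
        Translation translation =
            translate.translate(
                text,
                TranslateOption.sourceLanguage("fr"),
                TranslateOption.targetLanguage("en"));

        System.out.printf("Text: %s%n", text);
        // The API returns HTML-escaped text by default, so unescape it before printing
        System.out.printf("Translation: %s%n", StringEscapeUtils.unescapeHtml(translation.getTranslatedText()));
      }
    }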

Without the StringEscapeUtils.unescapeHtml call, it returns:

Translation: Hello, who should I answer? No, it&#39;s the opposite ...

instead of:

Translation: Hello, who should I answer? No, it's the opposite ...

We can't change the encoding of a Java String, and the Google Cloud API will not accept anything (byte[]?) other than a String.

Does anyone know how to fix it?

Thank you for reading.

Edit: This code is now working. I added StringEscapeUtils.unescapeHtml from the Apache Commons dependencies. I do not know if there is another way to do it.

Pulse answered 15/2, 2018 at 11:7 Comment(0)

It's not a UTF-8 / UTF-16 problem.
Google's response is HTML-encoded.

https://en.wikipedia.org/wiki/Unicode_and_HTML

This is common when you want to transmit Unicode characters using only ASCII in an XML/HTML context.
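
To illustrate what "HTML-encoded" means here, below is a minimal sketch that decodes numeric character references such as `&#39;` using only the JDK. The class name `NumericEntityDecoder` and its `decode` method are hypothetical, made up for this example. It deliberately ignores named entities like `&quot;` or `&amp;`, so it is not a full replacement for a library unescaper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Google or Apache library.
public class NumericEntityDecoder {
    // Matches hexadecimal (&#x27;) and decimal (&#39;) numeric character references.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    public static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(input, last, m.start());
            int codePoint = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Leave references outside the Unicode range untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: No, it's the opposite ...
        System.out.println(decode("No, it&#39;s the opposite ..."));
    }
}
```

The escaped form contains only ASCII characters, which is why the String's internal encoding plays no role: the `&#39;` sequence arrives as six ordinary characters and has to be decoded explicitly.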

Siccative answered 15/2, 2018 at 11:16 Comment(0)

Although you already found a solution, there is another fix that does not require an additional library.

As previously mentioned, the translate method returns an HTML-encoded String by default. It can return a plain-text String instead if the matching TranslateOption is passed in the method call.

The method call then looks something like this:

    Translation translation = translate.translate(
            text,
            Translate.TranslateOption.sourceLanguage(from),
            Translate.TranslateOption.targetLanguage(to),
            Translate.TranslateOption.format("text")
    );
Ott answered 10/5, 2019 at 7:5 Comment(1)
For the v3 API, the MIME type can be set when building a TranslateTextRequest, i.e. TranslateTextRequest.newBuilder().setMimeType("text/plain") - Andesine