I am working on a project in Unity which uses Assembly C#. I try to get special character such as é, but in the console it just displays a blank character: "". For instance translating "How are you?" Should return "Cómo Estás?", but it returns "Cmo Ests". I put the return string "Cmo Ests" in a character array and realized that it is a non-null blank character. I am using Encoding.UTF8, and when I do:
char ch = '\u00e9';
print (ch);
It will print "é". I have tried getting the bytes off of a given string using:
byte[] utf8bytes = System.Text.Encoding.UTF8.GetBytes(temp);
While translating "How are you?", it will return a byte string, but for the special characters such as é, I get the series of bytes 239, 191, 189, which is a replacement character.
What type of information do I need to retrieve from the characters in order to accurately determining what character it is? Do I need to do something with the information that Google gives me, or is it something else? I am need a general case that I can place in my program and will work for any input string. If anyone can help, it would be greatly appreciated.
Here is the code that is referenced:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using UnityEngine;
using System.Collections;
using System.Net;
using HtmlAgilityPack;
public class Dictionary{
string[] formatParams;
HtmlDocument doc;
string returnString;
char[] letters;
public char[] charString;
public Dictionary(){
formatParams = new string[2];
doc = new HtmlDocument();
returnString = "";
}
public string Translate(String input, String languagePair, Encoding encoding)
{
formatParams[0]= input;
formatParams[1]= languagePair;
string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", formatParams);
string result = String.Empty;
using (WebClient webClient = new WebClient())
{
webClient.Encoding = encoding;
result = webClient.DownloadString(url);
}
doc.LoadHtml(result);
input = alter (input);
string temp = doc.DocumentNode.SelectSingleNode("//span[@title='"+input+"']").InnerText;
charString = temp.ToCharArray();
return temp;
}
// Use this for initialization
void Start () {
}
string alter(string inputString){
returnString = "";
letters = inputString.ToCharArray();
for(int i=0; i<inputString.Length;i++){
if(letters[i]=='\''){
returnString = returnString + "'";
}else{
returnString = returnString + letters[i];
}
}
return returnString;
}
}
print()
Method do? If you're trying to treat your UTF8 bytes as characters, you'll have problems. UTF8 characters can be more than 1 byte long. – Finance