Problem converting ANSI to UTF-8 in C#

I have a problem converting a text file from ANSI to UTF-8 in C#. I am trying to display the results in a browser.

So I have this text file with many accented characters in it. It's encoded in ANSI, so I have to convert it to UTF-8, because in the browser a "?" appears instead of each accented character. No matter how I tried to convert it to UTF-8, it was still a "?". But if I convert the text file to UTF-8 in Notepad++, the accented characters are displayed correctly.

Here is a piece of the encoding code I wrote:

    public string Encode(string text)
    {
        // encode the string as an ASCII byte array
        byte[] myASCIIBytes = ASCIIEncoding.ASCII.GetBytes(text);

        // convert the ASCII byte array to a UTF-8 byte array
        byte[] myUTF8Bytes = ASCIIEncoding.Convert(ASCIIEncoding.ASCII, UTF8Encoding.UTF8, myASCIIBytes);

        // reconstitute a string from the UTF-8 byte array 
        return UTF8Encoding.UTF8.GetString(myUTF8Bytes);
    }

Do you have any idea why this is happening?

Dinorahdinosaur asked 23/9, 2010 at 12:16
ASCII is a 7-bit encoding without a code page, as Andrey explains. If the text has accented characters, you shouldn't be using ASCII. - Marketing

Do you have any idea why this is happening?

Yes: you're too late. You need to specify the ANSI encoding when you read the string from the file. In memory a .NET string is always Unicode (UTF-16).
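A minimal sketch of that advice, assuming the file is in Windows-1252 (the usual "ANSI" code page on Western European Windows) and a .NET Core 3.0+ / .NET 5+ runtime, where the legacy code pages have to be registered before `Encoding.GetEncoding(1252)` works. `AnsiFileReader` and `ReadAnsiFile` are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class AnsiFileReader
{
    // Reads a text file assumed to be saved as Windows-1252 and returns it as a
    // .NET string (UTF-16 in memory).
    public static string ReadAnsiFile(string path)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where code page 1252 is only
        // available after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ansi = Encoding.GetEncoding(1252);

        // The encoding is applied while the bytes are decoded, which is the only
        // point where the accented characters can still be interpreted correctly.
        return File.ReadAllText(path, ansi);
    }
}
```

Once the string is read this way, it can be written out in any encoding (for example UTF-8) without losing the accents.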

Tella answered 23/9, 2010 at 12:18
+1 Yup, the text is already destroyed before it enters the function. - Ewaewald

When you convert to ASCII you immediately lose all non-English characters (including accented ones), because ASCII has only 128 characters (7 bits, codes 0-127).

Your manipulation is also strange: a string in .NET is always UTF-16, so as long as you return a string rather than a byte[], the intermediate encoding doesn't matter.
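To see the loss directly, the sketch below runs a string through the same ASCII round trip as the question's `Encode` method. `AsciiLossDemo` and `ThroughAscii` are hypothetical names; the result relies on `Encoding.ASCII` replacing characters it cannot encode with `?` by default.

```csharp
using System.Text;

public static class AsciiLossDemo
{
    // Round-trips a string through ASCII the way the question's Encode method does.
    public static string ThroughAscii(string text)
    {
        // ASCII covers only code points 0-127; every other character is replaced
        // with '?' (0x3F) during encoding, so "é" is already lost at this step.
        byte[] asciiBytes = Encoding.ASCII.GetBytes(text);

        // Converting the damaged bytes to UTF-8 cannot bring the accents back.
        byte[] utf8Bytes = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, asciiBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

For example, `AsciiLossDemo.ThroughAscii("café")` returns `"caf?"`.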

I think you should do the following (I guess by ANSI you mean Latin-1, which the code below handles as Windows code page 1252):

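    public byte[] Encode(string text)
    {
        // Code page 1252 is the Western European Windows "ANSI" code page.
        // On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
        return Encoding.GetEncoding(1252).GetBytes(text);
    }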

Since the question was not very clear, and as was reasonably pointed out, you might actually need this one:

    public string Decode(byte[] data)
    {
        // Interprets raw bytes read from the file as code page 1252.
        return Encoding.GetEncoding(1252).GetString(data);
    }
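If the file's contents are already in memory as raw bytes, `Encoding.Convert` does the ANSI-to-UTF-8 conversion in one call, provided its input is the original ANSI bytes rather than a string. A sketch assuming Windows-1252 input and .NET Core 3.0+ / .NET 5+; `AnsiToUtf8` and `Convert1252ToUtf8` are hypothetical names.

```csharp
using System.Text;

public static class AnsiToUtf8
{
    // Converts raw bytes assumed to be Windows-1252 into UTF-8 bytes.
    public static byte[] Convert1252ToUtf8(byte[] ansiBytes)
    {
        // Needed on .NET Core 3.0+ / .NET 5+ to make code page 1252 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ansi = Encoding.GetEncoding(1252);

        // Encoding.Convert decodes with the source encoding and re-encodes with the
        // destination encoding; no BOM is added to the output.
        return Encoding.Convert(ansi, Encoding.UTF8, ansiBytes);
    }
}
```

For example, the single byte `0xE9` ("é" in code page 1252) becomes the two bytes `0xC3 0xA9`, the UTF-8 form of "é".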
Levinson answered 23/9, 2010 at 12:24
+1 for the code page stuff, but I think you have the direction wrong here. The OP needs to read a byte[] and convert it to a string. - Tella
@Henk Holterman I have a feeling that I misunderstood the asker, but his function takes a string and returns a string, so I am not sure. - Levinson
@Henk Holterman What is string Decode(byte[])? I don't know this method. GetBytes returns the bytes for the given encoding; what's wrong with it? - Levinson

This is probably the easiest way:

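    byte[] ansiBytes = File.ReadAllBytes("inputfilename.txt");
    // Encoding.Default is the system ANSI code page only on .NET Framework;
    // on .NET Core / .NET 5+ it is UTF-8, so name the code page explicitly there.
    var utf8String = Encoding.Default.GetString(ansiBytes);
    // File.WriteAllText without an encoding argument writes UTF-8 without a BOM.
    File.WriteAllText("outputfilename.txt", utf8String);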
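`GetString` does not detect the file's encoding; it decodes with whatever `Encoding.Default` is. On .NET Framework that is the system ANSI code page, so the snippet works there when the file matches it, but on .NET Core / .NET 5+ `Encoding.Default` is always UTF-8. A sketch that names both encodings explicitly, assuming Windows-1252 input; `FileConverter`, `ConvertAnsiFileToUtf8`, and `writeBom` are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class FileConverter
{
    // Rewrites a text file assumed to be Windows-1252 as UTF-8.
    public static void ConvertAnsiFileToUtf8(string inputPath, string outputPath, bool writeBom)
    {
        // Needed on .NET Core 3.0+ / .NET 5+ to make code page 1252 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Name the source encoding instead of relying on Encoding.Default.
        Encoding ansi = Encoding.GetEncoding(1252);
        string text = File.ReadAllText(inputPath, ansi);

        // UTF8Encoding(true) writes a byte order mark; UTF8Encoding(false) does not.
        File.WriteAllText(outputPath, text, new UTF8Encoding(writeBom));
    }
}
```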
Territerrible answered 15/10, 2012 at 11:37
How does this work? Does GetString() detect which encoding was used in the input file? Or does it simply work because UTF-8 code points map correctly to the Latin-1 code page? - Hoist

I would recommend reading this: http://www.joelonsoftware.com/articles/Unicode.html.
If you are going to read an 8-bit "ASCII" (really ANSI) file, you need to know the code page of the file.
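A short illustration of why the code page matters: the same byte decodes to a different character depending on which code page you assume. `CodePageDemo` and `DecodeByte` are hypothetical names, and the sketch assumes .NET Core 3.0+ / .NET 5+, where the code pages provider ships with the runtime.

```csharp
using System.Text;

public static class CodePageDemo
{
    // Decodes a single byte using the given Windows code page number.
    public static string DecodeByte(byte b, int codePage)
    {
        // Makes legacy code pages such as 1251 and 1252 available on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(new[] { b });
    }
}
```

The byte `0xE9` is "é" in code page 1252 (Western European) but "й" in code page 1251 (Cyrillic), so a file read with the wrong code page comes out as the wrong letters.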

Mayer answered 23/9, 2010 at 13:15

My thought here is that when you save the file in Notepad++ it inserts the byte order mark (BOM), so the browser can infer from it that the file is UTF-8. Otherwise you'd probably have to tell the browser the character encoding explicitly, e.g. in the DTD, in the XML declaration, etc.
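To produce such a file from C# instead of Notepad++, `new UTF8Encoding(true)` emits the UTF-8 BOM (`EF BB BF`) when passed to `File.WriteAllText`. A sketch with hypothetical names; whether your browser actually relies on the BOM, as this answer suggests, is an assumption worth verifying for your setup.

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes text as UTF-8 preceded by a byte order mark.
    public static void WriteUtf8WithBom(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(true));
    }

    // Returns true if the file starts with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```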

Mosaic answered 23/9, 2010 at 12:19

This is probably happening because your original string text already contains invalid characters. Encoding conversion only makes sense if your input is a byte array. So you should either read the file as a byte array instead of a string or, as Henk said, specify the encoding when reading the file.
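A small demonstration of how the text can be damaged before any conversion runs: decoding Windows-1252 bytes as UTF-8 (what `File.ReadAllText` does when no encoding argument is given and the file has no BOM) turns `é` into U+FFFD, the Unicode replacement character, and no later re-encoding can recover the original letter. `DamagedTextDemo` and `DecodeAsUtf8` are hypothetical names.

```csharp
using System.Text;

public static class DamagedTextDemo
{
    // Decodes bytes that are really Windows-1252 with the wrong encoding (UTF-8).
    public static string DecodeAsUtf8(byte[] ansiBytes)
    {
        // 0xE9 ("é" in code page 1252) is not a complete UTF-8 sequence on its own,
        // so the decoder substitutes U+FFFD; the original byte value is discarded.
        return Encoding.UTF8.GetString(ansiBytes);
    }
}
```

For example, the 1252 bytes for "café" (`63 61 66 E9`) decode to `"caf\uFFFD"`.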

Barnaby answered 23/9, 2010 at 12:22

You can also try the following: I changed the file's encoding in Notepad++ (Encoding -> Convert to UTF-8). It works for me.

Ballance answered 30/11, 2020 at 8:49
