How to read Swedish characters properly from a txt file
R

3

11

I am reading a file line by line that is full of Swedish characters like äåö. How can I read and save the strings so that the Swedish characters are preserved? Here is my code; I am using UTF-8 encoding:

TextReader tr = new StreamReader(@"c:\testfile.txt", System.Text.Encoding.UTF8, true);
tr.ReadLine(); // returns a string, but the Swedish characters do not appear correctly
Runyan answered 13/12, 2012 at 0:39 Comment(11)
What's not working exactly? - Baluster
The string being returned by tr.ReadLine() doesn't show Swedish characters. - Runyan
C# strings are always UTF-16 encoded. By passing the UTF-8 encoding to your StreamReader, reading a line returns a properly encoded UTF-16 string object. It now only depends on how you output your string... - Elamite
It is showing like this: � � � � � � - Runyan
Do you have any suggestions about how I should format the output? - Runyan
"It shows" is a little vague. The console? The file you write? Your web application? - Elamite
It's a console app. The input file is a text file and I will be writing to a web application. - Runyan
Take a look at this SO question: #388990. I think the command line is causing the problems... - Elamite
No, I am not writing to a command line. I am saving the inputs into another web application. - Runyan
It's the associated code page. Take a look at the MS site, get the correct code page, and set it: msdn.microsoft.com/en-us/library/system.text.encoding.aspx - Bantustan
Can you help me figure out the correct code page for the Swedish language? - Runyan
B
19

You need to change System.Text.Encoding.UTF8 to System.Text.Encoding.GetEncoding(1252), the Windows-1252 code page. See below:

        System.IO.TextReader tr = new System.IO.StreamReader(@"c:\testfile.txt", System.Text.Encoding.GetEncoding(1252), true);
        tr.ReadLine(); // if the file is saved as Windows-1252, the Swedish characters now decode correctly
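
To illustrate why the code page matters, here is a minimal sketch that decodes the same three bytes both ways. It assumes the file stores äåö as Windows-1252 (one byte per character) and that the code runs on .NET Core 3.0 or later, where code page 1252 has to be registered first; on .NET Framework the RegisterProvider line can be dropped. The class name is only for illustration.

```csharp
using System;
using System.Text;

class DecodeDemo
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+, where code page 1252 is not available by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The characters ä, å, ö as a Windows-1252 file would store them.
        byte[] fileBytes = { 0xE4, 0xE5, 0xF6 };

        // These bytes are not a valid UTF-8 sequence, so the decoder
        // substitutes U+FFFD (the "�" replacement character).
        string asUtf8 = Encoding.UTF8.GetString(fileBytes);

        // Decoding with the encoding the bytes were written in recovers the text.
        string as1252 = Encoding.GetEncoding(1252).GetString(fileBytes);

        Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0);          // True
        Console.WriteLine(as1252 == "\u00E4\u00E5\u00F6");         // True
    }
}
```

The booleans are printed instead of the strings themselves so that the console's own output encoding does not confuse the result.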
Bantustan answered 13/12, 2012 at 1:3 Comment(0)
R
1

I figured it out myself: System.Text.Encoding.Default reads the Swedish characters correctly.

TextReader tr = new StreamReader(@"c:\testfile.txt", System.Text.Encoding.Default, true);
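
What Encoding.Default resolves to depends on the runtime and the machine: on .NET Framework it is the system's active ANSI code page, while on .NET Core and .NET 5+ it is always UTF-8. A small sketch for checking what it is on a given machine (the class name is hypothetical):

```csharp
using System;
using System.Text;

class DefaultEncodingCheck
{
    static void Main()
    {
        Encoding def = Encoding.Default;

        // On a Western European Windows install running .NET Framework this
        // typically reports code page 1252; on .NET Core / .NET 5+ it reports UTF-8.
        Console.WriteLine(def.WebName);
        Console.WriteLine(def.CodePage);
    }
}
```

If the result is not the encoding the file was saved in, passing the encoding explicitly is the safer choice.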
Runyan answered 13/12, 2012 at 1:1 Comment(3)
Why did you set the input encoding to UTF-8 then? - Elamite
@Runyan Yes, it will work if your system's default language is Swedish. If not, see my post, which gives you the code page for it. - Bantustan
It has nothing to do with the language or character support; both UTF-8 and CP1252 support the Swedish language. It has to do with the file encoding, which is CP1252. You always have to know the encoding (rather than the language) of the file to read it properly. - Chevron
S
0

System.Text.Encoding.UTF8 should be enough, provided the file is actually saved as UTF-8, and it is supported on both .NET Framework and .NET Core: https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding?redirectedfrom=MSDN&view=netframework-4.8

If you still get ��� characters (instead of ÅÖÄ), check the source file: what encoding was it saved in? If it is ANSI, you have to convert it to UTF-8 or read it with the matching ANSI code page.

You can do it in Notepad++: open the text file and choose Encoding > Convert to UTF-8.

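Alternatively, in the source code (C#), decode the file with the encoding it was actually saved in (Windows-1252 is assumed here):

var myString = File.ReadAllText(pathToTheTextFile, Encoding.GetEncoding(1252)); // on .NET Core, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)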
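
When the encoding of the file is not known up front, one possible approach is to try a strict UTF-8 decode and fall back to an ANSI code page if it fails. The sketch below is only a heuristic under stated assumptions: the fallback is assumed to be Windows-1252, the helper name ReadAllTextUtf8Or1252 is made up for illustration, and the code targets .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text;

static class TextFileReader
{
    // Hypothetical helper: strict UTF-8 first, Windows-1252 as the assumed fallback.
    public static string ReadAllTextUtf8Or1252(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        try
        {
            // throwOnInvalidBytes: true raises an exception on invalid sequences
            // instead of silently replacing them with U+FFFD.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);

            // GetString keeps a leading byte order mark, so strip it if present.
            return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Needed on .NET Core / .NET 5+ before code page 1252 can be used.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }

    static void Main()
    {
        // Write "äåö" as Windows-1252 bytes to a temporary file and read it back.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0xE4, 0xE5, 0xF6 });
            string text = ReadAllTextUtf8Or1252(path);
            Console.WriteLine(text == "\u00E4\u00E5\u00F6"); // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

This works because text with non-ASCII characters saved in a single-byte code page is rarely valid UTF-8 by accident, but it cannot tell one ANSI code page from another, so knowing the real encoding of the file remains the reliable option.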
Slander answered 17/7, 2019 at 8:31 Comment(0)
