Encoding.GetEncoding(437).GetString() bug?
I have the following test program:

char c = '§';
Debug.WriteLine("c: " + (int)c);

byte b = Encoding.GetEncoding(437).GetBytes("§")[0];
Debug.WriteLine("b: " + b);

char c1 = Encoding.GetEncoding(437).GetString(new byte[] { 21 })[0];
Debug.WriteLine("c1: " + (int)c1);

This produces the following result:

c: 167
b: 21
c1: 21

As far as I can see, GetBytes works correctly:
167 in Unicode => 21 in CP437
but GetString does not map it back:
21 in CP437 => 21 in Unicode (I expected 167)

Is this a bug or my mistake?

Voluntaryism answered 8/8, 2011 at 15:5 Comment(7)
A long shot, but do both GetBytes and GetString return arrays with just a single element? - Junkie
This is probably because 167 cannot be written in CP437, so it's mapped to placeholder 21 (maybe ?) in CP437. The placeholder is mapped back to the placeholder in Unicode, which is 21 too. - Parabolize
@Dani: Are you sure? That character does exist and is valid in CP437, which should use one byte to represent it. It's probably more than one byte in Unicode, but not in 437. Check the wiki linked for char 21, which is that character. - Junkie
@Kieren Johnstone - Yes, GetBytes(...).Length = 1 and GetString(...).Length = 1 - Voluntaryism
Can you try displaying/printing the string returned from GetString, again out of interest? I don't know the inner workings of these methods, but I agree it seems very odd. - Junkie
On my machine it's displayed as a square. - Voluntaryism
Bonus Reading - Dispatcher

CP437 is not "two-way" for characters in the range 0-31. As stated in the Wikipedia page you linked:

For many uses, the codes in the range 0 to 31 and the code 127 will not produce these symbols. Some (or all) of them will be interpreted as ASCII control characters.

Mapping a Unicode character to a supported CP437 character in this range works, but not the other way around. For example, take the characters represented by bytes 13 and 10: chances are that if you found them inside a CP437 string, you actually want the carriage return and line feed characters to be preserved, not converted to an eighth note and an inverse white circle. This behavior is normal: it's not a bug.
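
If you do want the glyph interpretation when decoding, one option is to post-process the result of the built-in decoder. The following is only a minimal sketch under stated assumptions: the class name Cp437Glyphs and the method Decode are hypothetical helpers, not part of the framework, and the table uses the glyphs conventionally listed for CP437 bytes 1-31 and 127 (byte 0 is left as NUL). On .NET Core and later, Encoding.GetEncoding(437) is assumed to be available, which normally requires registering the code pages encoding provider first.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes CP437 bytes, but shows bytes 0-31 and 127
// as their CP437 glyphs instead of as ASCII control characters.
static class Cp437Glyphs
{
    // Glyphs for bytes 0..31, indexed by byte value (index 0 stays NUL).
    private const string LowGlyphs =
        "\u0000\u263A\u263B\u2665\u2666\u2663\u2660\u2022" +
        "\u25D8\u25CB\u25D9\u2642\u2640\u266A\u266B\u263C" +
        "\u25BA\u25C4\u2195\u203C\u00B6\u00A7\u25AC\u21A8" +
        "\u2191\u2193\u2192\u2190\u221F\u2194\u25B2\u25BC";

    public static string Decode(byte[] bytes)
    {
        // CP437 is a single-byte code page, so chars[i] corresponds to bytes[i].
        // Assumption: code page 437 is available on this runtime.
        char[] chars = Encoding.GetEncoding(437).GetChars(bytes);

        for (int i = 0; i < chars.Length; i++)
        {
            byte b = bytes[i];
            if (b < 32)
                chars[i] = LowGlyphs[b];   // e.g. 21 -> U+00A7 (section sign)
            else if (b == 127)
                chars[i] = '\u2302';       // house glyph
        }
        return new string(chars);
    }
}
```

With this sketch, (int)Cp437Glyphs.Decode(new byte[] { 21 })[0] gives 167, so the round trip from the question holds. The trade-off is the one described above: bytes 13 and 10 also come back as glyphs, so line breaks in the input are no longer line breaks in the output.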

Hexagon answered 8/8, 2011 at 16:6 Comment(3)
It's what you could call a narrowing conversion. - Lulalulea
OMG, I hate to be beaten by such MS design decisions. Now I need to find or write a two-way CP437 encoding. :-( - Voluntaryism
@SeeR: The WinAPI function MultiByteToWideChar supports this through its MB_USEGLYPHCHARS flag: "Use glyph characters instead of control characters." But be warned that if your text had CRLF, it will come back as "♪◙", because your text no longer has any CR+LF but instead has Eighth Note + Inverse White Circle. - Dispatcher

.NET supports two different characters, both of which are (usually) rendered as §:

char c1 = (char)21;
char c2 = (char)167;

Console.WriteLine(c1 == c2);  // prints false
Console.WriteLine(c1);        // prints §
Console.WriteLine(c2);        // prints §

Character 21 is a special control character, which is rendered as § when output in text mode.

CP437 allows for 21 to be interpreted as either a control character or as the literal §. Apparently, GetString chooses to interpret it as the control character (which is a perfectly valid option), and, thus, maps it to the Unicode control character 21 rather than to the Unicode literal §.

Almandine answered 8/8, 2011 at 16:8 Comment(2)
On my machine, Console.WriteLine(c1) prints nothing and Console.WriteLine(c2) prints §. - Hugmetight
I think so; I am using Windows XP SP3 and Visual Studio 2010. - Hugmetight
