How to convert Extended ASCII to decimal in Windows Forms C#?
V

3

0

I am writing a Windows application and am having trouble converting Extended ASCII [128-255] to its decimal equivalent.

When I receive an extended ASCII character, for example "Œ", from a jar file, it arrives in the C# application like this: �.

How can I convert this to its decimal equivalent, i.e. 140?

string textToConvert = "Œ";
Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
Encoding unicode = Encoding.Unicode;
byte[] srcTextBytes = iso8859.GetBytes(textToConvert);
byte[] destTextBytes = Encoding.Convert(iso8859,unicode, srcTextBytes);
char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);
System.String szchar = new System.String(destChars);

MessageBox.Show(szchar);

Please help me. How should I proceed?

Verticillaster answered 17/8, 2012 at 13:12 Comment(16)
Could you explain what you mean by "decimal" equivalent? Do you mean the character code? Or do you want it displayed properly, like OE rather than [?]? - Achromic
The decimal value should be in your srcTextBytes byte array. - Criticism
This problem starts with the iso-8859-1 encoding not having a character that matches "Œ". en.wikipedia.org/wiki/ISO_8859-1#Codepage_layout The result is GIGO. - Sedulity
What you want is the Windows-1252 encoding. - Criticism
@quetzalcoatl: Yes, the character code. - Verticillaster
@VoidStar: There is no decimal value in srcTextBytes. - Verticillaster
@HansPassant: OK, let's take "©". Its character code is 169. How can I get that? Any code, please? - Verticillaster
That's an entirely different question. Considering the number of comments on this one, it is probably best to ask a new question instead. - Sedulity
@Verticillaster The code point of '©' (U+00A9) in the UCS is indeed 169, so you could just cast it to int. But if you expect 'Œ' (U+0152) to be 140 rather than 338, its code point in the UCS, then you are talking about neither the UCS nor ISO-8859-1. - Priscian
@HansPassant: I gave Œ as an example to explain my problem. Whatever the extended code [128-255], I am not able to convert it to its character code. Kindly guide me, please. - Verticillaster
@JonHanna: Thank you for making that clear. byte[] result = Encoding.GetEncoding("Windows-1252").GetBytes(inputString); Here I want the result (i.e. 140) as a char value. How can I convert it? - Verticillaster
I'm not sure you really do want that. "140 as a char value" doesn't make a lot of sense in .NET. The closest thing to that is (char)140, which is a control character that moves the telex head up slightly to let you write superscripts, something that hasn't really been needed much since the 1980s at the latest. (Though to make things worse, some tools like LINQPad will indeed show it as 'Œ', but you can't depend on other things correcting in the same way.) Where did you get 'Œ' from, and what do you want to do with this 140 at the end of it all? That we can probably give a real answer to. - Priscian
If you want 140 to end up as Œ again though, you just do the reverse: string result = Encoding.GetEncoding("Windows-1252").GetString(new byte[]{140}) or Encoding.GetEncoding("Windows-1252").GetChars(new byte[]{140}). Both go "right, so I take 140 according to the rules of CP-1252, now it makes sense" and give you back Œ. - Priscian
@JonHanna: Very sorry, I was a little confused. I want the byte[] result value as an Int32. I want to use this int value (i.e. 140) when opening a file and reading the data, to pick only the first 140 chars. For that I use Substring(0,140). This is the purpose. - Verticillaster
@JonHanna: Thanks a lot, thank you so much for helping. I got it. :) - Verticillaster
I think I have a better idea of what's going on here. Is it mixing bytes for lengths and bytes for encoded chars in the same stream? If so, then it may indeed be ISO-8859-1 or something else again, because the 140 doesn't mean any char at all. I'd say start with the bytes, lift off the length (140), then be very careful to see whether this means bytes or chars, then lift off what you need from that, and decode just as much as you need. Decoding in one go may not work, as there isn't always a 1 byte <=> 1 char correspondence. If it was e.g. UTF-8, then 140 bytes could be 134 chars. - Priscian
M
1

I think you are looking for something like this

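    // Needs: using System; using System.Linq; using System.Text;
    string str = "œ"; // lowercase œ is byte 156 in Windows-1252; uppercase Œ is 140
    byte[] bytes = Encoding.GetEncoding("Windows-1252").GetBytes(str);
    // Pad each byte to 8 bits so the joined binary string stays correct for more than one byte.
    string binStr = string.Join("", bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    int decimalEquivalent = Convert.ToInt32(binStr, 2);
    Console.WriteLine(decimalEquivalent);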

This works for extended ASCII values [128-255].

Mindy answered 18/7, 2014 at 14:28 Comment(0)
C
0

You have the wrong encoding. The iso-8859-1 encoding doesn't have characters at 128-159, as pointed out by Hans. According to this article, there are three encodings that contain the character you are looking for: iso-8859-15, Windows-1252, and one for Mac. Since this comes from a jar file and as such should be OS-independent, I would say the right encoding is iso-8859-15.

With the right encoding your call to GetBytes should return an array that contains the decimal values.

Criticism answered 17/8, 2012 at 14:8 Comment(1)
ISO-8859-1 has characters at 128-159, though they are all control characters. Windows-1252 is common on a lot of OSs, because it's a very common character set among people who use Windows with a Western European language (which is a lot of people). - Priscian
P
0

First thing, 140 in ISO-8859-1 is U+008C (ISO-8859-1 has a direct one-to-one mapping between the number and the code point), and U+008C is a control character. It famously doesn't have Œ. (Famously, because there was controversy about French users having to do without the ligature in cases where they would normally use it, while Æ is included because in some of the languages it was meant to support it's a separate letter, "ash", rather than a ligature as in French. Tempers got raised.)
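To make the one-to-one mapping concrete, here is a minimal sketch. The class name is made up for illustration, and "\u0152" is just Œ written as an escape so the source file's own encoding can't interfere.

```csharp
using System;
using System.Text;

class Latin1MappingDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Byte 140 decodes to the code point with the same number, U+008C (a control character).
        string decoded = latin1.GetString(new byte[] { 140 });
        Console.WriteLine((int)decoded[0]); // 140

        // Œ (U+0152) has no slot in ISO-8859-1, so the encoder substitutes a fallback byte.
        // Which byte you get depends on the fallback in effect, but it will not be 140.
        byte[] encoded = latin1.GetBytes("\u0152");
        Console.WriteLine(encoded[0] == 140); // False
    }
}
```

So with ISO-8859-1 the byte 140 never round-trips to Œ, which is why the original code produces garbage.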

string textToConvert = "Œ";

"Œ" is a string. It has nothing to do with "extended ASCII". It's implemented as UTF-16 behind the scenes, but you shouldn't even think of it as such; think of it as a string that has nothing to do with numbers, bytes or encodings until you start reading from and writing to streams (like files).

 Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");

As explained above, you certainly don't want this. You probably want GetEncoding("Windows-1252") as that's a Windows encoding that matches 8859-1 except it took out some of the controls and put in some more letters, including Œ at position 140. Let's assume you change it that way...

byte[] srcTextBytes = iso8859.GetBytes(textToConvert);

Okay, at this point, if you change to using CP-1252, you have a byte array of a single byte, value 140 (0x8C).

byte[] destTextBytes = Encoding.Convert(iso8859,unicode, srcTextBytes);
char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);
System.String szchar = new System.String(destChars);

MessageBox.Show(szchar);

I have no idea what you're trying to do here. You started with a string, and you are ending with a string, what's going on?

Let's abandon this and start from scratch.

If you have a string and you want the bytes that represent it in CP-1252, then:

byte[] result = Encoding.GetEncoding("Windows-1252").GetBytes(inputString);
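As a rough, complete sketch of that call, including the "decimal" values the question asks for: each byte in the result already is the number, so assigning it to an int is enough. The class name is hypothetical. The RegisterProvider line is an assumption about the runtime: on .NET Core and .NET 5+ the Windows code pages have to be registered before GetEncoding can find them, while on the classic .NET Framework that line is not needed.

```csharp
using System;
using System.Text;

class Cp1252ToDecimalDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code-page encodings must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding("Windows-1252");

        string inputString = "\u0152\u00A9"; // "Œ©", escaped to avoid source-file encoding issues
        byte[] result = cp1252.GetBytes(inputString);

        foreach (byte b in result)
        {
            int code = b; // a byte widens to int implicitly; this is the decimal value
            Console.WriteLine(code);
        }
        // prints 140, then 169
    }
}
```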

If you have some bytes in CP-1252 and you want the string they represent:

string result = System.Text.Encoding.GetEncoding("Windows-1252").GetString(inputBytes);
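And a small sketch of the reverse direction, under the same assumptions (hypothetical class name, provider registration only needed on .NET Core / .NET 5+). It also shows why 140 and 338 both come up: 140 is the CP-1252 byte, 338 is the code point of the resulting char.

```csharp
using System;
using System.Text;

class Cp1252FromBytesDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code-page encodings must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding("Windows-1252");

        byte[] inputBytes = { 140 };
        string result = cp1252.GetString(inputBytes);

        Console.WriteLine(result == "\u0152"); // True: byte 140 decodes to Œ
        Console.WriteLine((int)result[0]);     // 338: the code point of Œ (U+0152), not 140
    }
}
```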

If you want to read from or write to a stream (file, network stream, etc.) in Windows-1252, then use a StreamReader or StreamWriter created with that encoding:

using (TextReader reader = new StreamReader(source, Encoding.GetEncoding("Windows-1252"))) { /* read text here */ }
using (TextWriter writer = new StreamWriter(sink, Encoding.GetEncoding("Windows-1252"))) { /* write text here */ }
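For a self-contained illustration, here is a sketch that writes through a StreamWriter and reads back through a StreamReader. It uses a MemoryStream as a stand-in for a file or network stream. The class name and sample text are made up, and the provider registration is again an assumption that only applies to .NET Core / .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

class Cp1252StreamDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code-page encodings must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding("Windows-1252");

        using (var stream = new MemoryStream()) // stand-in for a real file or network stream
        {
            // leaveOpen keeps the MemoryStream usable after the writer is disposed.
            using (TextWriter writer = new StreamWriter(stream, cp1252, 1024, leaveOpen: true))
            {
                writer.Write("\u0152uvre"); // "Œuvre"
            }

            Console.WriteLine(stream.ToArray()[0]); // 140: the first byte on the wire

            stream.Position = 0;
            using (TextReader reader = new StreamReader(stream, cp1252))
            {
                Console.WriteLine(reader.ReadToEnd() == "\u0152uvre"); // True
            }
        }
    }
}
```

The point is that the numbers only exist at the stream boundary; inside the program you just have strings.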
Priscian answered 17/8, 2012 at 14:8 Comment(0)
