Are there correct encodings for the backslash and tilde characters in Shift_JIS?

Or do these two characters simply not exist in Shift_JIS?

The first 128 characters in the Shift_JIS character encoding scheme match ASCII except for two: 0x5C is a yen sign (¥) instead of a backslash, and 0x7E is an overline (‾) instead of a tilde.
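To make that single-byte mapping concrete, here is a minimal Python sketch of a strict JIS X 0201 Roman table. The helper names (`decode_roman`, `encode_roman`) are hypothetical and chosen only for illustration; the only facts taken from the standard are the two overrides at 0x5C and 0x7E.

```python
# Illustrative sketch only: helper names are hypothetical.
# JIS X 0201 Roman matches ASCII except at these two positions.
JISX0201_ROMAN_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN instead of backslash
    0x7E: "\u203e",  # OVERLINE instead of tilde
}


def decode_roman(byte: int) -> str:
    """Decode one byte in the range 0x00-0x7F as JIS X 0201 Roman."""
    if not 0x00 <= byte <= 0x7F:
        raise ValueError(f"not a single-byte Roman code: {byte:#04x}")
    return JISX0201_ROMAN_OVERRIDES.get(byte, chr(byte))


def encode_roman(ch: str) -> int:
    """Strictly encode one character; backslash and tilde have no code."""
    for byte, mapped in JISX0201_ROMAN_OVERRIDES.items():
        if ch == mapped:
            return byte
    code = ord(ch)
    if code <= 0x7F and code not in JISX0201_ROMAN_OVERRIDES:
        return code
    raise ValueError(f"{ch!r} has no JIS X 0201 Roman encoding")


if __name__ == "__main__":
    print(decode_roman(0x5C), decode_roman(0x7E), decode_roman(0x41))
    for ch in "\\~":
        try:
            encode_roman(ch)
        except ValueError as err:
            print(err)
```

Under this strict reading, trying to encode `\` or `~` fails, because their byte values are already taken by ¥ and ‾.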

While there's plenty of clear information about how ¥ and ‾ take over for \ and ~, I haven't been able to find any clear statement about whether \ and ~ simply don't exist in Shift_JIS, or whether there are alternate (probably multi-byte) encodings to handle these two displaced ASCII characters.

When I try to encode \ or ~ using node-iconv, it throws an error.

iconv-lite encodes both ¥ and \ as 0x5C, and both ‾ and ~ as 0x7E. When decoding, iconv-lite currently (and unfortunately) decodes 0x5C as \ and 0x7E as ~, pending a response to a bug report.
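The same kind of lenient behaviour can be checked outside Node. The sketch below uses Python's built-in `shift_jis` codec rather than node-iconv or iconv-lite, so it says nothing about those libraries directly. As far as I can tell, CPython's codec also collapses each pair onto one byte, but treat that as an observation about one implementation, not about the standard. `sjis_hex` is a hypothetical helper name.

```python
def sjis_hex(text: str) -> str:
    """Encode text with Python's built-in shift_jis codec and return hex."""
    return text.encode("shift_jis").hex()


if __name__ == "__main__":
    # YEN SIGN, REVERSE SOLIDUS, OVERLINE, TILDE
    for ch in ("\u00a5", "\\", "\u203e", "~"):
        print(f"U+{ord(ch):04X} -> 0x{sjis_hex(ch)}")

    # Decoding cannot tell which of the two characters was intended.
    print(repr(b"\x5c\x7e".decode("shift_jis")))
```

If two different characters encode to the same byte, the round trip is lossy by construction: the decoder has to pick one of them.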

Pleasure answered 29/6, 2019 at 17:43 Comment(0)

The character set of Shift_JIS is defined by JIS (Japanese Industrial Standards).

The Shift_JIS character encoding uses JIS X 0201 for its half-width character set and JIS X 0208 for its full-width character set.

I assume \ and ~ in the question mean the half-width backslash and tilde as found in ISO/IEC 8859-1 (Latin-1). JIS X 0201, the half-width character set, doesn't contain these characters (see https://en.wikipedia.org/wiki/JIS_X_0201).

So the answer is that neither \ nor ~ exists in Shift_JIS.

For reference, JIS X 0208 contains a full-width backslash (FULLWIDTH REVERSE SOLIDUS, U+FF3C in Unicode). JIS X 0208 doesn't contain a full-width tilde, but the Windows equivalent of Shift_JIS (Microsoft code page 932) does (FULLWIDTH TILDE, U+FF5E in Unicode).
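One way to probe the full-width forms is with Python's built-in `shift_jis` and `cp932` codecs. This is only a sketch: it reports what one implementation does, and `try_encode` is a hypothetical helper, not part of any library.

```python
from typing import Optional

FULLWIDTH_REVERSE_SOLIDUS = "\uff3c"
FULLWIDTH_TILDE = "\uff5e"


def try_encode(ch: str, codec: str) -> Optional[str]:
    """Return the encoded bytes as hex, or None if the codec has no mapping."""
    try:
        return ch.encode(codec).hex()
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    samples = (
        ("FULLWIDTH REVERSE SOLIDUS", FULLWIDTH_REVERSE_SOLIDUS),
        ("FULLWIDTH TILDE", FULLWIDTH_TILDE),
    )
    for codec in ("shift_jis", "cp932"):
        for name, ch in samples:
            print(f"{codec:9} {name:26} {try_encode(ch, codec)}")
```

With CPython's codecs, the full-width backslash should encode as a two-byte sequence under both names, while the full-width tilde is only accepted by `cp932`.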

Falcate answered 7/8, 2019 at 14:47 Comment(2)
It's very odd that two characters so commonly used in coding would be totally left out! Displaced I can see, but totally missing is surprising. I guess these days UTF-8 and other encodings have pretty much taken over, however, so this is just a legacy issue at this point. - Pleasure
It's not so odd. All it means is that Shift-JIS is not a suitable encoding for writing program source code. That doesn't make Shift-JIS any less suitable for encoding text documents. There is nothing preventing you from writing a program whose source code is in UTF-8 or ASCII and which processes text documents in Shift-JIS. An analogy is that you can write a C compiler in FORTRAN source code, or write laws for France in the English language. - Guillen
