I'm using R 4.4.0 on a MacBook, with nothing locale- or encoding-related in my .Rprofile or .Renviron.
On a fresh session, Sys.getlocale() returns "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8" in both the native R console and RStudio.
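To see which character set R itself believes it is using, rather than reading it off the combined locale string, one option is to query the LC_CTYPE category alone together with l10n_info(). A minimal sketch; note that, as far as I know, the codeset element of l10n_info() is only reported on Unix-alikes such as macOS:

```r
# LC_CTYPE is the category that decides how bytes are interpreted as characters
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# R's own summary of the current character set
info <- l10n_info()
print(info[["UTF-8"]])   # TRUE when R considers the session to be UTF-8
print(info[["MBCS"]])    # TRUE for any multibyte character set
print(info$codeset)      # NULL on platforms that do not report it
```

Running this before and after Sys.setlocale("LC_CTYPE", ...) in each front end would show whether the change was actually accepted.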
KOI8-R is a Cyrillic encoding that uses one byte per character. When I use reprex from RStudio, I get the following output, which matches my expectations.
Note: this uses the reprex addin, which runs reprex::reprex(), which by default takes its input code from the clipboard.
ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))
Sys.setlocale("LC_CTYPE", "ru_RU.KOI8-R")
#> [1] "ru_RU.KOI8-R"
ch256
#> [1] "" "\001" "\002" "\003" "\004" "\005" "\006" "\a" "\b" "\t"
#> [11] "\n" "\v" "\f" "\r" "\016" "\017" "\020" "\021" "\022" "\023"
#> [21] "\024" "\025" "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035"
#> [31] "\036" "\037" " " "!" "\"" "#" "$" "%" "&" "'"
#> [41] "(" ")" "*" "+" "," "-" "." "/" "0" "1"
#> [51] "2" "3" "4" "5" "6" "7" "8" "9" ":" ";"
#> [61] "<" "=" ">" "?" "@" "A" "B" "C" "D" "E"
#> [71] "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
#> [81] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y"
#> [91] "Z" "[" "\\" "]" "^" "_" "`" "a" "b" "c"
#> [101] "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
#> [111] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w"
#> [121] "x" "y" "z" "{" "|" "}" "~" "\177" "─" "│"
#> [131] "┌" "┐" "└" "┘" "├" "┤" "┬" "┴" "┼" "▀"
#> [141] "▄" "█" "▌" "▐" "░" "▒" "▓" "⌠" "■" "∙"
#> [151] "√" "≈" "≤" "≥" " " "⌡" "°" "²" "·" "÷"
#> [161] "═" "║" "╒" "ё" "╓" "╔" "╕" "╖" "╗" "╘"
#> [171] "╙" "╚" "╛" "╜" "╝" "╞" "╟" "╠" "╡" "Ё"
#> [181] "╢" "╣" "╤" "╥" "╦" "╧" "╨" "╩" "╪" "╫"
#> [191] "╬" "©" "ю" "а" "б" "ц" "д" "е" "ф" "г"
#> [201] "х" "и" "й" "к" "л" "м" "н" "о" "п" "я"
#> [211] "р" "с" "т" "у" "ж" "в" "ь" "ы" "з" "ш"
#> [221] "э" "щ" "ч" "ъ" "Ю" "А" "Б" "Ц" "Д" "Е"
#> [231] "Ф" "Г" "Х" "И" "Й" "К" "Л" "М" "Н" "О"
#> [241] "П" "Я" "Р" "С" "Т" "У" "Ж" "В" "Ь" "Ы"
#> [251] "З" "Ш" "Э" "Щ" "Ч" "Ъ"
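Because rawToChar() does not declare an encoding, every element of ch256 is a "native" string, and what gets printed depends on how the current front end interprets those bytes. One way to obtain the KOI8-R glyphs regardless of the session locale is to convert the bytes explicitly to UTF-8 with iconv(). A sketch, assuming the platform's iconv accepts the "KOI8-R" name (macOS and glibc both do, to my knowledge); koi8_utf8 is just an illustrative variable name:

```r
ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))

# rawToChar() leaves the declared encoding as "unknown", i.e. native
table(Encoding(ch256))

# Reinterpret the bytes as KOI8-R and convert them to UTF-8 explicitly.
# iconv() marks the result as UTF-8, so printing no longer depends on LC_CTYPE.
koi8_utf8 <- iconv(ch256, from = "KOI8-R", to = "UTF-8")
koi8_utf8[194]   # byte 0xC1, Cyrillic small letter a
```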
However, the same code run in my RStudio console prints something different (a hand-made reprex, copied and pasted from the console output):
ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))
Sys.setlocale("LC_CTYPE", "ru_RU.KOI8-R")
ch256
#> [1] "" "\001" "\002" "\003" "\004" "\005" "\006" "\a" "\b" "\t"
#> [11] "\n" "\v" "\f" "\r" "\016" "\017" "\020" "\021" "\022" "\023"
#> [21] "\024" "\025" "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035"
#> [31] "\036" "\037" " " "!" "\"" "#" "$" "%" "&" "'"
#> [41] "(" ")" "*" "+" "," "-" "." "/" "0" "1"
#> [51] "2" "3" "4" "5" "6" "7" "8" "9" ":" ";"
#> [61] "<" "=" ">" "?" "@" "A" "B" "C" "D" "E"
#> [71] "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
#> [81] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y"
#> [91] "Z" "[" "\\" "]" "^" "_" "`" "a" "b" "c"
#> [101] "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
#> [111] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w"
#> [121] "x" "y" "z" "{" "|" "}" "~" "\177" "�" "�"
#> [131] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [141] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [151] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [161] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [171] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [181] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [191] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [201] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [211] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [221] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [231] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [241] "�" "�" "�" "�" "�" "�" "�" "�" "�" "�"
#> [251] "�" "�" "�" "�" "�" "�"
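One plausible explanation for the � glyphs, which I have not confirmed against RStudio's documentation, is that the RStudio console decodes everything it receives as UTF-8. A single byte in the range 0x80 to 0xFF is never valid UTF-8 on its own, which can be checked with validUTF8():

```r
ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))

# Bytes 0x00-0x7F are plain ASCII and therefore valid UTF-8
ascii_ok <- all(validUTF8(ch256[1:128]))

# A lone byte 0x80-0xFF is either a continuation byte or a lead byte
# missing its continuation bytes, so none is valid UTF-8 by itself
high_ok <- any(validUTF8(ch256[129:256]))

c(ascii_ok = ascii_ok, high_ok = high_ok)
```

A UTF-8 decoder that meets such a byte typically substitutes U+FFFD, which is the � shown above.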
In the R for Mac OS X GUI (R.app) the result is different again: the encoding appears to be ignored and latin1-looking characters are printed (hand-made reprex, copied and pasted from the console output):
ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))
Sys.setlocale("LC_CTYPE", "ru_RU.KOI8-R")
#> [1] "ru_RU.KOI8-R"
ch256
#> [1] "" "\001" "\002" "\003" "\004" "\005" "\006" "\a" "\b" "\t"
#> [11] "\n" "\v" "\f" "\r" "\016" "\017" "\020" "\021" "\022" "\023"
#> [21] "\024" "\025" "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035"
#> [31] "\036" "\037" " " "!" "\"" "#" "$" "%" "&" "'"
#> [41] "(" ")" "*" "+" "," "-" "." "/" "0" "1"
#> [51] "2" "3" "4" "5" "6" "7" "8" "9" ":" ";"
#> [61] "<" "=" ">" "?" "@" "A" "B" "C" "D" "E"
#> [71] "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
#> [81] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y"
#> [91] "Z" "[" "\\" "]" "^" "_" "`" "a" "b" "c"
#> [101] "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
#> [111] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w"
#> [121] "x" "y" "z" "{" "|" "}" "~" "\177" "Ä" "Å"
#> [131] "Ç" "É" "Ñ" "Ö" "Ü" "á" "à" "â" "ä" "ã"
#> [141] "å" "ç" "é" "è" "ê" "ë" "í" "ì" "î" "ï"
#> [151] "ñ" "ó" "ò" "ô" "ö" "õ" "ú" "ù" "û" "ü"
#> [161] "†" "°" "¢" "£" "§" "•" "¶" "ß" "®" "�"
#> [171] "™" "´" "¨" "≠" "Æ" "Ø" "∞" "±" "≤" "≥"
#> [181] "¥" "µ" "∂" "∑" "∏" "π" "∫" "ª" "º" "Ω"
#> [191] "æ" "ø" "¿" "¡" "¬" "√" "ƒ" "≈" "∆" "«"
#> [201] "»" "…" " " "À" "Ã" "Õ" "Œ" "œ" "–" "—"
#> [211] "“" "”" "‘" "’" "÷" "◊" "ÿ" "Ÿ" "⁄" "€"
#> [221] "‹" "›" "fi" "fl" "‡" "·" "‚" "„" "‰" "Â"
#> [231] "Ê" "Á" "Ë" "È" "Í" "Î" "Ï" "Ì" "Ó" "Ô"
#> [241] "" "Ò" "Ú" "Û" "Ù" "ı" "ˆ" "˜" "¯" "˘"
#> [251] "˙" "˚" "¸" "˝" "˛" "ˇ"
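The glyphs R.app prints for bytes 0x80 to 0xFF appear to line up with the classic Mac OS Roman character set rather than latin1: in latin1, 0x80 to 0x9F are control characters, yet R.app shows Ä, Å, Ç, ... there, and 0xA0 comes out as †. A sketch to compare against both tables, assuming your iconv accepts the "MACINTOSH" encoding name (GNU libiconv and glibc both list it, as far as I know):

```r
ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))

# Decode the upper half as Mac OS Roman and as latin1 for comparison
mac_roman <- iconv(ch256[129:256], from = "MACINTOSH", to = "UTF-8")
latin1    <- iconv(ch256[129:256], from = "latin1", to = "UTF-8")

# Side-by-side view: byte value, Mac OS Roman glyph, latin1 glyph
data.frame(byte = sprintf("0x%02X", 128:255), mac_roman, latin1)[1:8, ]
```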
In fact, I can reproduce the above with the ISO8859-1 (latin1) encoding as well: this time the native R console prints the characters correctly, just like reprex, but the RStudio output is still wrong.
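When comparing front ends like this, it helps to start each experiment from the same state, because Sys.setlocale() changes the whole session and returns "" (with a warning) if the OS does not provide the requested locale. A small helper that prints under a given LC_CTYPE and always restores the previous one; print_in_ctype is a hypothetical name of my own, not part of base R, and the locale names below are the macOS spellings, which may differ elsewhere:

```r
# Hypothetical helper: print `x` with LC_CTYPE temporarily set to `ctype`
print_in_ctype <- function(x, ctype) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the original character type however this function exits
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (plus a warning) when the locale is unavailable
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
  if (identical(res, "")) {
    warning("locale '", ctype, "' is not available on this system")
    return(invisible(FALSE))
  }
  print(x)
  invisible(TRUE)
}

ch256 <- sapply(0:255, function(x) rawToChar(as.raw(x)))
print_in_ctype(ch256[161:256], "en_US.ISO8859-1")
print_in_ctype(ch256[161:256], "ru_RU.KOI8-R")
```

This only controls the R side; whether the front end then renders the bytes in the new character set is a separate question.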
I know that switching everything to UTF-8 solves the problem, but I really want to understand:
- What's happening here?
- Is it possible to get the correct output everywhere?
- Is this output different on different systems?
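On the second question, one approach that should not depend on the front end, sketched here under the assumption that the session itself stays UTF-8, is to tell R what the bytes are instead of changing the locale. R can only declare "latin1", "UTF-8" and "bytes" on a string, so for latin1 Encoding<- is enough, while KOI8-R bytes have to be converted with iconv():

```r
# latin1: declare the encoding; R translates marked strings when needed
e_acute <- rawToChar(as.raw(0xE9))
Encoding(e_acute) <- "latin1"
e_acute_utf8 <- enc2utf8(e_acute)

# KOI8-R: no declared-encoding mark exists, so convert the bytes instead
koi8_a <- iconv(rawToChar(as.raw(0xC1)), from = "KOI8-R", to = "UTF-8")

print(c(e_acute_utf8, koi8_a))
```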
Comments:
- ... .Rprofile, .Renviron, ...) were used would also be helpful. – Lempres
- ... Terminal > Settings > Profiles > Advanced > International. I'm not sure if RStudio has a similar setting for its console ... – Lempres
- ... Sys.setlocale("LC_CTYPE", "ru_RU.KOI8-R") take precedence over those anyway? – Birddog
- ... help("Sys.setlocale") that attempts to change the character set of an already launched R process "may not work and are likely to lead to some confusion". It depends on the platform and on the application embedding R. You have to read the corresponding documentation. – Lempres
- ... iconv or maybe enc2native) to the system encoding so that it displays correctly in the system encoding. But there can be portability issues there too. IMO you do not need to be testing display extensively if that is not the purpose of your software. – Lempres