UPDATE (April 2018):
The problem still persists across different settings and computers.
I believe it affects all non-ASCII (Unicode, UTF-8) characters, not only Cyrillic ones.
PROBLEM:
My Rmd/R file is saved with UTF-8 encoding. Other sessionInfo() details:
Platform: x86_64-w64-mingw32/x64 (64-bit)
LC_CTYPE=English_Canada.1252
other attached packages:
[1] knitr_1.17
Here is a simple data frame that I need to print as a table in an HTML document, e.g. with kable(dt) or any other way.
dt <- data.frame(
name=c("Борис Немцов","Martin Luter King"),
year=c("2015","1968")
)
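To rule out the encoding of the source file itself, the same data frame can be built from \u escapes, which R marks as UTF-8 no matter how the file was saved. This is only a sketch: stringsAsFactors = FALSE is an addition that is not in the code above, included so that name stays a character column (before R 4.0 it would otherwise become a factor).

```r
# Same data as above, but the Cyrillic name is written with \u escapes,
# so the result does not depend on the encoding of the .R/.Rmd file.
dt <- data.frame(
  name = c("\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432",
           "Martin Luter King"),
  year = c("2015", "1968"),
  stringsAsFactors = FALSE  # assumption: keep 'name' as character, not factor
)

# Declared encoding of each element of the name column
Encoding(dt$name)
```

If the table prints the same way with this version, the file encoding is probably not the cause and the locale is the more likely suspect.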
Neither of the following works:
Way 1
If I keep Sys.setlocale() as is (i.e. "English_Canada.1252"), then I get this:
> dt;
name year
1 <U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> 2015
2 Martin Luter King 1968
> kable(dt)
|name |year |
|:-----------------------------------------------------------------------------------------|:----|
|<U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> |2015 |
|Martin Luter King |1968 |
Note that <U+....> escapes are printed instead of the characters.
Using dt$name <- enc2utf8(as.character(dt$name)) did not help.
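The <U+....> escapes suggest that the strings themselves are intact and that only their display in a 1252 locale falls back to code points, which would also explain why enc2utf8() changes nothing. A minimal check, assuming the string is already marked as UTF-8 (here it is built from \u escapes to guarantee that):

```r
# First name only, written with \u escapes so it is marked as UTF-8
name <- "\u0411\u043e\u0440\u0438\u0441"

# Declared encoding of the string
Encoding(name)

# enc2utf8() is a no-op on a string that is already UTF-8
identical(enc2utf8(name), name)

# The code points stored in the string, formatted like the <U+....> escapes
code_points <- sprintf("U+%04X", utf8ToInt(name))
code_points
```

The code points match the ones shown in the printed output of Way 1, so the data is not corrupted at this stage.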
Way 2
If I change the locale with Sys.setlocale("LC_CTYPE", "russian") (i.e. "Russian_Russia.1251"), then I get this:
> dt;
name year
1 Áîðèñ Íåìöîâ 2015
2 Martin Luter King 1968
> kable(dt)
|name |year |
|:-----------------|:----|
|Áîðèñ Íåìöîâ |2015 |
|Martin Luter King |1968 |
Note that the characters have become gibberish.
Using print(dt, encoding="windows-1251") or print(dt, encoding="UTF-8") had no effect.
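The gibberish in Way 2 looks like Windows-1251 bytes being displayed as if they were Windows-1252/Latin-1. The sketch below reproduces that effect for the first name only; it assumes the platform's iconv recognises the name "CP1251", and the variable names are purely illustrative.

```r
# First name only, written with \u escapes so it is marked as UTF-8
name <- "\u0411\u043e\u0440\u0438\u0441"

# Encode the name as Windows-1251 and keep the raw bytes
bytes <- iconv(name, from = "UTF-8", to = "CP1251", toRaw = TRUE)[[1]]

# Deliberately mislabel those bytes as Latin-1, then convert to UTF-8
misread <- rawToChar(bytes)
Encoding(misread) <- "latin1"
garbled <- enc2utf8(misread)

# Code points of the misread string
garbled_points <- sprintf("U+%04X", utf8ToInt(garbled))
garbled_points
```

The resulting code points (U+00C1, U+00EE, U+00F0, U+00E8, U+00F1) are the accented Latin letters that appear in place of the first name in the Way 2 output.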
Any advice?
The closest resources I could find are the following links, but they did not help: http://blog.rolffredheim.com/2013/01/r-and-foreign-characters.html, https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows, https://www.smashingmagazine.com/2012/06/all-about-unicode-utf8-character-sets
I also tried saving my file with 1251 encoding (instead of the current UTF-8 encoding) and tried some other character conversion/processing packages. Nothing has helped so far.
UPDATE:
Opened related question: How to change Sys.setlocale, when you get Error "request to set locale … cannot be honored"
en_US.UTF-8 when printing to the console or knitting an HTML document. Using LaTeX is another story. – Loath
OS reports request to set locale to "en_US.UTF-8" cannot be honored [1] "". This may explain why it works for you, but not for me (my locale is LC_CTYPE=English_Canada.1252). So what can I do? – Sasser
con = file("TestSpanishText.R", encoding = "UTF-8"); read_chunk(con); close(con) – Sasser
Sys.setlocale(, "Russian") in your ~/.Rprofile? If you don't know what .Rprofile is, you may see bookdown.org/yihui/blogdown/global-options.html – Spruce
print(dt) still showed the same gibberish, however printing with kable(dt) produced exactly what is needed! So the conclusion: putting Sys.setlocale("LC_CTYPE", "russian") is not sufficient. You have to put it in .Rprofile, and ... it works specifically with kable() (thanks to the knitr developer :) – Sasser
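For anyone who wants to try the .Rprofile approach from the comments, here is a hedged sketch. The helper name set_ctype and the list of candidate locale names are assumptions for illustration only. It relies on Sys.setlocale() returning an empty string (with a warning) when a request cannot be honored, as in the message quoted above.

```r
# Hypothetical helper for ~/.Rprofile: try several locale names in turn
# and return the first one the OS accepts, or "" if none can be honored.
set_ctype <- function(candidates) {
  for (loc in candidates) {
    # On failure Sys.setlocale() warns and returns ""; silence the warning
    res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
    if (length(res) == 1L && nzchar(res)) {
      return(res)
    }
  }
  ""
}

# Candidate names are assumptions; valid names depend on the OS
set_ctype(c("Russian", "Russian_Russia.1251"))
```

If every candidate is rejected, the locale is left unchanged and the empty return value makes the failure easy to detect. As noted in the last comment, this fixed the kable() output but not print(dt).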