Printing UTF-8 characters in R, Rmd, knitr, bookdown

UPDATE (April 2018):
The problem still persists across different settings and computers. I believe it affects all non-ASCII Unicode (UTF-8) characters.

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

PROBLEM:

My Rmd/R file is saved with UTF-8 encoding. Other sessionInfo() details:

Platform: x86_64-w64-mingw32/x64 (64-bit)
LC_CTYPE=English_Canada.1252

other attached packages:
[1] knitr_1.17

Here is a simple data frame that I need to print as a table in an HTML document, e.g. with kable(dt) or any other way.

library(knitr)  # provides kable()

dt <- data.frame(
  name = c("Борис Немцов", "Martin Luter King"),
  year = c("2015", "1968")
)

Neither of the following works:

Way 1

If I leave the locale as is (i.e. "English_Canada.1252", without calling Sys.setlocale()), then I get this:

> dt
name year
1 <U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> 2015
2 Martin Luter King 1968
> kable(dt)
|name                                                                                      |year |
|:-----------------------------------------------------------------------------------------|:----|
|<U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> |2015 |
|Martin Luter King                                                                         |1968 |

Note that <U+....> codes are printed instead of the actual characters.
Using dt$name <- enc2utf8(as.character(dt$name)) did not help.
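For context, each <U+....> code is the Unicode code point of one Cyrillic letter. R falls back to this notation when a character cannot be represented in the native encoding, and Windows-1252 has no Cyrillic letters at all. The minimal sketch below reproduces the same notation explicitly with base R's iconv() and its sub = "Unicode" option; this is an illustration of the notation, not necessarily the exact code path print() takes. It also shows that the same name converts cleanly to Windows-1251, a Cyrillic code page.

```r
# Build the name from the code points shown in the Way 1 output, so the string
# is UTF-8 no matter which encoding this script file is read in.
name_utf8 <- "\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432"

# Windows-1252 (the code page behind "English_Canada.1252") has no Cyrillic
# letters; sub = "Unicode" replaces each one with its <U+xxxx> code point.
as_cp1252 <- iconv(name_utf8, from = "UTF-8", to = "CP1252", sub = "Unicode")
print(as_cp1252)  # expected to match the <U+...> sequence in the Way 1 output

# Windows-1251 (a Cyrillic code page) can represent every letter, so this
# conversion succeeds and returns a string rather than NA.
as_cp1251 <- iconv(name_utf8, from = "UTF-8", to = "CP1251")
print(!is.na(as_cp1251))  # TRUE
```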

Way 2

If I change the locale with Sys.setlocale("LC_CTYPE", "russian") (which sets it to "Russian_Russia.1251"), then I get this:

> dt
name year
1      Áîðèñ Íåìöîâ 2015
2 Martin Luter King 1968

> kable(dt)
|name              |year |
|:-----------------|:----|
|Áîðèñ Íåìöîâ      |2015 |
|Martin Luter King |1968 |

Note that the characters have become gibberish.
Calling print(dt, encoding = "windows-1251") or print(dt, encoding = "UTF-8") had no effect.
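The gibberish looks like a code-page mix-up rather than lost data (this is an interpretation, not something confirmed in the thread): under the Russian locale the name is stored as Windows-1251 bytes, and those bytes are then displayed as if they were Windows-1252. The sketch below reproduces exactly that with base R's iconv(). It assumes only that the "CP1251" and "CP1252" encoding names are available, which is usually the case on Windows, glibc and macOS builds of R.

```r
# Same name as in the question, written with \u escapes so it is UTF-8.
name_utf8 <- "\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432"

# Step 1: encode it in Windows-1251, the code page of "Russian_Russia.1251".
bytes_1251 <- iconv(name_utf8, from = "UTF-8", to = "CP1251", toRaw = TRUE)[[1]]
print(bytes_1251)  # [1] c1 ee f0 e8 f1 20 cd e5 ec f6 ee e2

# Step 2: decode those same bytes as Windows-1252 (Western European).
# Each Cyrillic byte now maps to an accented Latin letter instead.
misread <- iconv(list(bytes_1251), from = "CP1252", to = "UTF-8")
print(misread)  # same text as the Way 2 output
```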

Any advice?

The closest material I could find on this problem is in the following links, but it did not help: http://blog.rolffredheim.com/2013/01/r-and-foreign-characters.html, https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows, https://www.smashingmagazine.com/2012/06/all-about-unicode-utf8-character-sets

I also tried saving my file with Windows-1251 encoding (instead of the current UTF-8 encoding), and tried some other character conversion/processing packages. Nothing has helped so far.

UPDATE:

I opened a related question: How to change Sys.setlocale, when you get Error "request to set locale … cannot be honored"

Sasser asked 17/1, 2018 at 17:43

Comments:
I have no problems using my native locale en_US.UTF-8 when printing to the console or knitting an HTML document. Using LaTeX is another story. - Loath
Thanks for trying. I tried to set my locale to what you have, `Sys.setlocale("LC_CTYPE", "en_US.UTF-8")`, but got this message: OS reports request to set locale to "en_US.UTF-8" cannot be honored, and the call returned [1] "". This may explain why it works for you but not for me (my locale is LC_CTYPE=English_Canada.1252). So what can I do? - Sasser
I found two related suggestions from the knitr developer: #15704202 and #27983066. The idea is to move the UTF-8 code into a separate file and then read it from there: con = file("TestSpanishText.R", encoding = "UTF-8"); read_chunk(con); close(con) - Sasser
Can you try to set Sys.setlocale(, "Russian") in your ~/.Rprofile? If you don't know what .Rprofile is, see bookdown.org/yihui/blogdown/global-options.html - Spruce
Fantastic! I did that, and print(dt) still showed the same gibberish, but `kable(dt)` produced exactly what is needed! So the conclusion is: calling Sys.setlocale("LC_CTYPE", "russian") directly is not sufficient. You have to put it in .Rprofile, and then it works specifically with kable() (thanks to the knitr developer :) - Sasser
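The separate-file workaround mentioned in these comments comes down to keeping the Cyrillic text in a UTF-8 file and declaring that encoding when reading it back. Below is a base-R sketch of that round trip; it does not call knitr's read_chunk(), and the temporary file is a hypothetical stand-in for a file like TestSpanishText.R.

```r
# Hypothetical stand-in for a separate UTF-8 text file (a temporary file here).
path <- tempfile(fileext = ".txt")
name_utf8 <- "\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432"

# useBytes = TRUE writes the UTF-8 bytes unchanged instead of translating
# them to the native encoding first.
writeLines(name_utf8, path, useBytes = TRUE)

# encoding = "UTF-8" tells readLines() to mark the lines as UTF-8;
# it does not re-encode them to the native code page.
text_back <- readLines(path, encoding = "UTF-8")
print(identical(text_back, name_utf8))  # TRUE

unlink(path)  # clean up the temporary file
```

readLines(path, encoding = "UTF-8") is used here rather than file(path, encoding = "UTF-8"), because a connection opened with an encoding re-encodes its input to the native encoding, which under a Windows-1252 locale has no room for Cyrillic letters.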

The only solution that worked was the one suggested by Yihui Xie (the knitr developer):
create a file named .Rprofile that contains the single line Sys.setlocale("LC_CTYPE", "russian"), and place it in your home or working directory.
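Concretely, the .Rprofile would contain just that one line. The comments below are additions for explanation, and "russian" is the Windows locale name used in this thread; other operating systems use different locale names.

```r
# .Rprofile in the home directory or the project's working directory.
# R runs this file at the start of each session, before the document is knit.
Sys.setlocale("LC_CTYPE", "russian")  # on Windows this selects "Russian_Russia.1251"
```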

However, note that this works only with kable(), i.e. with the help of the knitr package.
If you print with print(dt$name[1]), you still get Áîðèñ Íåìöîâ.
But if you use kable(dt$name[1]), you get what you need: Борис Немцов!

Sasser answered 12/2, 2018 at 15:53
