I am building my own package, and I keep running into encoding issues because the functions in my package has non-english (non-ASCII) characters.
Inherently, Korean characters are a part of many of the functions in my package. A sample function:
library(rvest)
sampleprob <- function(url) {
# sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
result <- grepl("연결재무제표 주석", html_text(read_html(url)))
return(result)
}
However, when installing the package I run into encoding problems.
I created a sample package (https://github.com/hyk0127/KorEncod/) with just one function (what is shown above) and uploaded it onto my github page for a reproducible example. I run the following code to install:
library(devtools)
install_github("hyk0127/KorEncod")
Below is the error message that I see
Error : (converted from warning) unable to re-encode 'hello.R' line 7
ERROR: unable to collate and parse R files for package 'KorEncod'
* removing 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
* restoring previous 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
Error: Failed to install 'KorEncod' from GitHub:
(converted from warning) installation of package ‘C:/Users/myname/AppData/Local/Temp/RtmpmS5ZOe/file48c02d205c44/KorEncod_0.1.0.tar.gz’ had non-zero exit status
The error message about line 7
refers to the Korean characters in the function.
It is possible to locally install the package with tar.gz file, but then the function does not run as intended, because the Korean characters are recognized in broken encoding.
This cannot be the first time that someone has tried building a package that has non-english (or non-ASCII) characters, and yet I couldn't find a solution to this. Any help will be deeply appreciated.
A few pieces of info that I think are related:
Currently the DESCRIPTION
file specifies "Encoding: UTF-8".
I have used sys.setlocale
to set the locale into Korean and back to no avail.
I have specified @encoding UTF-8
to the function to no avail as well.
I am currently using Windows where the administrative language is set to English. I have tried using a different laptop with Windows & administrative language set to Korean, and the same problem appears.
collate
andctype
are important indevtools::session_info()
(both derived fromSys.getlocale()
IMHO). Unfortunately,Sys.setlocale(category = "LC_CTYPE" , locale=".65001")
cannot be honored in Windows, unlikeSys.setlocale(category = "LC_COLLATE" , locale=".65001")
which works as expected… Aged incompatibility Windows vs. UTF-8 in R. – Armbrecht