Manipulating files with non-English names in R

When I use R functions such as dir() to manipulate files on Windows, file names containing non-English characters, such as Cyrillic, are shown as a sequence of "?".

Similarly, if the new name passed to file.rename() contains non-English characters, the file ends up with an unreadable name, apparently because the characters are mapped through a different encoding.

There are a number of functions dealing with encoding for the file contents, but how can we deal with file names?

To reproduce the problem:
Outside R create the file "привет.txt" in the working dir; then in R:

dir() 
# [1] "??????.txt"      
# ...

Note that setting:

Sys.setlocale(category = "LC_ALL", locale="Russian")

doesn't help.

Note: I am using R 3.1.2 for Windows on an English-language Windows 8.1. In Windows consoles (cmd.exe) the Cyrillic names are displayed properly.
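One way to narrow the problem down is to check which locale R is running in and whether a returned name still carries the original characters or already consists of literal question marks. The sketch below is only illustrative: report_name_encoding() is a hypothetical helper, not a base R function, and the file name is written with Unicode escapes so that the script itself stays ASCII.

```r
# File name from the question, written with Unicode escapes.
name <- "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Hypothetical helper: show the declared encoding and the raw bytes of each name.
report_name_encoding <- function(x) {
  data.frame(
    name = x,
    declared = Encoding(x),
    bytes = vapply(
      x,
      function(s) paste(charToRaw(s), collapse = " "),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Character-handling part of the current locale and what R knows about it.
Sys.getlocale("LC_CTYPE")
l10n_info()

# Compare an intact name with one that was already replaced by "?".
report_name_encoding(c(name, "??????.txt"))
```

If the bytes of a name returned by dir() are all 3f, the original characters were replaced before R handed the string back, so no later re-encoding can recover them.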

Coprology answered 22/6, 2014 at 18:30 Comment(0)

Try this: iconv("привет.txt","UTF-8","CP1251")

Convert Character Vector between Encodings:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/iconv.html

The iconv library:
http://www.delorie.com/gnu/docs/recode/recode_30.html
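To make the conversion concrete, here is a minimal sketch of a round trip between UTF-8 and CP1251. It assumes the iconv build on your platform knows the encoding name "CP1251" (iconvlist() shows what is available); the variable names are illustrative only.

```r
# File name from the question, written with Unicode escapes (marked as UTF-8).
name_utf8 <- "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Convert to the Windows Cyrillic code page: one byte per Cyrillic letter.
name_cp1251 <- iconv(name_utf8, from = "UTF-8", to = "CP1251")
charToRaw(name_cp1251)
# ef f0 e8 e2 e5 f2 2e 74 78 74

# Converting back restores the original UTF-8 bytes.
name_back <- iconv(name_cp1251, from = "CP1251", to = "UTF-8")
identical(charToRaw(name_back), charToRaw(name_utf8))
```

The round trip only works while the original bytes are intact; it cannot restore a name whose characters have already been replaced.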

Salisbury answered 21/2, 2015 at 23:49 Comment(1)
It works like this; but if I use iconv(dir(), "UTF-8", "CP1251"), "привет.txt" becomes "??????.txt". - Coprology

One easy solution is to change the locale, if you only want to run the script once or twice and know the target language.

Sys.setlocale(category = "LC_ALL", locale = "Russian")
x1 <- read.table("C:\\привет.txt", header = TRUE)  # works fine with R 3.1.2
Sys.setlocale(category = "LC_ALL", locale = "English")
x2 <- read.table("C:\\привет.txt", header = TRUE)  # raises an error

If you need to read from a server, I strongly recommend using Python or another scripting language to handle Unicode paths. If you insist on R, try the following (cf. "Set locale to system default UTF-8"):

Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
x3 <- read.table("C:\\привет.txt", header = TRUE)  # may or may not warn, but reads the table into x3

However, you should still process the table's content with a package such as stringi, and remember to restore the original locale after the read operation if necessary.
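Restoring the locale by hand is easy to forget, so the switch can be wrapped in a small function. This is only a sketch: with_ctype_locale() is a hypothetical helper, and it changes just the LC_CTYPE category rather than LC_ALL as in the snippets above, which is a narrower choice you may need to adjust.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_CTYPE locale,
# then restore the previous setting even if `expr` fails.
with_ctype_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) when the OS rejects the request.
  if (!nzchar(Sys.setlocale("LC_CTYPE", locale))) {
    stop("locale not available: ", locale)
  }

  # `expr` is a lazy argument, so it is evaluated only now, under the new locale.
  force(expr)
}

# Portable demonstration with the "C" locale, which exists on every platform.
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype_locale("C", Sys.getlocale("LC_CTYPE"))
after <- Sys.getlocale("LC_CTYPE")
```

On Windows the call from this answer would then read with_ctype_locale("Russian", read.table("C:\\привет.txt", header = TRUE)); the locale name "Russian" is Windows-specific.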

==Update==

(cf. https://stat.ethz.ch/pipermail/r-help/2011-May/278206.html) According to the R FAQ, this may also be a problem that depends on the environment and does not occur consistently:

3.6 I don't see characters with accents at the R console, for example in text.

You need to specify a font in Rconsole (see Q5.2) that supports the encoding in use. This used to be a problem in earlier versions of Windows, but now it is hard to find a font which does not.

Support for these characters within Rterm depends on the environment (the terminal window and shell, including locale and codepage settings) within which it is run as well as the font used by the terminal window. Those are usually on legacy DOS settings and need to be altered.

Given this, please tell me whether you can enter Russian file names at the R console using 'read'. Thanks.

Womack answered 22/2, 2015 at 2:29 Comment(5)
Since I do not know Russian at all, I used @Salisbury's sample file name. :-) - Womack
Say that "привет.txt" is in the working dir: Sys.setlocale(category = "LC_ALL", locale="Russian"); dir() gives "??????.txt" for "привет.txt". - Coprology
@Coprology Following your comment, I tried Matt's method and my method again. Both work fine, and I tried Chinese, too. My machine runs Win7 (64 bit) with R 3.1.2 (x64). Yesterday I ran the script in Eclipse with StatET; today I used the R console and it works fine for me. Why not try another machine and check whether your system has Unicode set up properly? Thanks for your reply. - Womack
Try this: Sys.setlocale("LC_ALL", "Russian_Russia.1252"). And see: [link] - Womack
@Henr.L: привет means "hello". - Salisbury

I did not manage to fix the display in dir(), but Sys.glob(paths="*") gives the same result as dir() except that it does display the name in Cyrillic, regardless of your locale setting.

For example, if your file is in the working directory, you can try:

Sys.glob(paths="*")

It should display the name correctly as "привет.txt" (works for me). Documentation on Sys.glob: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Sys.glob

Similarly, to rename the file, the proper method seems to be to use a function other than file.rename(). The following should rename a file named 'test.doc' to 'привет.txt' (works for me):

fs::file_move("test.doc", "привет.txt")

The fs package provides file-manipulation functions that are closer to cmd commands. Documentation: https://cran.r-project.org/web/packages/fs/vignettes/function-comparisons.html
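The two steps above can be tried together in a scratch directory, so no real files are touched. This is a rough sketch under a few assumptions: the directory name "rename-demo" is made up, fs is used only if it is installed (with file.rename() as a fallback so the script still runs), and whether dir() shows the name correctly will depend on your R version and locale.

```r
# Hypothetical scratch directory inside the session's temporary folder.
demo_dir <- file.path(tempdir(), "rename-demo")
dir.create(demo_dir, showWarnings = FALSE)

old_path <- file.path(demo_dir, "test.doc")
# Target name from the answer, written with Unicode escapes.
new_path <- file.path(demo_dir, "\u043f\u0440\u0438\u0432\u0435\u0442.txt")
file.create(old_path)

# Prefer fs::file_move() as suggested above; fall back to base R if fs is missing.
if (requireNamespace("fs", quietly = TRUE)) {
  fs::file_move(old_path, new_path)
} else {
  file.rename(old_path, new_path)
}

# Compare the two listing functions on the same directory.
basename(Sys.glob(file.path(demo_dir, "*")))
dir(demo_dir)
```

If the two listings differ, that points at how the names are converted for display rather than at the file itself.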

Laocoon answered 12/9, 2021 at 17:27 Comment(0)
