How to check file size before opening?
How can I check the size of a file before I load it into R?

For example:

http://math.ucdenver.edu/RTutorial/titanic.txt

I'd like to use the optimal command to open a file based on the file's size.

Jaddan answered 1/6, 2015 at 18:38 Comment(3)
?file.info is probably what you want. - Oeillade
#20922093 is probably what you want. - Fontanez
The accepted answer is not the most up-to-date answer. - Gastrula
S
60

Use file.info()

file.info("data/ullyses.txt")

                    size isdir mode               mtime               ctime               atime  uid  gid
data/ullyses.txt 1573151 FALSE  664 2015-06-01 15:25:55 2015-06-01 15:25:55 2015-06-01 15:25:55 1008 1008

Then extract the column called size:

file.info("data/ullyses.txt")$size
[1] 1573151
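
To connect this back to the question (choosing how to open a file based on its size), here is a minimal sketch. The 100 MB threshold, the 1000-row preview and the function name read_by_size are assumptions made for illustration, not recommendations from the answer. file.size(path) needs R 3.2.0 or later; file.info(path)$size gives the same number.

```r
# Hypothetical threshold: treat anything above 100 MB as "large"
large_threshold <- 100 * 1024^2

# Hypothetical helper: pick a reading strategy from the size on disk
read_by_size <- function(path, threshold = large_threshold) {
  size <- file.size(path)  # size in bytes; NA if the file does not exist
  if (is.na(size)) stop("File not found: ", path)
  if (size > threshold) {
    # Large file: only peek at the first rows
    read.table(path, header = TRUE, nrows = 1000)
  } else {
    # Small file: read everything
    read.table(path, header = TRUE)
  }
}

# Self-contained demo with a small temporary file
tmp <- tempfile(fileext = ".txt")
write.table(head(mtcars), tmp)
file.size(tmp)
nrow(read_by_size(tmp))
```

For a real workload the "large" branch would more likely switch to a faster or chunked reader; the point here is only the size check before reading.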
Strident answered 1/6, 2015 at 18:43 Comment(3)
And if it's from "http:", is there a way to measure the size before loading? - Jaddan
You may have to use download.file() and then check the file size locally. - Strident
Since R 3.2 there's a file.size() wrapper. - Madox
P
13

Perhaps it was added after this discussion, but in current versions of R the answer is file.size(), which has been available since R 3.2.0.

Pained answered 22/9, 2020 at 9:4 Comment(0)
F
8
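library(RCurl)
url = "http://math.ucdenver.edu/RTutorial/titanic.txt"
# nobody = 1L skips the body, header = 1L returns the response headers as text
xx = getURL(url, nobody=1L, header=1L)
# One header per element; the size is on the Content-Length line
strsplit(xx, "\r\n")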
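
The snippet above stops at the list of header lines. As a possible next step, the sketch below turns that text into a named vector and reads the Content-Length field. The xx string here is a made-up sample of what getURL() might return, so the code runs without network access; real servers may omit Content-Length, in which case the lookup fails with a "subscript out of bounds" error.

```r
# Hypothetical sample of the text returned by getURL(url, nobody = 1L, header = 1L)
xx <- "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 55692\r\n\r\n"

lines <- strsplit(xx, "\r\n")[[1]]
# Keep only "Name: value" lines (drops the status line and blank lines)
fields <- lines[grepl("^[^:]+:", lines)]
# Header names are case-insensitive in HTTP, so normalise them to lower case
headers <- setNames(trimws(sub("^[^:]+:", "", fields)),
                    tolower(sub(":.*$", "", fields)))

size_bytes <- as.numeric(headers[["content-length"]])
size_bytes
```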
Fontanez answered 2/6, 2015 at 5:29 Comment(0)
C
4

Besides file.size() mentioned above, you can also use file_size() from the fs package, which prints the size in human-readable units (K, M, G) instead of raw bytes.

As an example, compare the output returned by the two functions:

library(fs)

file.size(system.file("data/Rdata.rdb", package = "datasets"))
#> [1] 114974
fs::file_size(system.file("data/Rdata.rdb", package = "datasets"))
#> 112K

file.size(system.file("data/Rdata.rdb", package = "spData"))
#> [1] 2676333
fs::file_size(system.file("data/Rdata.rdb", package = "spData"))
#> 2.55M
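
If adding a dependency on fs is not an option, a similar display can be approximated in base R. This is only a sketch: human_size is a hypothetical helper, not part of any package, and its formatting (two decimals, binary units) is a choice made here that differs slightly from what fs prints.

```r
# Hypothetical helper: format a size in bytes using binary units
human_size <- function(bytes) {
  units <- c("B", "KiB", "MiB", "GiB", "TiB")
  # Index of the largest unit that keeps the value at 1 or above
  idx <- ifelse(bytes > 0,
                pmin(floor(log(bytes, 1024)) + 1, length(units)),
                1)
  sprintf("%.2f %s", bytes / 1024^(idx - 1), units[idx])
}

# The two byte counts from the example above
human_size(c(114974, 2676333))
```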
Comical answered 22/4, 2022 at 14:22 Comment(0)
L
1

If you don't want to download the file before knowing its size, you can try something like this:

Note: This only works on macOS or Linux, because it relies on the curl, grep and cut command-line tools.

file_url = 'http://math.ucdenver.edu/RTutorial/titanic.txt'
# -I sends a HEAD request (headers only); -s silences the progress meter
curl_cmd = paste('curl -sI', file_url)
# grep -i matches the header name regardless of case
system_cmd = paste(curl_cmd, '| grep -i Content-Length | cut -d : -f 2')

The above builds a command string to be executed with system(). The curl_cmd string tells curl to fetch only the headers for the file.

The system_cmd string appends extra commands that parse the headers and extract just the file size.

Now call system() with intern = TRUE so that R captures the output instead of printing it.

b <- system(system_cmd, intern = TRUE)

Only the headers are downloaded, not the file itself. b now holds the file size in bytes as a character string, provided the server sends a Content-Length header.


Then you can decide how to open the file, or print something friendly like:

print(paste("There are", as.numeric(b)/1e6, "MB in the file:", file_url))
## [1] "There are 0.055692 MB in the file: http://math.ucdenver.edu/RTutorial/titanic.txt"
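
For a variant that does not depend on a Unix shell, base R provides curlGetHeaders() (R 3.2.0 and later, when R is built with libcurl support), which fetches only the headers for a URL. The sketch below is one way to use it; content_length and remote_file_size are hypothetical helper names, and the sample header lines are made up so that the parsing step can be checked offline.

```r
# Extract Content-Length (in bytes) from a character vector of header lines,
# such as the one returned by curlGetHeaders(); returns NA if it is absent.
content_length <- function(header_lines) {
  hit <- grep("^content-length:", header_lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  # After redirects there can be several matches; use the last response
  as.numeric(trimws(sub("^[^:]+:", "", hit[length(hit)])))
}

# Hypothetical wrapper: calling it needs network access
remote_file_size <- function(url) content_length(curlGetHeaders(url))

# Offline check with made-up header lines
sample_headers <- c("HTTP/1.1 200 OK\r\n", "Content-Length: 55692\r\n", "\r\n")
content_length(sample_headers)
```

Calling remote_file_size(file_url) would then give the size in bytes, or NA when the server does not report one.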
Lazar answered 1/6, 2015 at 22:2 Comment(3)
It would be cool if someone could share a solution that works in all host environments. I tried fiddling in RCurl for about five minutes but didn't get very far. - Lazar
#20922093 - Fontanez
Awesome! So much better. - Lazar
L
0
# Suppose you have a list of files named filelist.  For example...

filelist = c("./myfile1.txt", "./myfile2.txt", "./myfile3.txt")

# The command above assumes that the files are in your current working directory "./"
# If your files are in a different location, you need to replace "./" 
# with the path to the directory that holds the files

# or, if you have only one data file,  filelist = "./myfile1.txt"

# To check which files meet a particular size criterion, you can use the command below.
# For example, the command below checks whether the file size is greater than 0.
# The final filelist includes only the file names that meet the criterion.

filelist = filelist[file.size(filelist)>0]

# If no files meet the criterion, then the final filelist will be "character(0)"
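
One caveat with the filter above: file.size() returns NA for files that do not exist, and indexing with NA leaves NA entries in the result. A hedged variant that drops missing files as well is sketched below; the file names are hypothetical and are created in a temporary directory so the example is self-contained.

```r
# Demo files (hypothetical names) in a temporary directory
demo_dir <- tempfile("sizes")
dir.create(demo_dir)
non_empty <- file.path(demo_dir, "a.txt")
writeLines("hello", non_empty)
empty <- file.path(demo_dir, "b.txt")
invisible(file.create(empty))
absent <- file.path(demo_dir, "c.txt")  # never created

filelist <- c(non_empty, empty, absent)
sizes <- file.size(filelist)             # NA for the missing file
# Keep only files that exist and are larger than 0 bytes
kept <- filelist[!is.na(sizes) & sizes > 0]
basename(kept)
```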
Lamdin answered 17/7 at 10:41 Comment(0)
