How do I quickly convert the size element of file.info() from bytes to KB, MB, GB, etc.?
I expect there is already an answer for this on stackoverflow, and I simply failed to find it.

Desired outcome: Quickly convert the file size element in a file.info() call from bytes to KB, MB, etc. I'm fine if the output is either i) a character string with the desired size type, e.g., "96 bytes" or ii) simply a numeric conversion, e.g., from 60963 bytes to 60.963 KB (per Google).

Repro steps:

  1. Create a folder to store the file:

    dir.create("census-app/data")
    
  2. Download the file (~60KB):

    download.file("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
    "census-app/data/counties.rds")
    
  3. Use file.info()$size to return the file size in bytes:

    file.info("census-app//data//counties.rds")$size
    [1] 60963
    

From there, I'm stuck. I realize I can do some complicated/manual parsing and calculation to make the conversion (see Converting kilobytes, megabytes etc. to bytes in R).

However, I'm hoping I can simply use a base function or something similar:

    format(file.info("census-app//data//counties.rds")$size, units = "KB")
    [1] "60963"
    # Attempt to return file size in KB simply returns the size in bytes
    # NOTE: format(x, units = "KB") works fine when I
    # pass it object.size() for an object loaded in R
Brandebrandea answered 22/4, 2015 at 3:54 Comment(1)
An apparently removed comment made a valid point I'd like to answer: why not just use the simple math of x bytes / 1024 to return the value in KB? I agree this is a simple calculation, but part of my goal is to avoid manual intervention, a) in case I accidentally enter something like 1000 instead of 1024, and b) to forgo researching the correct conversion ratio. – Brandebrandea
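To illustrate the 1000-versus-1024 concern from the comment above, here is a minimal sketch that keeps the divisors in one named lookup table so they are never retyped by hand. The helper name `bytes_to()` and the unit labels are assumptions made for illustration; they are not part of base R.

```r
# Hypothetical helper: convert a byte count to a named unit.
# SI units (KB, MB, GB) use powers of 1000; IEC units (KiB, MiB, GiB) use powers of 1024.
bytes_to <- function(bytes, unit = c("KB", "MB", "GB", "KiB", "MiB", "GiB")) {
  unit <- match.arg(unit)
  divisors <- c(KB = 1000, MB = 1000^2, GB = 1000^3,
                KiB = 1024, MiB = 1024^2, GiB = 1024^3)
  bytes / divisors[[unit]]
}

bytes_to(60963, "KB")             # 60.963
round(bytes_to(60963, "KiB"), 1)  # 59.5
```

The two results differ by about 2.4% for this file, which is why it matters which convention a formatting function follows.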

The object.size() function does this type of formatting for its results, but it's meant to tell you the size of the R object you pass to it. It is not set up to take an arbitrary byte value.

However, we can "steal" some of its formatting logic. You can call it with

    utils:::format.object_size(60963, "auto")
    # [1] "59.5 Kb"

In that way we can call the un-exported formatting function. You can bring up the additional formatting options on the ?format.object_size help page. Note that it uses the rule that 1 Kb = 1024 bytes (not 1000 as in your example).
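As a sketch of how this could be applied to file sizes without the `:::` call, the snippet below sets the `object_size` class on a plain number and goes through the exported `format()` generic, which is the approach suggested in the comments on this answer. The wrapper name `format_bytes()` is an assumption for illustration. Because the formatting method handles one value at a time, the wrapper loops with `vapply()` so it also accepts a vector such as `file.info(paths)$size`.

```r
# Hypothetical wrapper around the object_size formatting method.
# Each value is tagged with class "object_size" so format() dispatches
# to the method in utils; vapply() handles vectors one element at a time.
format_bytes <- function(bytes, units = "auto") {
  vapply(
    bytes,
    function(b) format(structure(b, class = "object_size"), units = units),
    character(1)
  )
}

format_bytes(60963)
# [1] "59.5 Kb"

# With a real file (path taken from the question):
# format_bytes(file.info("census-app/data/counties.rds")$size)
```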

Pasley answered 22/4, 2015 at 4:3 Comment(4)
Thank you, sir! When I pull up ?format.object_size the help page points to object.size {utils}. Will you please explain how I know when to expand a function like object.size() to some_function.object_size, or point me to an explanatory resource? I'm inferring this is a simple combination of the two functions, and I'm guessing . characters need to be changed to _. Correct? – Brandebrandea
This case was a bit unusual. I looked for a function that I thought might do formatting, found object.size(), then looked at the source (type object.size without the parentheses). I saw that it returns an object of class "object_size". (It's really not that common for the class name to be the function name with periods replaced by underscores; it could be anything.) Then I looked for methods for that class with methods(class="object_size") and found the formatting function. – Pasley
The proper way to call utils:::format.object_size() is to call format() and make sure the object passed has the class attribute set. This can be done as size <- structure(size, class="object_size") and then format(size, units="auto"), or in one go as format(structure(size, class="object_size"), units="auto"). – Graphic
SI units are now supported as well: format(structure(2^32-1, class="object_size"), units="auto", standard="SI"). Thanks @HenrikB; see github.com/HenrikBengtsson/Wishlist-for-R/issues/6 – Gaul

Use the humanReadable() function in the gdata package. It has options to report the size in base 1000 ('SI') or base 1024 ('IEC') units, and it is also vectorized so you can process an entire vector of sizes at the same time.

For example:

    > humanReadable(c(60810, 124141, 124, 13412513), width=4)
    [1] "60.8 kB" "124 kB"  "124 B"   "13.4 MB"
    > humanReadable(c(60810, 124141, 124, 13412513), standard="IEC", width=4)
    [1] "59.4 KiB" "121 KiB"  "124 B"    "12.8 MiB"

I'm currently working to prepare release 2.16.0 of gdata, which adds the ability to indicate which unit you would like to use for reporting the sizes, as well as "Unix"-style units.

    > humanReadable(c(60810, 124141, 124, 13412513), standard="SI", units="kB")
    [1] "   60.8 kB" "  124.1 kB" "    0.1 kB" "13412.5 kB"
    > humanReadable(c(60810, 124141, 124, 13412513), standard="IEC", units="KiB")
    [1] "   59.4 KiB" "  121.2 KiB" "    0.1 KiB" "13098.2 KiB"
    > humanReadable(c(60810, 124141, 124, 13412513), standard="Unix", units="K")
    [1] "   59.4 K" "  121.2 K" "    0.1 K" "13098.2 K"

-Greg [maintainer of the gdata package]

Update

CRAN has accepted gdata version 2.16.1, which supports the standard="Unix" and units= options, and it should be available on CRAN mirrors shortly.

Neutrality answered 23/4, 2015 at 2:43 Comment(1)
I second using gdata::humanReadable() for this, especially since R's own format() function for object_size objects uses incorrect notation, e.g. Kb (=Kbits) when it should use KB (or KiB), cf. stat.ethz.ch/pipermail/r-devel/2014-September/069755.html – Graphic
