How do I get the size of a file in megabytes using Perl?

8

17

I want to get the size of a file on disk in megabytes. Using the -s operator gives me the size in bytes, but I'm going to assume that then dividing this by a magic number is a bad idea:

my $size_in_mb = (-s $fh) / (1024 * 1024);

Should I just use a read-only variable to define 1024, or is there a programmatic way to obtain the number of bytes in a kilobyte?

EDIT: Updated the incorrect calculation.

Ventilator answered 4/2, 2009 at 15:18 Comment(0)
33

If you'd like to avoid magic numbers, try the CPAN module Number::Bytes::Human.

use Number::Bytes::Human qw(format_bytes);
my $size = format_bytes(-s $file); # 4.5M
Lorri answered 4/2, 2009 at 18:25 Comment(1)
Just discovered it can also parse human-readable strings back into bytes! - Swordtail
11

This is an old question that has already been answered correctly, but in case your program is restricted to the core modules and you cannot use Number::Bytes::Human, here are several other options I have collected over time. I have kept them because each one uses a different Perl approach and is a nice example of TIMTOWTDI:

  • example 1: uses state to avoid reinitializing the variable on each call (state needs Perl 5.10 or later and has to be enabled with use feature 'state', use v5.10 or later, or perl -E)

http://kba49.wordpress.com/2013/02/17/format-file-sizes-human-readable-in-perl/

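    use feature 'state';    # state variables need Perl 5.10 or later

    sub formatSize {
        my $size = shift;
        my $exp = 0;

        # Initialized once, on the first call
        state $units = [qw(B KB MB GB TB PB)];

        # Stop at the last unit so $exp never runs past the end of the list
        while ($size >= 1024 && $exp < $#$units) {
            $size /= 1024;
            $exp++;
        }

        return wantarray ? ($size, $units->[$exp]) : sprintf("%.2f %s", $size, $units->[$exp]);
    }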
  • example 2: using sort map

sub scaledbytes {
    # http://www.perlmonks.org/?node_id=378580
    # Format the size in every unit, then keep the shortest string
    (sort { length $a <=> length $b }
     map  { sprintf '%.3g%s', $_[0]/1024**$_->[1], $_->[0] }
         [" bytes"=>0],
         [KB=>1],
         [MB=>2],
         [GB=>3],
         [TB=>4],
         [PB=>5],
         [EB=>6]
    )[0];
}
  • example 3: Take advantage of the fact that 1 GB = 1024 MB, 1 MB = 1024 KB and 1024 = 2 ** 10, so dividing by 1024 is a right shift by 10 bits:

# http://www.perlmonks.org/?node_id=378544
my $kb = 1024 * 1024; # set to 1 Gb

my $mb = $kb >> 10;
my $gb = $mb >> 10;

print "$kb kb = $mb mb = $gb gb\n";
__END__
1048576 kb = 1024 mb = 1 gb
  • example 4: use of ++$n and ... until .. to obtain an index for the array

# http://www.perlmonks.org/?node_id=378542
#! perl -slw
use strict;

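sub scaleIt {
    my( $size, $n ) =( shift, 0 );
    my @units = qw[ bytes KB MB GB TB PB ];
    # ++$n is always true here, so each pass bumps the index and divides;
    # the extra check keeps $n inside @units for very large sizes
    ++$n and $size /= 1024 until $size < 1024 || $n == $#units;
    return sprintf "%.2f %s",
           $size, $units[ $n ];
}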

my $size = -s $ARGV[ 0 ];

print "$ARGV[ 0 ]: ", scaleIt $size;  

Even if you cannot use Number::Bytes::Human, take a look at its source code to see all the things you need to be aware of.

Rademacher answered 16/3, 2014 at 9:43 Comment(0)
7

You could of course create a function for calculating this. That is a better solution than creating constants in this instance.

sub size_in_mb {
    my $size_in_bytes = shift;
    return $size_in_bytes / (1024 * 1024);
}

No need for constants. Changing the 1024 to some kind of variable/constant won't make this code more readable.

Fondue answered 5/2, 2009 at 16:18 Comment(0)
4

Well, there's not 1024 bytes in a meg, there's 1024 bytes in a K, and 1024 K in a meg...

That said, 1024 is a safe "magic" number that will never change in any system you can expect your program to work in.

Bless answered 4/2, 2009 at 15:19 Comment(8)
A similar situation would be converting between meters and kilometers... would you feel bad about including the "magic" factor of 1000? This is a straight unit conversion that will NEVER change. - Cobblestone
Talk to marketing... they have a different opinion (wrong IMHO, but hey, they have more money). - Epirus
Updated the question. It's early, so forgive my mistaking of kilobytes for megabytes. :) - Ventilator
Even if the magic number is "safe", your code is more readable with a named constant instead. Consider physical constants like G or c, or mathematical ones like pi or e. Sure, they will never change in our universe, but your expressions are much more readable if they are used by name rather than by value. - Anallese
Just use the Number::Bytes::Human module for this. Much easier than doing it yourself, and much more readable. - Lorri
One place I worked (an IPS), a gigabyte was defined as 1024Mb for some purposes, but 1000Mb for other purposes. So if there are multiple ways to define a Gigabyte (or a Megabyte), then the magic numbers can change based on which definition is being used. - Banian
At that point, I think it should be critical to use proper definitions of gigabyte and gibibyte (en.wikipedia.org/wiki/GiB). As two separate entities, we shouldn't worry about that. You can always easily convert from giga to gibi if need be. - Bless
Does anyone really use gibi outside of Wikipedia? - Swordtail
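
To make the two definitions from the comments above concrete, here is a minimal sketch that converts the same byte count with both the decimal (1000-based) and the binary (1024-based) factor. The constant and function names are illustrative assumptions, not taken from any answer in this thread.

```perl
use strict;
use warnings;

# Illustrative names for the two conventions discussed in the comments
use constant MEGABYTE => 1000 ** 2;    # decimal megabyte (MB)
use constant MEBIBYTE => 1024 ** 2;    # binary mebibyte (MiB)

sub bytes_to_mb  { return $_[0] / MEGABYTE }
sub bytes_to_mib { return $_[0] / MEBIBYTE }

my $bytes = 5_000_000;
printf "%d bytes = %.2f MB = %.2f MiB\n",
    $bytes, bytes_to_mb($bytes), bytes_to_mib($bytes);
# prints "5000000 bytes = 5.00 MB = 4.77 MiB"
```

Naming the factor after the unit it produces makes it obvious which definition a given calculation relies on.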
4

I would read this into a variable rather than use a magic number. Even if magic numbers are not going to change, like the number of bytes in a megabyte, using a well named constant is a good practice because it makes your code more readable. It makes it immediately apparent to everybody else what your intention is.
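
As a rough sketch of what such a well-named constant might look like, the core constant pragma can hold the conversion factors. The names BYTES_PER_KB, BYTES_PER_MB and size_in_mb are assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical names, chosen for illustration only
use constant BYTES_PER_KB => 1024;
use constant BYTES_PER_MB => BYTES_PER_KB * 1024;

# Convert a byte count (for example the result of -s) to megabytes
sub size_in_mb {
    my ($bytes) = @_;
    return $bytes / BYTES_PER_MB;
}

printf "%.2f MB\n", size_in_mb(5 * 1024 * 1024);    # prints "5.00 MB"
```

With a real file, the call would be something like size_in_mb(-s $file), where $file is whatever path or handle the program already has.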

Diligent answered 4/2, 2009 at 15:34 Comment(0)
1

1) You don't want 1024. That gives you kilobytes. You want 1024*1024, or 1048576.

2) Why would dividing by a magic number be a bad idea? It's not like the number of bytes in a megabyte will ever change. Don't overthink things too much.

Heraldry answered 4/2, 2009 at 15:20 Comment(0)
1

Don't get me wrong, but: I think that declaring 1024 as a Magic Variable goes a bit too far, that's a bit like "$ONE = 1; $TWO = 2;" etc.

A kilobyte has been falsely declared as 1024 bytes for more than 20 years, and I seriously doubt that the operating system manufacturers will ever correct that bug and change it to 1000.

What could make sense though is to declare non-obvious stuff, like "$megabyte = 1024 * 1024" since that is more readable than 1048576.

Dismay answered 4/2, 2009 at 15:21 Comment(0)
1

Since the -s operator returns the file size in bytes you should probably be doing something like

my $size_in_mb = (-s $fh) / (1024 * 1024);

and use int() if you need a round figure. It's not like the dimensions of KB or MB are going to change anytime in the near future :)
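
One thing to keep in mind is that int() truncates rather than rounds. A small sketch of the difference, using a made-up byte count in place of a real -s result:

```perl
use strict;
use warnings;

my $bytes      = 1_600_000;                  # stand-in for the result of -s
my $size_in_mb = $bytes / (1024 * 1024);     # about 1.5259

my $truncated = int($size_in_mb);              # int() drops the fraction
my $rounded   = sprintf("%.0f", $size_in_mb);  # rounds to the nearest integer
my $two_dp    = sprintf("%.2f", $size_in_mb);  # keeps two decimal places

print "$truncated $rounded $two_dp\n";       # prints "1 2 1.53"
```

If the size is only shown to a user, sprintf with one or two decimals is usually the more informative choice.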

Hackberry answered 4/2, 2009 at 15:26 Comment(0)
