Python format size application (converting B to KB, MB, GB, TB)
P

23

48

I am trying to write an application to convert bytes to kb to mb to gb to tb. Here's what I have so far:

def size_format(b):
    if b < 1000:
        return '%i' % b + 'B'
    elif 1000 <= b < 1000000:
        return '%.1f' % float(b/1000) + 'KB'
    elif 1000000 <= b < 1000000000:
        return '%.1f' % float(b/1000000) + 'MB'
    elif 1000000000 <= b < 1000000000000:
        return '%.1f' % float(b/1000000000) + 'GB'
    elif 1000000000000 <= b:
        return '%.1f' % float(b/1000000000000) + 'TB'

The problem is that when I try the application, everything after the decimal point is zeroed out. For example, size_format(623) yields '623B', but size_format(6200) gives '6.0KB' instead of '6.2KB'. Any ideas why?

Parsnip answered 21/9, 2012 at 2:49 Comment(1)
A hint for the future: when you paste in code, select it all and use the {} button to format it as code.Needs
D
53

Fixed version of Bryan_Rch's answer:

def format_bytes(size):
    # 2**10 = 1024
    power = 2**10
    n = 0
    power_labels = {0 : '', 1: 'kilo', 2: 'mega', 3: 'giga', 4: 'tera'}
    # stop at the largest known label so the lookup below cannot fail
    while size >= power and n < 4:
        size /= power
        n += 1
    return size, power_labels[n]+'bytes'
Doublespace answered 19/3, 2018 at 11:20 Comment(3)
This converts 12625 bytes into (12.3291015625, 'megabytes'); shouldn't it be 0.01204 megabytes instead? Am I missing something?Enamour
@Enamour no, it correctly converts 12625 bytes in (12.3291015625, 'kilobytes')Doublespace
@Enamour not following you, please provide a complete example.Doublespace
S
38
def humanbytes(B):
    """Return the given bytes as a human friendly KB, MB, GB, or TB string."""
    B = float(B)
    KB = float(1024)
    MB = float(KB ** 2) # 1,048,576
    GB = float(KB ** 3) # 1,073,741,824
    TB = float(KB ** 4) # 1,099,511,627,776

    if B < KB:
        return '{0} {1}'.format(B,'Bytes' if B == 0 or B > 1 else 'Byte')
    elif KB <= B < MB:
        return '{0:.2f} KB'.format(B / KB)
    elif MB <= B < GB:
        return '{0:.2f} MB'.format(B / MB)
    elif GB <= B < TB:
        return '{0:.2f} GB'.format(B / GB)
    elif TB <= B:
        return '{0:.2f} TB'.format(B / TB)


tests = [1, 1024, 500000, 1048576, 50000000, 1073741824, 5000000000, 1099511627776, 5000000000000]

for t in tests: print("{0} == {1}".format(t,humanbytes(t)))

Output:

1 == 1.0 Byte
1024 == 1.00 KB
500000 == 488.28 KB
1048576 == 1.00 MB
50000000 == 47.68 MB
1073741824 == 1.00 GB
5000000000 == 4.66 GB
1099511627776 == 1.00 TB
5000000000000 == 4.55 TB

and for future me here it is in Perl too:

sub humanbytes {
   my $B = shift;
   my $KB = 1024;
   my $MB = $KB ** 2; # 1,048,576
   my $GB = $KB ** 3; # 1,073,741,824
   my $TB = $KB ** 4; # 1,099,511,627,776

   if ($B < $KB) {
      return "$B " . (($B == 0 || $B > 1) ? 'Bytes' : 'Byte');
   } elsif ($B >= $KB && $B < $MB) {
      return sprintf('%0.02f',$B/$KB) . ' KB';
   } elsif ($B >= $MB && $B < $GB) {
      return sprintf('%0.02f',$B/$MB) . ' MB';
   } elsif ($B >= $GB && $B < $TB) {
      return sprintf('%0.02f',$B/$GB) . ' GB';
   } elsif ($B >= $TB) {
      return sprintf('%0.02f',$B/$TB) . ' TB';
   }
}
Salaam answered 25/7, 2015 at 22:23 Comment(3)
what is this (0 == B > 1)? (0 == B) will return a boolean which is never > 1Limicoline
@AdamMarples: this is a rich comparison chain, which was introduced in PEP 207. It translates to: "use the plural version, if B is not exactly one (1)". You resolve it from left to right: if B is zero or B is greater than 1Lorenz
Thank you for the Perl version. Always nice to cf. to Python and see the influences.Cola
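
For reference, Python evaluates a chained comparison such as a == b > c as (a == b) and (b > c). The expression quoted in these comments, 0 == B > 1, would therefore require B to be zero and greater than one at the same time. A minimal sketch of the difference, where plural_chained and plural_explicit are hypothetical helper names used only for illustration:

```python
def plural_chained(b):
    # Chained form: equivalent to (0 == b) and (b > 1), which no number satisfies.
    return 'Bytes' if 0 == b > 1 else 'Byte'


def plural_explicit(b):
    # Explicit form: plural for every value except exactly one byte.
    return 'Bytes' if b != 1 else 'Byte'


for b in (0, 1, 2, 512):
    print(b, plural_chained(b), plural_explicit(b))
```

The chained form never selects the plural label, while the explicit form matches the behaviour of the Perl version shown in the answer.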
G
33

WARNING: other answers are likely to contain bugs. The ones posted before this one were unable to handle filesizes that are close to the boundary of the next unit.

Dividing bytes to get a human-readable answer may seem easy, right? Wrong!

Many answers are incorrect and contain floating point rounding bugs that cause incorrect output such as "1024 KiB" instead of "1 MiB". They shouldn't feel sad about it, though, since it's a bug that even Android's OS programmers had in the past, and tens of thousands of programmer eyes never noticed the bug in the world's most popular StackOverflow answer either, despite years of people using that old Java answer.

So what's the problem? Well, it's due to the way that floating point rounding works. A float such as "1023.95" will actually round up to "1024.0" when told to format itself as a single-decimal number. Most programmers don't think about that bug, but it COMPLETELY breaks the "human readable bytes" formatting. So their code thinks "Oh, 1023.95, that's fine, we've found the correct unit since the number is less than 1024", but they don't realize that it will get rounded to "1024.0" which SHOULD be formatted as the NEXT size-unit.
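
To make the rounding problem concrete, here is a small sketch of the "pick the unit first, round afterwards" pattern described above. The function name naive_format and its four-unit list are assumptions made for this illustration and are not taken from any particular answer:

```python
def naive_format(num_bytes):
    """Choose the unit by comparing against 1024, then round to 1 decimal."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Accept the current unit as soon as the unrounded value is below 1024.
        if value < 1024 or unit == "GiB":
            return "{:.1f} {}".format(value, unit)
        value /= 1024


near_boundary = 1024 * 1024 - 1      # one byte short of 1 MiB
print(naive_format(near_boundary))   # 1024.0 KiB
print(naive_format(1024 * 1024))     # 1.0 MiB
```

The first call divides down to 1023.9990234375, which passes the "< 1024" check but is then rounded up by the formatter.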

Furthermore, many of the other answers are using horribly slow code with a bunch of math functions such as pow/log, which may look "neat" but completely wrecks performance. Most of the other answers use crazy if/else nesting, or other performance-killers such as temporary lists, live string concatenation/creation, etc. In short, they waste CPU cycles doing pointless, heavy work.

Most of them also forget to include larger units, and therefore only support a small subset of the most common filesizes. Given a larger number, such code would output something like "1239213919393491123.1 Gigabytes", which is silly. Some of them won't even do that, and will simply break if the input number is larger than the largest unit they've implemented.

Furthermore, almost none of them handle negative input, such as "minus 2 megabytes", and completely break on such input.

They also hardcode very personal choices such as precision (how many decimals) and unit type (metric or binary). Which means that their code is barely reusable.

So... okay, we have a situation where the current answers aren't correct... so why not do everything right instead? Here's my function, which focuses on both performance and configurability. You can choose between 0-3 decimals, and whether you want metric (power of 1000) or binary (power of 1024) representation. It contains some code comments and usage examples, to help people understand why it does what it does and what bugs it avoids by working this way. If all the comments are deleted, it would shrink the line numbers by a lot, but I suggest keeping the comments when copypasta-ing so that you understand the code again in the future. ;-)

from typing import List, Union

class HumanBytes:
    METRIC_LABELS: List[str] = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    BINARY_LABELS: List[str] = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
    PRECISION_OFFSETS: List[float] = [0.5, 0.05, 0.005, 0.0005] # PREDEFINED FOR SPEED.
    PRECISION_FORMATS: List[str] = ["{}{:.0f} {}", "{}{:.1f} {}", "{}{:.2f} {}", "{}{:.3f} {}"] # PREDEFINED FOR SPEED.

    @staticmethod
    def format(num: Union[int, float], metric: bool=False, precision: int=1) -> str:
        """
        Human-readable formatting of bytes, using binary (powers of 1024)
        or metric (powers of 1000) representation.
        """

        assert isinstance(num, (int, float)), "num must be an int or float"
        assert isinstance(metric, bool), "metric must be a bool"
        assert isinstance(precision, int) and precision >= 0 and precision <= 3, "precision must be an int (range 0-3)"

        unit_labels = HumanBytes.METRIC_LABELS if metric else HumanBytes.BINARY_LABELS
        last_label = unit_labels[-1]
        unit_step = 1000 if metric else 1024
        unit_step_thresh = unit_step - HumanBytes.PRECISION_OFFSETS[precision]

        is_negative = num < 0
        if is_negative: # Faster than ternary assignment or always running abs().
            num = abs(num)

        for unit in unit_labels:
            if num < unit_step_thresh:
                # VERY IMPORTANT:
                # Only accepts the CURRENT unit if we're BELOW the threshold where
                # float rounding behavior would place us into the NEXT unit: F.ex.
                # when rounding a float to 1 decimal, any number ">= 1023.95" will
                # be rounded to "1024.0". Obviously we don't want ugly output such
                # as "1024.0 KiB", since the proper term for that is "1.0 MiB".
                break
            if unit != last_label:
                # We only shrink the number if we HAVEN'T reached the last unit.
                # NOTE: These looped divisions accumulate floating point rounding
                # errors, but each new division pushes the rounding errors further
                # and further down in the decimals, so it doesn't matter at all.
                num /= unit_step

        return HumanBytes.PRECISION_FORMATS[precision].format("-" if is_negative else "", num, unit)

print(HumanBytes.format(2251799813685247)) # 2 pebibytes
print(HumanBytes.format(2000000000000000, True)) # 2 petabytes
print(HumanBytes.format(1099511627776)) # 1 tebibyte
print(HumanBytes.format(1000000000000, True)) # 1 terabyte
print(HumanBytes.format(1000000000, True)) # 1 gigabyte
print(HumanBytes.format(4318498233, precision=3)) # 4.022 gibibytes
print(HumanBytes.format(4318498233, True, 3)) # 4.318 gigabytes
print(HumanBytes.format(-4318498233, precision=2)) # -4.02 gibibytes

By the way, the hardcoded PRECISION_OFFSETS is created that way for maximum performance. We could have programmatically calculated the offsets using the formula unit_step_thresh = unit_step - (0.5/(10**precision)) to support arbitrary precisions. But it really makes NO sense to format filesizes with massive 4+ trailing decimal numbers. That's why my function supports exactly what people use: 0, 1, 2 or 3 decimals. Thus we avoid a bunch of pow and division math. This decision is one of many small attention-to-detail choices that make this function FAST. Another example of performance choices was the decision to use a string-based if unit != last_label check to detect the end of the List, rather than iterating by indices and seeing if we've reached the final List-index. Generating indices via range() or tuples via enumerate() is slower than just doing an address comparison of Python's immutable string objects stored in the _LABELS lists, which is what this code does instead!
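
For anyone who does want arbitrary precision, the formula mentioned above can be sketched as follows. The helper name rounding_threshold is hypothetical, and the loop only checks that the computed values agree with the hardcoded offsets:

```python
import math


def rounding_threshold(unit_step, precision):
    # Values at or above this threshold are treated as belonging to the
    # next unit, because rounding to `precision` decimals would otherwise
    # display them as `unit_step` itself.
    return unit_step - 0.5 / (10 ** precision)


hardcoded_offsets = [0.5, 0.05, 0.005, 0.0005]
for precision, offset in enumerate(hardcoded_offsets):
    computed = rounding_threshold(1024, precision)
    print(precision, computed, math.isclose(computed, 1024 - offset))
```

This trades one power and one division per call for not having to maintain the lookup list.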

Sure, it's a bit excessive to put that much work into performance, but I hate the "write sloppy code and only optimize after all the thousands of slow functions in a project makes the whole project sluggish" attitude. The "premature optimization" quote that most programmers live by is completely misunderstood and used as an excuse for sloppiness. :-P

I place this code in the public domain. Feel free to use it in your projects, both freeware and commercial. I actually suggest that you place it in a .py module and change it from a "class namespace" into a normal module instead. I only used a class to keep the code neat for StackOverflow and to make it easy to paste into self-contained python scripts if you don't want to use modules.

Enjoy and have fun! :-)

Guenevere answered 11/9, 2020 at 1:10 Comment(8)
Excellent effort. Would be great if this supported localization for unit names, and it would be a complete python package.Afflux
@Afflux Thanks for your kind words! Happy that this helped you! You can actually very easily localize already. Simply import the class into your personal file, and then add these lines to your personal python code: HumanBytes.METRIC_LABELS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"] and/or HumanBytes.BINARY_LABELS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]. Simply write your own labels in there (don't edit the class itself). Be sure to keep the units and their order when you localize their names to your own alternative labels! Voila, personal labels! :-)Guenevere
Yes, of course, editing the labels (and maybe different format strings for different languages) will make it localized. But I was recommending you that if this functionality was built in here, you could bundle it up as a PyPI package and people would download it. Although even without localization, it is worth being in the pip repository.Afflux
1024 MiB is perfectly human readable IMO. Most people probably wouldn't expect that output, but I wouldn't consider it a bug. Your optimization unnecessarily makes the code longer and less comprehensible. You'll only format the sizes for humans to read, which means either displaying it somewhere (e.g. on the terminal, a GUI,...) or writing it to a file. Those actions are orders of magnitude slower than the formatting anyway and if speed matters you wouldn't bother formatting that string in the first place. BTW: if unit != last_label compares the strings; you probably meant is not ;-)Gunnel
Don't get me wrong - your code answers the question and it is very well thought through. But this doesn't make it the only correct answer (as your comments on many other answers and the bold first paragraph of your answer implies). Your code would be a great fit for a module on e.g. pypi, but I would always opt for one of the three-line answers that I (and my colleagues) can comprehend at a glance when implementing it myself.Gunnel
I've come across this very annoying issue myself before, and appreciate your attention to detail in addressing this :+1:Trail
Observation: 1023.95 bytes = 8,191.6 bits, mathematically, but isn't it really impossible to have six tenths of a bit?Mom
I've removed the overly shouty part and indicated that earlier answers may have been incorrect. Better or more up to date answers may be posted later. Correctness is of course one thing but we're using kB/GB etc. because we don't want to specify everything by the byte. Similarly, great performance may not be very important when you are going for human text consumption and they do the calculation faster than humans will be able to read it (computers can handle big numbers just dandy, thank you). That said, for a general purpose library or logging or similar, I'd probably use your code.Urticaceous
B
14

This approach works well for me:

def convert_bytes(num):
    """
    Convert a byte count to bytes, KB, MB, GB or TB.
    """
    step_unit = 1000.0  # metric step; 1024 would give a different size

    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < step_unit:
            return "%3.1f %s" % (num, x)
        num /= step_unit
Batho answered 18/9, 2018 at 4:45 Comment(7)
most gentle solutionMikesell
Beautiful solution. Any reason you set step_unit as float? For convenience: a converted return line to fstring: return f"{num:.1f} {x}".Fluorosis
This code is pretty elegant but has many bugs and issues: 1. It uses Mac definitions (1000 instead of 1024), so filesizes will be wrong in the real world. step_unit = 1024 is the fix. 2. It will break (return None) if you have something larger than 1000 (or 1024 if code is fixed) terabytes. Adding a return statement after the for-loop is the fix. 3. There is no reason to use a float for step_unit. 4. It uses old-style string formatting ("" % () instead of "".format()). 5. It uses %3.1f as its formatter which is total nonsense. The 3 is useless. %.1f is the proper formatting.Guenevere
Yet another bug/issue: 6. When you provide a byte amount that's on the edge between two units (slightly less than the next unit), you will get an ugly result, due to it using the lower unit, but the float being rounded up to the next unit. For example (and assuming the code has been fixed to use 1024 as the step size), providing the number 1023.99999 MiB (in bytes) being rendered as 1024 MiB instead of as 1 GiB (the next unit). It happens when the byte-amount you provide is very close to the next unit but not close enough to be detected by the < step_unit check.Guenevere
Regarding issue 6: This is a common problem that programmers make. Nobody fixed it in the world's most copied StackOverflow code snippet either. Here's an article about that rounding bug: programming.guide/worlds-most-copied-so-snippet.htmlGuenevere
Discovered yet another issue: 7. Because of the repeated looping division, you accumulate float rounding errors each time, but they're small enough that they shouldn't give us any incorrect output results. But it's better to do a single division with the intended "power of 1024"... so the loop is elegant, but bad. -- I'll also clarify issue 1 regarding what I meant by "1024": The real world, such as Linux, Windows, all Websites, etc, use Binary filesizes such as KiB, MiB, etc, where each is a power of 1024. Using power of 1000 is silly, and only Macs do it.Guenevere
I've provided an answer which is fast and flexible and handles every formatting situation, with zero bugs or issues: https://mcmap.net/q/94108/-python-format-size-application-converting-b-to-kb-mb-gb-tbGuenevere
I
11

Yet another humanbytes version, with no loops/if..else, in python3 syntax.

Test numbers stolen from @whereisalext's answer.

Mind you, it's still a sketch, e.g. if the numbers are large enough it will traceback.

import math as m


MULTIPLES = ["B", "k{}B", "M{}B", "G{}B", "T{}B", "P{}B", "E{}B", "Z{}B", "Y{}B"]


def humanbytes(i, binary=False, precision=2):
    base = 1024 if binary else 1000
    multiple = m.trunc(m.log2(i) / m.log2(base))
    value = i / m.pow(base, multiple)
    suffix = MULTIPLES[multiple].format("i" if binary else "")
    return f"{value:.{precision}f} {suffix}"


if __name__ == "__main__":
    sizes = [
        1, 1024, 500000, 1048576, 50000000, 1073741824, 5000000000,
        1099511627776, 5000000000000]

    for i in sizes:
        print(f"{i} == {humanbytes(i)}, {humanbytes(i, binary=True)}")

Results:

1 == 1.00 B, 1.00 B
1024 == 1.02 kB, 1.00 kiB
500000 == 500.00 kB, 488.28 kiB
1048576 == 1.05 MB, 1.00 MiB
50000000 == 50.00 MB, 47.68 MiB
1073741824 == 1.07 GB, 1.00 GiB
5000000000 == 5.00 GB, 4.66 GiB
1099511627776 == 1.10 TB, 1.00 TiB
5000000000000 == 5.00 TB, 4.55 TiB

Update:

As pointed out in comments (and as noted originally: "Mind you, it's still a sketch"), this code is slow and buggy. Please see @mitch-mcmabers's answer.

Update 2: I was also lying about having no ifs.

Impala answered 10/1, 2019 at 15:27 Comment(3)
Unfortunately, logarithm calculations are expensive. I checked with timeit and found that loops are faster.Persephone
Indeed, this is a good example of "short code but horrible code". Multiple calls to logarithm calculations, powers, truncation, etc. Very slow and wasteful stuff.Guenevere
That's a good phrase, I hope it's in the public domain, I have to use it! :)Impala
C
8

There is now a convenient DataSize package:

pip install datasize
import datasize
import sys

a = [i for i in range(1000000)]
s = sys.getsizeof(a)
print(f"{datasize.DataSize(s):MiB}")

Output:

8.2945556640625MiB

Chlorobenzene answered 23/4, 2021 at 10:11 Comment(3)
Note that this is abandonware with non-resolved issues, and that it does weird things if you e.g. try and format in bits instead of bytes.Urticaceous
@MaartenBodewes Thanks for mentioning it. Did you try it by yourself or did you just check github issues ? I did a quick try and issues #6, #10 & #11 seem fixed to me.Chlorobenzene
I'm entertaining a Python3 oriented rewrite from scratch. I'm not sure if that would be scary or comforting to people. How many different opinions can there be about whether it is better to fix your own bugs or those written by someone else? TBH, this use case is a bike shed.Hel
M
6

Using logarithms is probably the most concise way to do it:

from math import floor, log

def format_bytes(size):
  power = 0 if size <= 0 else floor(log(size, 1024))
  return f"{round(size / 1024 ** power, 2)} {['B', 'KB', 'MB', 'GB', 'TB'][int(power)]}"
Multiflorous answered 19/3, 2022 at 14:21 Comment(1)
This is a brilliant answer in principle but you made a mistake in the first line. log is the name of the function!Statute
T
5

I have a quite readable function to convert bytes into larger units:

def bytes_2_human_readable(number_of_bytes):
    if number_of_bytes < 0:
        raise ValueError("!!! number_of_bytes can't be smaller than 0 !!!")

    step_to_greater_unit = 1024.

    number_of_bytes = float(number_of_bytes)
    unit = 'bytes'

    if (number_of_bytes / step_to_greater_unit) >= 1:
        number_of_bytes /= step_to_greater_unit
        unit = 'KB'

    if (number_of_bytes / step_to_greater_unit) >= 1:
        number_of_bytes /= step_to_greater_unit
        unit = 'MB'

    if (number_of_bytes / step_to_greater_unit) >= 1:
        number_of_bytes /= step_to_greater_unit
        unit = 'GB'

    if (number_of_bytes / step_to_greater_unit) >= 1:
        number_of_bytes /= step_to_greater_unit
        unit = 'TB'

    precision = 1
    number_of_bytes = round(number_of_bytes, precision)

    return str(number_of_bytes) + ' ' + unit
Therewith answered 24/5, 2016 at 20:59 Comment(0)
F
4

This functionality already exists in matplotlib.

>>> from matplotlib.ticker import EngFormatter
>>> fmt = EngFormatter('B')
>>> fmt(123456)
'123.456 kB'
Forearm answered 17/5, 2023 at 4:4 Comment(0)
B
3

Rather than modifying your code, you can change the behaviour of division:

from __future__ import division

This provides "true" division over the "classic" style that Python 2.x uses. See PEP 238 - Changing the Division Operator for more details.

This is now the default behaviour in Python 3.x
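
A short sketch of the difference, written for Python 3, where // reproduces what "classic" division does for two integers. The variable names are chosen only for this example:

```python
b = 6200

classic = b // 1000   # floor division: the fractional part is discarded
true = b / 1000       # true division: the Python 3 default for "/"

print(classic)                    # 6
print(true)                       # 6.2
print('%.1f' % classic + 'KB')    # 6.0KB
print('%.1f' % true + 'KB')       # 6.2KB
```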

Benedic answered 21/9, 2012 at 3:3 Comment(1)
I'm so used to the way that other languages work that I keep forgetting about this. It's really going to bite me someday when I start using Python 3 more.Needs
E
3

A very simple solution would be:

SIZE_UNITS = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def get_readable_file_size(size_in_bytes):
    index = 0
    while size_in_bytes >= 1024:
        size_in_bytes /= 1024
        index += 1
    try:
        return f'{size_in_bytes} {SIZE_UNITS[index]}'
    except IndexError:
        return 'File too large'
Easton answered 2/10, 2019 at 12:56 Comment(0)
N
1

When you divide the value you're using an integer divide, since both values are integers. You need to convert one of them to float first:

return '%.1f' % (float(b)/1000) + 'KB'

or even just

return '%.1f' % (b/1000.0) + 'KB'
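
One detail worth checking when combining % formatting with division: % and / have the same precedence and group left to right, so the divided value is best wrapped in parentheses. A minimal sketch, with result and fixed as illustrative names:

```python
b = 6200

# Without parentheses the string is formatted first ('6200.0') and Python
# then tries to divide that string by 1000, which raises a TypeError.
try:
    result = '%.1f' % float(b) / 1000 + 'KB'
except TypeError as exc:
    result = None
    print('TypeError:', exc)

# With parentheses the division happens before formatting.
fixed = '%.1f' % (float(b) / 1000) + 'KB'
print(fixed)   # 6.2KB
```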
Needs answered 21/9, 2012 at 2:56 Comment(0)
G
1

This is a compact version that converts B (bytes) to any higher-order unit such as MB or GB without using a lot of if...else in Python. I use bitwise shifts to deal with this. It can also return a float value if you set the parameter return_float to True:

import math

def bytes_conversion(number, return_float=False):

    def _conversion(number, return_float=False):

        length_number = int(math.log10(number))

        if return_float:

           length_number = int(math.log10(number))
           return length_number // 3, '%.2f' % (int(number)/(1 << (length_number//3) *10))

        return length_number // 3, int(number) >> (length_number//3) * 10

    unit_dict = {
        0: "B",  1: "kB",
        2: "MB", 3: "GB",
        4: "TB", 5: "PB",
        6: "EB"
    }

    if return_float:

        num_length, number = _conversion(number, return_float=return_float)

    else:
        num_length, number = _conversion(number)

    return "%s %s" % (number, unit_dict[num_length])

#Example usage:
#print(bytes_conversion(491266116, return_float=True))

This is one of only a few posts I have made on StackOverflow. Please let me know if there are any errors or violations.

Genitals answered 20/8, 2018 at 4:40 Comment(0)
P
1

I have improved, in my opinion, @whereisalext's answer into a somewhat more generic function that does not require extra if statements when more units are added:

AVAILABLE_UNITS = ['bytes', 'KB', 'MB', 'GB', 'TB']

def get_amount_and_unit(byte_amount):
    for index, unit in enumerate(AVAILABLE_UNITS):
        lower_threshold = 0 if index == 0 else 1024 ** (index - 1)
        upper_threshold = 1024 ** index
        if lower_threshold <= byte_amount < upper_threshold:
            if lower_threshold == 0:
                return byte_amount, unit
            else:
                return byte_amount / lower_threshold, AVAILABLE_UNITS[index - 1]
    # Default to the maximum
    max_index = len(AVAILABLE_UNITS) - 1
    return byte_amount / (1024 ** max_index), AVAILABLE_UNITS[max_index]

Do note that this differs slightly from @whereisalext's algorithm:

  • This returns a tuple containing the converted amount at the first index and the unit at the second index
  • This does not try to differ between a singular and multiple bytes (1 bytes is therefore an output of this approach)
Plead answered 23/6, 2019 at 13:4 Comment(0)
S
1

I think this is short and succinct. The idea is based on some graph-scaling code I wrote many years ago. The snippet round(log2(size)*4)//40 does the magic here, computing the unit index in steps of 2**10. The "correct" implementation would be trunc(log2(size)/10); however, you would then get strange behavior when the size is close to a new boundary. For instance, datasize(2**20-1) would return (1024.00, 'KiB'). By using round and scaling the log2 result, you get a nice cutoff when approaching a new boundary.

from math import log2
def datasize(size):
    """
    Calculate the size of a code in B/KB/MB.../
    Return a tuple of (value, unit)
    """
    assert size>0, "Size must be a positive number"
    units = ("B", "KiB", "MiB", "GiB", "TiB", "PiB",  "EiB", "ZiB", "YiB") 
    scaling = round(log2(size)*4)//40
    scaling = min(len(units)-1, scaling)
    return  size/(2**(10*scaling)), units[scaling]

for size in [2**10-1, 2**10-10, 2**10-100, 2**20-10000, 2**20-2**18, 2**20, 2**82-2**72, 2**80-2**76]:
    print(size, "bytes= %.3f %s" % datasize(size))

1023 bytes= 0.999 KiB
1014 bytes= 0.990 KiB
924 bytes= 924.000 B
1038576 bytes= 0.990 MiB
786432 bytes= 768.000 KiB
1048576 bytes= 1.000 MiB
4830980911975647053611008 bytes= 3.996 YiB
1133367955888714851287040 bytes= 0.938 YiB
Scanty answered 2/12, 2020 at 10:29 Comment(0)
C
1

Let me add mine, where no variable is updated in a loop or similar error-prone behaviors. The logic implemented is straightforward. It's tested only with Python 3.

def format_bytes(size: int) -> str:
    power_labels = {40: "TB", 30: "GB", 20: "MB", 10: "KB"}
    for power, label in power_labels.items():
        if size >= 2 ** power:
            approx_size = size // 2 ** power
            return f"{approx_size} {label}"
    return f"{size} bytes"

It's tested, for example at KB/MB boundary:

  • 1024*1024-1 returns "1023 KB"
  • 1024*1024 returns "1 MB"
  • 1024*1024+1 returns "1 MB"

You can easily change approx_size if you want float instead of rounded integers.
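
As a rough sketch of that change, the variant below divides with / and formats the result with a configurable number of decimals. The name format_bytes_float and the precision parameter are assumptions added for illustration:

```python
def format_bytes_float(size: int, precision: int = 1) -> str:
    # Largest unit first; dicts preserve insertion order in Python 3.7+.
    power_labels = {40: "TB", 30: "GB", 20: "MB", 10: "KB"}
    for power, label in power_labels.items():
        if size >= 2 ** power:
            # True division keeps the fractional part instead of flooring it.
            return f"{size / 2 ** power:.{precision}f} {label}"
    return f"{size} bytes"


print(format_bytes_float(1536))          # 1.5 KB
print(format_bytes_float(1024 * 1024))   # 1.0 MB
```

Note that with floats a value just below a boundary, such as 1024*1024-1, is rounded by the formatter and shows up as "1024.0 KB".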

Caprification answered 3/12, 2021 at 8:14 Comment(0)
H
0

Call float(b) before dividing, i.e. use float(b)/1000 instead of float(b/1000). Because both b and 1000 are integers, b/1000 is itself an integer with no fractional part in Python 2, so converting the result to float afterwards is too late.

Housemaid answered 21/9, 2012 at 2:55 Comment(0)
A
0

Here is a function to convert bytes to kilobytes, megabytes, gigabytes or terabytes.

#From bytes to kilo, mega, tera
def  get_(size):

    #2**10 = 1024
    power = 2**10
    n = 1
    Dic_powerN = {1:'kilobytes', 2:'megabytes', 3:'gigabytes', 4:'Terabytes'}

    if size <= power**2 :
        size /=  power
        return size, Dic_powerN[n]

    else: 
        while size   >  power :
            n  += 1
            size /=  power**n

        return size, Dic_powerN[n]
Allaround answered 6/5, 2017 at 20:14 Comment(1)
This is (nice but) wrong - try get_(1.1 * 10**9)Doublespace
P
0

An output with no decimal places:

>>> format_file_size(12345678)
'11 MiB, 792 KiB, 334 bytes'

def format_file_size(fsize):
    result = []
    units = {s: u for s, u in zip(reversed([2 ** n for n in range(0, 40, 10)]), ['GiB', 'MiB', 'KiB', 'bytes'])}
    for s, u in units.items():
        t = fsize // s
        if t > 0:
            result.append('{} {}'.format(t, u))
        fsize = fsize % s
    return ', '.join(result) or '0 bytes'
Phonologist answered 8/10, 2018 at 21:46 Comment(0)
S
0

I know there already are a lot of answers and explanations here, but I tried this class based method and it perfectly worked for me. It may seem enormous but just take a look at how I used the attributes and methods.

class StorageUnits:
    b, Kb, Kib, Mb, Mib, Gb, Gib, Tb, Tib, Pb, Pib, Eb, Eib, Zb, Zib, Yb, Yib, B, KB, KiB, MB, MiB, GB, GiB, TB,\
        TiB, PB, PiB, EB, EiB, ZB, ZiB, YB, YiB = [0]*34


class DigitalStorageConverter:
    def __init__(self):
        self.storage = StorageUnits()
        self.bit_conversion_value_table = {
            'b': 1, 'Kb': 1000, 'Mb': 1000**2, 'Gb': 1000**3, 'Tb': 1000**4, 'Pb': 1000**5, 'Eb': 1000**6,
            'Zb': 1000**7, 'Yb': 1000**8, 'Kib': 1024, 'Mib': 1024**2, 'Gib': 1024**3, 'Tib': 1024**4, 'Pib': 1024**5,
            'Eib': 1024**6, 'Zib': 1024**7, 'Yib': 1024**8,
            'B': 8, 'KB': 8*1000, 'MB': 8*(1000**2), 'GB': 8*(1000**3), 'TB': 8*(1000**4), 'PB': 8*(1000**5),
            'EB': 8*(1000**6), 'ZB': 8*(1000**7), 'YB': 8*(1000**8), 'KiB': 8*1024, 'MiB': 8*(1024**2),
            'GiB': 8*(1024**3), 'TiB': 8*(1024**4), 'PiB': 8*(1024**5), 'EiB': 8*(1024**6), 'ZiB': 8*(1024**7),
            'YiB': 8*(1024**8)
        }
        "Values of all the units in bits"
        self.name_conversion_table = {
            'bit': 'b', 'kilobit': 'Kb', 'megabit': 'Mb', 'gigabit': 'Gb', 'terabit': 'Tb', 'petabit': 'Pb',
            'exabit': 'Eb', 'zettabit': 'Zb', 'yottabit': 'Yb', 'kibibit': 'Kib', 'mebibit': 'Mib', 'gibibit': 'Gib',
            'tebibit': 'Tib', 'pebibit': 'Pib', 'exbibit': 'Eib', 'zebibit': 'Zib', 'yobibit': 'Yib',
            'byte': 'B', 'kilobyte': 'KB', 'megabyte': 'MB', 'gigabyte': 'GB', 'terabyte': 'TB', 'petabyte': 'PB',
            'exabyte': 'EB', 'zettabyte': 'ZB', 'yottabyte': 'YB', 'kibibyte': 'KiB', 'mebibyte': 'MiB',
            'gibibyte': 'GiB', 'tebibyte': 'TiB', 'pebibyte': 'PiB', 'exbibyte': 'EiB', 'zebibyte': 'ZiB',
            'yobibyte': 'YiB'
        }
        self.storage_units = [u for u in list(StorageUnits.__dict__.keys()) if not u.startswith('__')]

    def get_conversion(self, value: float, from_type: str) -> StorageUnits:
        if from_type in list(self.name_conversion_table.values()):
            from_type_bit_value = self.bit_conversion_value_table[from_type]
        elif from_type in list(self.name_conversion_table.keys()):
            from_type = self.name_conversion_table[from_type]
            from_type_bit_value = self.bit_conversion_value_table[from_type]
        else:
            raise KeyError(f'Invalid storage unit type "{from_type}"')

        value = value * from_type_bit_value

        for i in self.storage_units:
            self.storage.__setattr__(i, value / self.bit_conversion_value_table[i])
        return self.storage


if __name__ == '__main__':
    c = DigitalStorageConverter()
    s = c.get_conversion(5000, 'KiB')
    print(s.KB, s.MB, s.TB)   # , ..., ..., etc till whatever you may want

This program will give you answers in exponent form if the number is too big.

NOTE: Please correct the names of the storage values, if anywhere found incorrect

Sheol answered 20/4, 2021 at 13:51 Comment(0)
L
0
def resize(size: int | float, from_: str = "KB", to_: str = "B"):
    sizes = ("PB", "TB", "GB", "MB", "KB", "B")
    unit = sizes.index(to_.upper()) - sizes.index(from_.upper())

    # moving to a larger unit divides, moving to a smaller unit multiplies
    return size // (1024 ** abs(unit)) if unit < 0 else size * (1024 ** abs(unit))
Laughton answered 1/10, 2022 at 9:18 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Churchwell
C
0

Here are my 2 cents/lines:

nextPrefix = {'B':'KB','KB':'MB','MB':'GB','GB':'TB','TB':'PB','b':'Kb','Kb':'Mb','Mb':'Gb','Gb':'Tb','Tb':'Pb'}
formatSize = lambda i, s: f'{i:.2f}{s}' if i<1024 or s not in nextPrefix else formatSize(i/1024,nextPrefix[s])

Usage example:

print(formatSize(1000000,'Kb'))
Capful answered 14/1, 2024 at 17:16 Comment(0)
F
0

The convert_bytes function converts a byte count into 'bytes', 'KB', 'MB', 'GB' or 'TB':

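def convert_bytes(num):
    step_unit = 1000.0
    for x in ['bytes', 'KB', 'MB', 'GB']:
        if num < step_unit:
            return "%3.1f %s" % (num, x)
        num /= step_unit
    # anything of 1000 GB or more is reported in TB instead of returning None
    return "%3.1f %s" % (num, 'TB')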
Forespeak answered 6/3, 2024 at 20:54 Comment(0)
