Find size and free space of the filesystem containing a given file
N

12

92

I'm using Python 2.6 on Linux. What is the fastest way:

  • to determine which partition contains a given directory or file?

    For example, suppose that /dev/sda2 is mounted on /home, and /dev/mapper/foo is mounted on /home/foo. From the string "/home/foo/bar/baz" I would like to recover the pair ("/dev/mapper/foo", "/home/foo").

  • and then, to get usage statistics of the given partition? For example, given /dev/mapper/foo I would like to obtain the size of the partition and the free space available (either in bytes or approximately in megabytes).

Nullification answered 23/11, 2010 at 19:42 Comment(3)
Are you taking symlinks into account? While you may have /home and /mnt/somedisk , /home/foo/x may be a symlink to directory /mnt/somedisk/xyzzy - so it appears under /home, but actually lives at /mnt/somediskLeatherleaf
@Piskvor: No - for the time being I don't need to follow symlinks, they're just plain directories. The first question is basically asking "find the closest ancestor directory that has a partition mounted on it".Nullification
See also #3274854Pesky
L
51

If you just need the free space on a device, see the answer using os.statvfs() below.

If you also need the device name and mount point associated with the file, you should call an external program to get this information. df will provide all the information you need -- when called as df filename it prints a line about the partition that contains the file.

To give an example:

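import subprocess
# universal_newlines=True makes communicate() return text on Python 3 as well
df = subprocess.Popen(["df", "filename"], stdout=subprocess.PIPE,
                      universal_newlines=True)
output = df.communicate()[0]
# Line 0 is the header; line 1 describes the filesystem that holds the file
device, size, used, available, percent, mountpoint = \
    output.split("\n")[1].split()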

Note that this is rather brittle, since it depends on the exact format of the df output, but I'm not aware of a more robust solution. (There are a few solutions relying on the /proc filesystem below that are even less portable than this one.)
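One way to reduce the brittleness is to ask df for its POSIX output format and to pin the locale. The following is a minimal sketch for Python 3, not a definitive implementation: it assumes a df that supports the POSIX -P and -k options, and the names DfEntry, parse_df_posix and df_entry are made up for this illustration.

```python
import os
import subprocess
from collections import namedtuple

# Hypothetical container for one line of `df -P -k` output.
DfEntry = namedtuple(
    "DfEntry", "device size_kb used_kb available_kb percent mountpoint")


def parse_df_posix(output):
    """Parse the text printed by `df -P -k <path>` into a DfEntry."""
    lines = output.strip().splitlines()
    # Line 0 is the header. With -P each filesystem is reported on one
    # line, so a long device name is not wrapped onto a second line.
    # Split at most 5 times so a mount point containing spaces stays
    # whole (a device name containing spaces would still break this).
    device, size, used, available, percent, mountpoint = \
        lines[1].split(None, 5)
    return DfEntry(device, int(size), int(used), int(available),
                   percent, mountpoint)


def df_entry(path):
    """Run df on path with a fixed locale and parse its output."""
    env = dict(os.environ, LC_ALL="C")  # avoid translated output
    output = subprocess.check_output(["df", "-P", "-k", path], env=env)
    return parse_df_posix(output.decode("utf-8", "replace"))
```

Keeping the parser separate from the subprocess call makes it possible to check the parsing against a canned df output without touching the real filesystem.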

Lachish answered 23/11, 2010 at 19:53 Comment(18)
Specifically he could do import commands, then commands.getoutput("df filename | tail -1 | gawk '{ print $6 }' ")Theurich
The commands module is superseded by subprocess. And I would not do the output parsing in bash when I can do it in Python :)Lachish
I didn't know about the "filename" argument to df. "df -B MB filename" will do. Thanks a lot.Nullification
@Federico Ramponi: It even dereferences the symlinks, and tells you on which spinning piece of rusted metal the bits actually reside. Magic!Leatherleaf
The OP can do all necessary steps in python, by substituting a dependency on a file (/proc/mounts) for the dependency on an external program (df in your case).Physoclistous
@İsmail 'cartman' Dönmez: Please read the original question again. statvfs won't give all the information needed. You have to rely on either /proc/mounts or /bin/df. Parsing the output of the latter is much easier and makes the call to statvfs redundant.Lachish
Parsing /proc/mounts is better than running a binary and parsing its output, IMHO.Bruch
@İsmail 'cartman' Dönmez: You not only have to parse /proc/mounts, you have to figure out what partition your path belongs to yourself. This includes canonising the path. The code will become much longer and more error-prone than the code I provided. You are free to prefer it, but this does not seem to be a well-reasoned downvote to me.Lachish
@İsmail: while I prefer (obviously :) my solution to Sven's, I think that his answer is also helpful (that's what the up/down vote means) and a valid alternative to mine, and it does not merit a -1. Now, when someone takes my answer and adds an alternative to /proc/mounts (specifically, using ctypes.cdll.LoadLibrary("libc.so.6").getmntent), I'd definitely give them a +1 !Physoclistous
I agree about your reasoning sir, I am taking back my downvote, but still this is hacky ;) [Yeah well my vote is locked, I'll take it back]Bruch
@İsmail: if Sven edits his question, you can take back your downvote.Physoclistous
Ah well if Sven can do that I'd be happy :>Bruch
This method does not always work. In my environment, the output spans more than one line. In that case the script raises ValueError('need more than 5 values to unpack'), because the device column and the other information are on different lines.Antelope
@Antelope This answer is for Linux and df from GNU coreutils specifically. If you don't need the device name and the mount point, please use the code from the next answer.Lachish
When scraping user facing commands, be aware output can change based on the environment's language settings. Different operating systems format things differently even for standard ('POSIX') commands. Don't expect portability here.Interweave
@JeffK My code doesn't even invoke a shell, so the risk of shell execution vulnerabilities is non-existent. And I know it's problematic to parse the output of df, but there isn't a really nice solution in this case. Most people can probably get away with os.statvfs(), but the OP asked for the device name, which you will only get by either parsing /proc/mounts or by calling df, neither of which is very portable.Lachish
@Sven I deleted my concerns about shell vulnerabilities after reading Popen's documentation more closely. I'm very happy that you commented about it as a casual googler might be unaware of the dangers you correctly avoided.Interweave
Maybe the use of --output option could ease the parsing: LANG=C df --output=avail /pathMady
P
146

This doesn't give the name of the partition, but you can get the filesystem statistics directly using the statvfs Unix system call. To call it from Python, use os.statvfs('/home/foo/bar/baz').

The relevant fields in the result, according to POSIX:

unsigned long f_frsize   Fundamental file system block size. 
fsblkcnt_t    f_blocks   Total number of blocks on file system in units of f_frsize. 
fsblkcnt_t    f_bfree    Total number of free blocks. 
fsblkcnt_t    f_bavail   Number of free blocks available to 
                         non-privileged process.

So to make sense of the values, multiply by f_frsize:

import os
statvfs = os.statvfs('/home/foo/bar/baz')

statvfs.f_frsize * statvfs.f_blocks     # Size of filesystem in bytes
statvfs.f_frsize * statvfs.f_bfree      # Actual number of free bytes
statvfs.f_frsize * statvfs.f_bavail     # Number of free bytes that ordinary users
                                        # are allowed to use (excl. reserved space)
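Since the question asks for the values "approximately in megabytes", the three products above can be wrapped in a small helper. This is only a sketch: the function name fs_usage_mib is invented here, and it reports binary megabytes (MiB, 1024 * 1024 bytes).

```python
import os


def fs_usage_mib(path):
    """Return (total, free, available) in MiB for the filesystem holding path."""
    st = os.statvfs(path)
    mib = 1024 * 1024
    total = st.f_frsize * st.f_blocks // mib
    free = st.f_frsize * st.f_bfree // mib        # includes reserved blocks
    available = st.f_frsize * st.f_bavail // mib  # usable by ordinary users
    return total, free, available


total, free, available = fs_usage_mib("/")
print("total=%d MiB free=%d MiB available=%d MiB" % (total, free, available))
```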
Pesky answered 8/9, 2012 at 3:56 Comment(2)
I just had this fail on me on an embedded system with ubifs. It resulted in 100MB free where only 10 was available. I'm unsure where the 100 came from.Endear
Why is the answer with the most votes (by a mile) the 6th result that StackOverflow lists?Perbunan
S
56

As of Python 3.3, there is an easy and direct way to do this with the standard library:

$ cat free_space.py 
#!/usr/bin/env python3

import shutil

total, used, free = shutil.disk_usage(__file__)
print(total, used, free)

$ ./free_space.py 
1007870246912 460794834944 495854989312

These numbers are in bytes. See the documentation for more info.
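Raw byte counts are hard to read, so a small formatter may help. The sketch below uses the named fields of the tuple returned by shutil.disk_usage; the helper format_bytes is hypothetical and uses binary units.

```python
import shutil


def format_bytes(n):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == "TiB":
            return "%.1f %s" % (value, unit)
        value /= 1024


usage = shutil.disk_usage("/")
print("total:", format_bytes(usage.total))
print("used: ", format_bytes(usage.used))
print("free: ", format_bytes(usage.free))
```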

Sophronia answered 30/9, 2016 at 20:39 Comment(0)
P
27
import os

def get_mount_point(pathname):
    "Get the mount point of the filesystem containing pathname"
    pathname= os.path.normcase(os.path.realpath(pathname))
    parent_device= path_device= os.stat(pathname).st_dev
    while parent_device == path_device:
        mount_point= pathname
        pathname= os.path.dirname(pathname)
        if pathname == mount_point: break
        parent_device= os.stat(pathname).st_dev
    return mount_point

def get_mounted_device(pathname):
    "Get the device mounted at pathname"
    # uses "/proc/mounts"
    pathname= os.path.normcase(pathname) # might be unnecessary here
    try:
        with open("/proc/mounts", "r") as ifp:
            for line in ifp:
                fields= line.rstrip('\n').split()
                # note that line above assumes that
                # no mount points contain whitespace
                if fields[1] == pathname:
                    return fields[0]
    except EnvironmentError:
        pass
    return None # explicit

def get_fs_freespace(pathname):
    "Get the free space of the filesystem containing pathname"
    stat= os.statvfs(pathname)
    # f_bfree counts all free blocks, including those reserved for the
    # superuser; use f_bavail for the space available to ordinary users
    return stat.f_bfree*stat.f_frsize

Some sample pathnames on my computer:

path 'trash':
  mp /home /dev/sda4
  free 6413754368
path 'smov':
  mp /mnt/S /dev/sde
  free 86761562112
path '/usr/local/lib':
  mp / rootfs
  free 2184364032
path '/proc/self/cmdline':
  mp /proc proc
  free 0

PS

if on Python ≥3.3, there's shutil.disk_usage(path) which returns a named tuple of (total, used, free) expressed in bytes.
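For the mount point alone, the standard library's os.path.ismount can stand in for the manual st_dev comparison in get_mount_point. This is a sketch with a made-up function name; like the st_dev approach, it may not detect a bind mount of a directory that lives on the same filesystem.

```python
import os


def find_mount_point(path):
    """Walk up from path until a directory that is a mount point is reached."""
    path = os.path.realpath(path)  # resolve symlinks first
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


print(find_mount_point(os.getcwd()))
```

The loop always terminates because the root directory is reported as a mount point.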

Physoclistous answered 27/12, 2010 at 9:57 Comment(1)
As noted above: I just had this method using statvfs fail on me on an embedded system with ubifs. It resulted in 100MB free where only 10 was available. I'm unsure where the 100 came from.Endear
F
16

This should do everything you asked for:

import os
from collections import namedtuple

disk_ntuple = namedtuple('partition',  'device mountpoint fstype')
usage_ntuple = namedtuple('usage',  'total used free percent')

def disk_partitions(all=False):
    """Return all mounted partitions as a list of namedtuples.
    If all == False return physical partitions only.
    """
    phydevs = []
    f = open("/proc/filesystems", "r")
    for line in f:
        if not line.startswith("nodev"):
            phydevs.append(line.strip())

    retlist = []
    f = open('/etc/mtab', "r")
    for line in f:
        if not all and line.startswith('none'):
            continue
        fields = line.split()
        device = fields[0]
        mountpoint = fields[1]
        fstype = fields[2]
        if not all and fstype not in phydevs:
            continue
        if device == 'none':
            device = ''
        ntuple = disk_ntuple(device, mountpoint, fstype)
        retlist.append(ntuple)
    return retlist

def disk_usage(path):
    """Return disk usage associated with path."""
    st = os.statvfs(path)
    free = (st.f_bavail * st.f_frsize)
    total = (st.f_blocks * st.f_frsize)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    try:
        percent = (float(used) / total) * 100
    except ZeroDivisionError:
        percent = 0
    # NB: the percentage is about 5% lower than what df shows because
    # blocks reserved for the superuser are not taken into account:
    # http://goo.gl/sWGbH
    return usage_ntuple(total, used, free, round(percent, 1))


if __name__ == '__main__':
    for part in disk_partitions():
        print(part)
        print("    %s\n" % str(disk_usage(part.mountpoint)))

On my box the code above prints:

giampaolo@ubuntu:~/dev$ python foo.py 
partition(device='/dev/sda3', mountpoint='/', fstype='ext4')
    usage(total=21378641920, used=4886749184, free=15405903872, percent=22.9)

partition(device='/dev/sda7', mountpoint='/home', fstype='ext4')
    usage(total=30227386368, used=12137168896, free=16554737664, percent=40.2)

partition(device='/dev/sdb1', mountpoint='/media/1CA0-065B', fstype='vfat')
    usage(total=7952400384, used=32768, free=7952367616, percent=0.0)

partition(device='/dev/sr0', mountpoint='/media/WB2PFRE_IT', fstype='iso9660')
    usage(total=695730176, used=695730176, free=0, percent=100.0)

partition(device='/dev/sda6', mountpoint='/media/Dati', fstype='fuseblk')
    usage(total=914217758720, used=614345637888, free=299872120832, percent=67.2)
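The comment in disk_usage about a 5% difference can be checked against the first line of the output above. Because used is computed from f_bfree and free from f_bavail, total - used - free equals (f_bfree - f_bavail) * f_frsize, which is the space that ordinary users cannot use. A short worked example with the numbers printed for /dev/sda3:

```python
# Values copied from the sample output for /dev/sda3 above.
total = 21378641920
used = 4886749184
free = 15405903872

# Space counted neither as used nor as available to ordinary users.
reserved = total - used - free
print(reserved)                             # 1085988864
print(round(100.0 * reserved / total, 1))   # 5.1
```

That the gap is reserved-for-root space is an interpretation, but a share of about 5% is at least consistent with the note in the code.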
Fogle answered 18/6, 2011 at 16:54 Comment(3)
Also, take a look at this recipe: code.activestate.com/recipes/577972-disk-usagePedantry
A minor nitpick - all is a built-in function and should not be used as a variable in a function.Aglow
Can this be represented in Gigabytes ?Katey
U
10

The simplest way to find it out:

import os
from collections import namedtuple

DiskUsage = namedtuple('DiskUsage', 'total used free')

def disk_usage(path):
    """Return disk usage statistics about the given path.

    Will return the namedtuple with attributes: 'total', 'used' and 'free',
    which are the amount of total, used and free space, in bytes.
    """
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return DiskUsage(total, used, free)
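Note that os.statvfs is only available on Unix-like systems. A possible fallback, sketched here under the assumption of Python 3.3 or later, is to use shutil.disk_usage when os.statvfs is missing; the name portable_disk_usage is invented for this example.

```python
import os
import shutil
from collections import namedtuple

DiskUsage = namedtuple("DiskUsage", "total used free")


def portable_disk_usage(path):
    """Return DiskUsage(total, used, free) in bytes, with or without statvfs."""
    if hasattr(os, "statvfs"):
        st = os.statvfs(path)
        total = st.f_blocks * st.f_frsize
        used = (st.f_blocks - st.f_bfree) * st.f_frsize
        free = st.f_bavail * st.f_frsize
        return DiskUsage(total, used, free)
    # No statvfs on this platform: let the standard library do the work.
    usage = shutil.disk_usage(path)
    return DiskUsage(usage.total, usage.used, usage.free)


print(portable_disk_usage(os.getcwd()))
```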
Undaunted answered 6/8, 2015 at 13:11 Comment(2)
used = total - free ?Promotive
When I run this example above, I get an error about a missing attribute: AttributeError: module 'os' has no attribute 'statvfs'. What am I doing wrong?Engels
C
9

For the second part of your question, "get usage statistics of the given partition", psutil makes this easy with the disk_usage(path) function. Given a path, disk_usage() returns a named tuple including total, used, and free space expressed in bytes, plus the percentage usage.

Simple example from documentation:

>>> import psutil
>>> psutil.disk_usage('/')
sdiskusage(total=21378641920, used=4809781248, free=15482871808, percent=22.5)

Psutil works with Python versions from 2.6 to 3.6 and on Linux, Windows, and OSX among other platforms.

Chelate answered 15/12, 2017 at 0:55 Comment(0)
P
6

For the first point, you can try using os.path.realpath to get a canonical path, check it against /etc/mtab (I'd actually suggest calling getmntent, but I can't find a normal way to access it) to find the longest match. (to be sure, you should probably stat both the file and the presumed mountpoint to verify that they are in fact on the same device)

For the second point, use os.statvfs to get block size and usage information.

(Disclaimer: I have tested none of this, most of what I know came from the coreutils sources)
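The longest-match idea for the first point could look roughly like this. It is a sketch only: it reads /proc/mounts rather than /etc/mtab, all function names are made up, and it skips the extra stat check suggested above.

```python
import os


def unescape_mount_field(field):
    """Decode the octal escapes used in /proc/mounts (e.g. \\040 for a space)."""
    # The backslash itself (\134) is handled last so that it cannot
    # combine with following digits into a new escape sequence.
    for escaped, char in (("\\040", " "), ("\\011", "\t"),
                          ("\\012", "\n"), ("\\134", "\\")):
        field = field.replace(escaped, char)
    return field


def parse_mounts(text):
    """Return (device, mountpoint) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            entries.append((unescape_mount_field(fields[0]),
                            unescape_mount_field(fields[1])))
    return entries


def find_mount(path, entries):
    """Pick the entry whose mount point is the longest ancestor of path."""
    best = None
    for device, mountpoint in entries:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later mounts shadow earlier ones.
            if best is None or len(mountpoint) >= len(best[1]):
                best = (device, mountpoint)
    return best


def device_and_mount_point(path, mounts_file="/proc/mounts"):
    """Resolve path and look it up in the mount table."""
    with open(mounts_file) as f:
        entries = parse_mounts(f.read())
    return find_mount(os.path.realpath(path), entries)
```

Comparing against the mount point plus a trailing slash keeps a path such as /homework from being matched by a mount on /home.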

Philbo answered 23/11, 2010 at 20:36 Comment(2)
re getmntent: well, there's always the possibility of import ctypes; ctypes.cdll.LoadLibrary("libc.so.6").getmntent, but it's not that straightforward…Physoclistous
I'm curious as to why this got a downvote, a comment would have been appreciatedPhilbo
C
6
import os

def disk_stat(path):
    disk = os.statvfs(path)
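    # used blocks as a share of the blocks visible to ordinary users;
    # the trailing + 1 roughly rounds the result up
    percent = (disk.f_blocks - disk.f_bfree) * 100 // (disk.f_blocks - disk.f_bfree + disk.f_bavail) + 1
    return percent


print(disk_stat('/'))
print(disk_stat('/data'))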
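The + 1 in the formula overshoots by one whenever the division happens to be exact. An integer ceiling avoids that. The sketch below assumes that the goal is to mimic a df-style "Use%" that is rounded up; the function name is hypothetical, and it takes the three statvfs block counts directly so that it can be checked with plain numbers.

```python
def df_style_percent(f_blocks, f_bfree, f_bavail):
    """Percentage of used blocks among those visible to ordinary users, rounded up."""
    used = f_blocks - f_bfree
    visible = used + f_bavail
    if visible == 0:
        return 0  # empty or pseudo filesystem
    # -(-a // b) is integer ceiling division
    return -(-used * 100 // visible)


print(df_style_percent(100, 50, 40))  # 50 used of 90 visible -> 56
print(df_style_percent(100, 50, 50))  # exactly half -> 50
```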
Concede answered 3/3, 2017 at 5:58 Comment(2)
While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.City
disk_stat method doesn't take any arguments. But, the idea to use os.statvfs is good.Acklin
D
3

11 years later, but expanding on others' answers.

import psutil

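# File systems
fs_space = {}  # maps mount point -> percentage used
value = psutil.disk_partitions()

for i in value:
    va = i[1]  # mountpoint field of the partition tuple
    value2 = psutil.disk_usage(va).percent
    print(value2)
    fs_space[va] = value2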

This is adding it to a dictionary, only grabbing percent as that is what I need, but you can grab all values or select the one you want from the total, used, free, or percent.

Official documentation helped a lot

Diller answered 30/12, 2021 at 22:15 Comment(0)
G
2

Checking the disk usage on your Windows PC can be done as follows:

import psutil

fan = psutil.disk_usage(path="C:/")
print("Total: ", fan.total/1000000000)
print("Used: ", fan.used/1000000000)
print("Free: ", fan.free/1000000000)
print("Percentage Used: ", fan.percent, "%")
Gombosi answered 24/5, 2020 at 16:31 Comment(1)
It should work on any platform. It is working on a MacBook M2 Darwin. Take a look at psutil.readthedocs.io/en/latest/#disksHub
P
1

On Linux, the /proc virtual filesystem usually contains such information. For example, /proc/mounts lists the currently mounted filesystems, and you can parse it directly. Utilities like top and df make use of /proc.

I haven't used it, but this might help too, if you want a wrapper: http://bitbucket.org/chrismiles/psi/wiki/Home

Peridot answered 23/11, 2010 at 19:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.