Why is stat::st_size 0 for devices but at the same time lseek defines the device size correctly?

5

16

I noticed that when I query the size of a device using open + lseek, everything is OK, but when I stat the device, I get zero instead of the real device size. The device is clean, without any file system, and its first bytes contain some text like "1234567890ABC". What is wrong?

The code:

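#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>      // open
#include <unistd.h>     // lseek, close
#include <cerrno>       // errno
#include <cinttypes>    // PRIu64, uint64_t
#include <cstdio>       // printf, fflush
#include <cstring>      // strerror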

bool
GetFileSize(const char* pPath, uint64_t& Size)
{
    pPath = "/home/sw/.bashrc";
    pPath = "/dev/sda";

    struct stat buffer;
    if (stat(pPath, &buffer))
    {
        printf("Failed to stat file. Error: %s. FilePath: %s\n", strerror(errno), pPath);
        return false;
    }

    printf("File size by stat: %" PRIu64 " Huh?\n", (uint64_t)buffer.st_size);

    //
    // Note: It's strange, but stat::st_size from the stat call is zero for devices
    //

    int File = open(pPath, O_RDONLY);
    if (File < 0)
    {
        printf("Failed to open file. Error: %s. FilePath: %s\n", strerror(errno), pPath);
        return false;
    }

    off_t off = lseek(File, 0, SEEK_END);
    if (off == (off_t)-1)
    {
        printf("Failed to get file size. Error: %s. FilePath: %s\n", strerror(errno), pPath);
        close(File);
        return false;
    }
    close(File);

    printf("File size by lseek: %" PRIu64 "\n", (uint64_t)off);
    fflush(stdout);

    Size = off;
    return true;
}

Output:

File size by stat: 0 Huh?
File size by lseek: 34359738368

If I use stat for a regular file then everything is OK (comment out the line with "/dev/sda"):

File size by stat: 4019 Huh?
File size by lseek: 4019
Pallium answered 14/3, 2019 at 14:0 Comment(3)
welcome to "stat information is different from seeking+telling information".Brotherson
related: unix.stackexchange.com/questions/384488/…Brotherson
I don’t trust printing the size using PRIu64 unless you cast the size to uint64_t. That said, you’d probably not get zero if it’s going wrong.Bohannon
14

The devil is in the detail... For starters, there is a fundamental principle of Unix design: everything is a file. Nicely explained here.

Second, the stat(2) call gives you the inode statistics that the filesystem stores about the device-special file itself, and that file has a size of zero (think of it as lstat(2)). If you have a block device with a filesystem on it, you can get information about that filesystem using statfs(2), getfsstat(2), or statvfs(2) in a filesystem- and device-independent way.
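As a rough illustration of the filesystem-level query mentioned above, here is a minimal sketch using statvfs(3). The helper name GetFilesystemSize is hypothetical and not part of the original post. Note that the path must name something inside the mounted filesystem, not the device node: passing /dev/sda would describe the filesystem that holds /dev.

```cpp
#include <sys/statvfs.h>
#include <cstdint>

// Hypothetical helper (not from the original post): reports the total size,
// in bytes, of the filesystem that contains pPath.
bool GetFilesystemSize(const char* pPath, uint64_t& TotalBytes)
{
    struct statvfs info;
    if (statvfs(pPath, &info) != 0)
    {
        return false; // errno describes the failure
    }

    // f_blocks is counted in units of f_frsize (the fragment size).
    TotalBytes = static_cast<uint64_t>(info.f_blocks) * info.f_frsize;
    return true;
}
```

Calling GetFilesystemSize("/", total) should fill total with the size of the root filesystem, which is a different quantity from the size of the underlying disk.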

Dealing with special files (usually residing in /dev) has always been system-specific, and the manual pages reside in section 4. So if you want to manipulate a device directly, you should read up on the specifics there. For instance, on Linux, man 4 hd shows how to interact programmatically with IDE block devices, whereas man 4 sd shows how to interact with SCSI disks, etc.

Third, system calls are not supposed to be inconsistent in either their functionality or their limitations.

Hope this has helped.

Creeps answered 14/3, 2019 at 14:23 Comment(3)
it has, and I have the impression that the 3 existing answers complete each other.Brotherson
@Jean-FrançoisFabre they seem to don't they :)Creeps
@Jean-François Fabre Masud All answers are correct and complement each other. But this point seems to be key: stat works with inodes and returns information from the existing FS, not from the device itself. On the other hand, a device file has a size, a start, and an end (in the case of a block device), so lseek works without problems and returns the correct device size. As Ahmed said, "everything is a file", so I used stat in a place where it cannot be used. "The devil is in the detail..." accurately describes this situation. Thank you guys!Pallium
10

From this Unix Stack Exchange question:

Device files are not files per se. They're an I/O interface to use the devices in Unix-like operating systems. They use no space on disk, however, they still use an inode as reported by the stat command:

$ stat /dev/sda
      File: /dev/sda
      Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d   Inode: 14628       Links: 1     Device type: 8,0

That solves the stat part.

The fact that you can seek in this "file" is unrelated. It isn't really a file, but you can open it and read from it, and you can seek in it too. It lets you read the disk at the lowest level, so seeking is necessary (that's why it works, and why wouldn't it return the new position like any "real" file?).

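According to this other UnixSE answer, you can also get the device size by reading the device's sysfs size file, for example /sys/block/sda/size on Linux (the value is a count of 512-byte sectors). Note that /dev/sda is a device node, not a directory, so there is no /dev/sda/size.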
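A minimal sketch of the sysfs approach follows, assuming a Linux system where the size attribute lives at a path such as /sys/block/sda/size and holds a count of 512-byte sectors. The helper name ReadSysfsDeviceSize is hypothetical, and the path is passed in rather than hard-coded because the device name varies.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post). SizeFilePath is assumed
// to be a sysfs attribute such as /sys/block/sda/size, which on Linux holds
// the device length as a count of 512-byte sectors.
bool ReadSysfsDeviceSize(const std::string& SizeFilePath, uint64_t& SizeBytes)
{
    std::ifstream file(SizeFilePath);
    uint64_t sectors = 0;
    if (!(file >> sectors))
    {
        return false; // missing file or unparsable content
    }

    SizeBytes = sectors * 512; // the sysfs value is in 512-byte units
    return true;
}
```

Unlike open + lseek on the device node, reading this file usually does not require read permission on the disk itself.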

Cooney answered 14/3, 2019 at 14:13 Comment(0)
7

The length of a "device" such as /dev/sda is not specified by the POSIX struct stat:

off_t st_size       For regular files, the file size in bytes. 

                    For symbolic links, the length in bytes of the 
                    pathname contained in the symbolic link. 

                    For a shared memory object, the length in bytes. 

                    For a typed memory object, the length in bytes. 

                    For other file types, the use of this field is 
                    unspecified. 

So POSIX has no requirement for the "size" of a disk device.

Linux likewise does not specify that stat() shall return the size of a disk device:

st_size

This field gives the size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symbolic link is the length of the pathname it contains, without a terminating null byte.
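Taken together, the two specifications suggest checking the file type before trusting st_size. The following is only a sketch: the helper name GetSpecifiedStatSize is hypothetical, it uses lstat so that a symbolic link is examined itself rather than its target, and it ignores the shared memory and typed memory object cases that POSIX also lists.

```cpp
#include <sys/stat.h>
#include <cstdint>

// Hypothetical helper (not from the original post): returns st_size only for
// file types where the field has a documented meaning.
bool GetSpecifiedStatSize(const char* pPath, uint64_t& Size)
{
    struct stat buffer;
    if (lstat(pPath, &buffer) != 0)
    {
        return false; // errno describes the failure
    }

    // Only regular files and symbolic links are handled here.
    if (!S_ISREG(buffer.st_mode) && !S_ISLNK(buffer.st_mode))
    {
        return false; // device node, directory, FIFO, ...: st_size unspecified
    }

    Size = static_cast<uint64_t>(buffer.st_size);
    return true;
}
```

With this guard, a call on a device node such as /dev/sda reports failure instead of silently returning a meaningless zero.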

Dockage answered 14/3, 2019 at 14:16 Comment(0)
0

On Linux, the documented way to get the size of a raw disk device that you can open is with the BLKGETSIZE ioctl. See the sd(4) manpage.

Note that this returns the size of the device in sectors. You might think that, for size in bytes, you have to multiply by the value returned by the BLKSSZGET ioctl, but if I'm reading the source code correctly, you actually have to multiply by 512 no matter what BLKSSZGET returns.
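Linux also provides the BLKGETSIZE64 ioctl (declared in <linux/fs.h>), which reports the size in bytes and so avoids the sector multiplication. Below is a minimal, Linux-only sketch; the helper name GetBlockDeviceSize is hypothetical, and opening a real disk typically needs elevated privileges.

```cpp
#include <fcntl.h>
#include <linux/fs.h>   // BLKGETSIZE64 (Linux-specific)
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper (not from the original post): asks the kernel for the
// size in bytes of the block device at pPath.
bool GetBlockDeviceSize(const char* pPath, uint64_t& SizeBytes)
{
    int fd = open(pPath, O_RDONLY);
    if (fd < 0)
    {
        return false; // errno describes the failure
    }

    // Refuse anything that is not a block device before issuing the ioctl.
    struct stat buffer;
    bool ok = fstat(fd, &buffer) == 0 && S_ISBLK(buffer.st_mode);

    uint64_t bytes = 0;
    ok = ok && ioctl(fd, BLKGETSIZE64, &bytes) == 0;

    close(fd);
    if (ok)
    {
        SizeBytes = bytes;
    }
    return ok;
}
```

For the 32 GiB disk in the question, GetBlockDeviceSize("/dev/sda", size) would be expected to agree with the lseek result.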

Islek answered 14/3, 2019 at 17:1 Comment(0)
-1

lseek is the backbone to C's fseek, so it has similar semantics, matching fseek - and quite detached from other areas of the Unix API. Provenance-wise, you'd expect lseek to act like file-handle-taking fseek, and fseek is a C-library interface that came to be without being Unix-specific.

stat is Unix-specific, though, and does its own thing. It's a reasonable difference to expect if you think about provenance. Of course the problem is, then, that C APIs have very weak type models because C is one step short of making true type safety possible.

Why is this important? Because, fundamentally, a seekable_size and a file_object_size are two fundamentally different concepts, and would demand different types – even the C++ standard library gets it wrong.

But while in C++ and with modern compilers it’s now an entirely gratuitous legacy shortcoming, there’s really no way in C to efficiently wrap integers into incompatible types without killing performance and code readability. And thus you end up with something like off_t or long being used for wholly incompatible concepts. And this is the source of confusion: just because you get a size-related number out of a file-related function doesn’t mean that the number will have the same meaning. And meanings are usually captured in types… The only meaning a long inherently has is “hey, I’m a number, you can do numeric things with me”… :(
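To make the idea concrete, here is one possible sketch of such wrapper types in C++. The type names are taken from the terms used above; the function BytesToRead is purely illustrative, and nothing here is an existing library API.

```cpp
#include <cstdint>
#include <type_traits>

// Two distinct wrappers around the same integer representation.
struct seekable_size    { uint64_t bytes; }; // e.g. what lseek(SEEK_END) reports
struct file_object_size { uint64_t bytes; }; // e.g. what stat's st_size reports

// Illustrative function that only accepts a seekable size.
inline uint64_t BytesToRead(seekable_size size)
{
    return size.bytes;
}

// The compiler refuses to treat one kind of size as the other.
static_assert(!std::is_convertible<file_object_size, seekable_size>::value,
              "the two size concepts must not convert implicitly");
```

Passing a file_object_size to BytesToRead fails to compile, which is the kind of protection a plain long or off_t cannot offer.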

Wysocki answered 14/3, 2019 at 19:24 Comment(0)
