Difference between blocks and sectors

Asked 10/9, 2012 at 5:27 Answered 20/4 at 22:48

With reference to this article: Hard Drive Knowledge: Blocks vs. Sectors, there is a line that reads:

Because there are limits to the number of blocks, or drive addresses, that an operating system can address. By defining a block as several sectors, an OS can work with bigger hard drives without increasing the number of block addresses.

What does it mean? What is meant by "operating system can address"? And the subsequent math isn't clear either. How can 64*512 be less than 64*4?

Compliancy answered 10/9, 2012 at 5:27 Comment(4)

The link is dead, could you update it? Otherwise the question loses some value... – Squeteague 20/9, 2016 at 7:58

Seems this link has a clone of that content: alphaurax-computer.com/computer-tips/… – Cruickshank 30/11, 2016 at 23:56

About the maths... The article is incorrect. Where it talks about "64k * 512k vs 64k * 4k" it should read "64k * 512 vs 64k * 4k" (note that 512 dropped the k), so really 64k*4k is "greater than" 64k*512. – Cruickshank 1/12, 2016 at 0:1

Link is dead again. I now updated with waybackmachine's archive. – Wheeze 28/11, 2023 at 11:36

Look at it this way. Every block that's used in your operating system's file system to store data requires a certain amount of metadata to be stored along with the actual file data you're writing, e.g. timestamps (created, modified), filename, ownership/permission bits. For files that span multiple blocks, you also have to store the IDs of each of those blocks and the order they're chained together in, etc.

Determining block size in an OS is a case of tradeoffs. Every file must occupy at least one block, even if the file is 0 bytes long, so there's something for the file's metadata to be attached to. Unless you can guarantee that your files will ALWAYS be some multiple of the block size in size (e.g. in a 4k block OS, all files are 4k), there will be a certain amount of wastage for the files that don't exactly fit within that block.

Small block sizes are good when you need to store many small files. On the other hand, more blocks = more metadata, so you end up wasting a chunk of your storage system on overhead, tracking the location of all the files.

On the flip side, large blocks mean less metadata, but also mean greater wastage when you're storing small files, e.g. a 1 byte file stored in a 4k block wastes 4095 bytes (about 3.999k) of that block.
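
To put numbers on that tradeoff, here is a minimal sketch of the wastage calculation. The function name `slack_bytes` is made up for illustration, and it follows the simplified model used in this answer, where every file takes at least one block (real file systems may behave differently for tiny files).

```python
def slack_bytes(file_size, block_size):
    """Bytes left unused in the blocks allocated to one file.

    Simplified model from the answer: even an empty file occupies one block.
    """
    # Integer ceiling division, avoiding floating point rounding.
    blocks = max(1, -(-file_size // block_size))
    return blocks * block_size - file_size


for size in (1, 4096, 5000):
    print(size, slack_bytes(size, 512), slack_bytes(size, 4096))
```

With these inputs a 5000 byte file wastes 120 bytes on 512 byte blocks but 3192 bytes on 4k blocks, while a 4096 byte file wastes nothing in either case.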

Each of those blocks must be given an ID number by the OS, so it can be uniquely identified. An OS which uses an 8 bit ID field can track only 256 blocks, and therefore, by extension, only 256 files. But if each of those blocks is actually 1 megabyte in size, then you can store up to 256 megabytes of data.
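
The arithmetic behind that is just "number of IDs times block size". A small sketch, with `max_capacity_bytes` being a hypothetical helper name rather than anything from a real API:

```python
MIB = 1024 ** 2


def max_capacity_bytes(id_bits, block_size):
    """Largest amount of data addressable with id_bits-wide block IDs."""
    return (2 ** id_bits) * block_size


# 8 bit IDs, 512 byte blocks: only 128 KiB in total.
print(max_capacity_bytes(8, 512))          # 131072
# Same 8 bit IDs, 1 MiB blocks: 256 MiB, with no extra IDs needed.
print(max_capacity_bytes(8, MIB) // MIB)   # 256
```

The number of block addresses stays at 256 in both cases; only the block size changes what they can reach.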

The article you link to has a typo/logical flaw: they meant 512 BYTES, not 512k, so 64*512 bytes is smaller than 64*4k, aka 64*4096 bytes. Most hard drives shipped with 512 byte sector/block sizes.

However, as discussed earlier, small blocks mean more metadata. With drive sizes now in the 3+ terabyte range, 512 byte blocks mean you need metadata storage for 3TB/512 bytes = 6.44 billion blocks. That's one major waste of space. So now they ship drives with 4k blocks, 8 times larger, so you only need metadata storage for 805 million blocks. The total number of possible files has been cut by a factor of 8, but the reduced amount of metadata means you can actually store a larger amount of usable data.

Incidentally, 6.44 billion blocks is more than a 32bit block number can address directly. 2^32 is ~4.29 billion, so older 32bit machines could not use the entirety of a 3TB drive with 512 byte blocks. Hence the switch to larger block sizes: 32bit boxes can easily handle 805 million blocks.
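
The block counts are easy to check. This sketch assumes "3TB" means 3 * 2^40 bytes, which is the reading that reproduces the 6.44 billion figure; `block_count` is just an illustrative name.

```python
TIB = 1024 ** 4
DRIVE_BYTES = 3 * TIB      # assumption: "3TB" taken as 3 * 2**40 bytes
LIMIT_32BIT = 2 ** 32      # distinct values of a 32 bit block number


def block_count(capacity, block_size):
    """Number of whole blocks of block_size that fit in capacity bytes."""
    return capacity // block_size


for block_size in (512, 4096):
    n = block_count(DRIVE_BYTES, block_size)
    print(block_size, n, "fits in 32 bits" if n <= LIMIT_32BIT else "too many")
```

This gives 6442450944 blocks at 512 bytes (too many for 32 bits) and 805306368 blocks at 4096 bytes (fits).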

Eight answered 10/9, 2012 at 5:45 Comment(6)

Helluva explanation! Thanks! Just one question: did you mean that the blocks contain the metadata themselves? That is, if a file spans 10 blocks, will each block, along with some data, contain metadata like its id, its serial number in the block chain, etc? – Compliancy 10/9, 2012 at 6:22

And another question, what is meant by this line there: By defining a block as several sectors, an OS can work with bigger hard drives without increasing the number of block addresses.? – Compliancy 10/9, 2012 at 6:25

Sectors are an obsolete concept in modern drives. They existed when "locations" on a drive were specified by the old CHS (cylinder, head, sector) definition, which wasted a lot of space. All modern drives use LBA - logical block addressing, so sectors don't really exist anymore. However, an OS can still chain multiple blocks/sectors into a single logical OS-level block to reduce space. E.g. "every 3 real blocks/sectors on the drive will be considered 1 block by the OS". – Eight 10/9, 2012 at 14:55

As well, the block metadata is stored elsewhere, e.g. in DOS/Win9x, that was the 'FAT' - file allocation table. On Unix-ish systems, it's the inode table, etc... It wasn't exactly 1 block of file data = 1 block of metadata. But every time you used a block somewhere for a file, the usage of that block had to be recorded elsewhere in another block - but the OS could store the data for multiple file blocks in a single metadata block, so the overhead wasn't too horrendous. – Eight 10/9, 2012 at 14:57

So basically for a large number of small files, since there will be lots of metadata, lots of blocks will have to be sacrificed for storing that metadata, right? – Compliancy 10/9, 2012 at 17:49

This is a spectacularly bad answer, outdated by decades. NTFS does not work AT ALL like this. NTFS keeps a hidden db (MFT) with all top-details about files/dirs/etc. MFT does not waste a block for every file. In fact, if a file is well under 1 KB, its whole content is kept right in MFT w/o wasting a single dedicated block for this small file. Test it! Create a .txt doc whose content is just 1 character, save the file and check its properties in Explorer. It will say Size = 1 byte but Size on disk = 0 bytes because the latter only counts blocks 100% dedicated to the file. – Midship 20/4 at 21:33

Even though the referred article seems to have been posted close to when the question was asked (i.e. ~2013), the article itself seems extremely outdated even for that time.

"When a disk is formatted, tracks are defined"

This happens only once in the lifetime of a modern HDD (i.e. one made in roughly the last 30 years): when it's formatted at the factory and all its tracks and sectors are defined. What regular people know as formatting, as done via Windows or any other OS, is just a logical reorganization done at the partition and file system level. This kind of formatting does NOT redefine/rewrite physical/hardware tracks and sectors. These hardware operations are normally no longer possible (or needed) in modern drives once they leave the factory.

"A block, on the other hand, is a group of sectors that the operating system can address (point to)"

Correct. All generally available modern storage devices use a fixed-block architecture, meaning they make their capacity available as a huge number of small blocks, each with a unique ID but all having the same capacity (for historical reasons, 512 bytes/block was the norm for a long time, but since about 2010 HDDs have been transitioning to a larger size of 4096 bytes/block). There is another key feature these devices have: any read or write against a block is always done against the block as a whole. The operation is never done on just part of a block.

When an OS talks to a storage device, the device only understands the concept of a block and nothing else. The scheme used to address blocks is called LBA and it's essentially very simple: a numeric ID is assigned to each block, starting at 0 and ending at the number of available blocks minus 1. Any read/write operation submitted to the storage device must somehow specify the ID(s) of the block(s) involved, in addition to whatever data is transferred.
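
As a rough illustration of what "whole blocks only" means for the OS, here is a sketch that turns a byte range into the LBA IDs it touches. The helper `blocks_for_range` is hypothetical and not part of any real driver API.

```python
def blocks_for_range(offset, length, block_size=512):
    """LBA IDs of every block touched by `length` bytes starting at `offset`.

    The device transfers whole blocks, so even a 1 byte access
    involves the entire block that contains it.
    """
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


print(list(blocks_for_range(0, 1)))       # [0]
print(list(blocks_for_range(510, 4)))     # [0, 1]  (straddles a boundary)
print(list(blocks_for_range(1024, 512)))  # [2]
```

A 4 byte access that straddles a block boundary already needs two blocks, which is why alignment matters to the layers above.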

With modern HDDs, practically all internal details about how blocks are mapped to physical structures are hidden from the outside. Inside the HDD, a block does indeed map to at least one sector but you need to check the HDD specs to find out if it's just one or more. For example, 512e HDDs were released post-2010 that internally used 4096 bytes/sector but presented themselves as if they used 512 bytes/sector. These were necessary in legacy computers that couldn't be upgraded to handle 4K HDDs natively for various exotic reasons.
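
To make the 512e idea concrete, here is a sketch of how eight logical 512 byte blocks could line up with one 4096 byte physical sector. It assumes the simplest possible layout (logical block 0 sits at the start of physical sector 0, with no offset); the actual internal mapping is up to the drive, and the function names are invented for the example.

```python
LOGICAL_SIZE = 512     # block size the drive presents to the OS
PHYSICAL_SIZE = 4096   # sector size used internally
PER_PHYSICAL = PHYSICAL_SIZE // LOGICAL_SIZE  # 8 logical blocks per sector


def physical_sector(lba):
    """Return (physical sector index, slot inside it) for a logical LBA.

    Assumes logical block 0 is aligned with physical sector 0.
    """
    return divmod(lba, PER_PHYSICAL)


def is_aligned(start_lba, count):
    """True when a request covers only whole physical sectors."""
    return start_lba % PER_PHYSICAL == 0 and count % PER_PHYSICAL == 0


print(physical_sector(9))   # (1, 1)
print(is_aligned(8, 8))     # True
print(is_aligned(1, 8))     # False
```

Under this model, a write that is not aligned only partly covers a physical sector, so the drive has to read that sector, patch it and write it back, which is why partitions on such drives are usually aligned to 4K boundaries.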

So, the [48-bit] LBA scheme allows directly addressing 2^48 blocks, which is a huge number. If the block size is 512 bytes this means LBA can address devices with a max capacity of 2^48 * 512 = 128 PiB (often incorrectly called just PB) = 131072 TiB (often incorrectly called just TB)! With 4096 bytes/sector that capacity shoots up eightfold, to 1 EiB!!
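
For anyone who wants to verify those figures, a quick check (the names here are only for illustration):

```python
LBA_BITS = 48
TIB = 1024 ** 4
PIB = 1024 ** 5


def lba_capacity(block_size, bits=LBA_BITS):
    """Maximum device capacity in bytes for a given LBA width and block size."""
    return (2 ** bits) * block_size


print(lba_capacity(512) // PIB)    # 128   (PiB)
print(lba_capacity(512) // TIB)    # 131072 (TiB)
print(lba_capacity(4096) // PIB)   # 1024  (PiB, i.e. 1 EiB)
```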

"So why are there blocks. Why doesn't the operating system just point straight to the sectors? Because there are limits to the number of blocks, or drive addresses, that an operating system can address."

Everything said here (as well as after) in the article is outdated by 40+ years. Yes, eons ago there was a need for the OS to manage the relation between a block size and a sector size, to compensate for the limitations described there, but this is no longer the case because, as already said:

Modern HDDs don't allow anyone to look inside and see/set what happens behind each block and
LBA is very generous and allows directly addressing a gigantic number of blocks, far beyond what's going to be found inside any generally available HDD anytime soon.

However, those early days had consequences on how file systems are built. A layer of abstraction is presented by the file system to users, such that the file system presents its own allocation unit (i.e. a minimum "block" that must be used when you want to store anything) whose size can differ from the size of the blocks used by the storage devices backing said file system behind the scenes. In case of NTFS, the allocation unit is also known as cluster size.
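
A minimal sketch of how an allocation unit relates to device blocks, assuming clusters are laid out back to back from the start of the volume and the cluster size is a whole multiple of the block size. `cluster_to_lbas` and `first_lba` are hypothetical names; real file systems add their own metadata areas and offsets.

```python
def cluster_to_lbas(cluster, cluster_size=4096, block_size=512, first_lba=0):
    """Device block IDs (LBAs) that back one file system cluster.

    first_lba is the assumed LBA where cluster 0 of the volume begins.
    """
    if cluster_size % block_size != 0:
        raise ValueError("cluster size must be a multiple of the block size")
    per_cluster = cluster_size // block_size
    start = first_lba + cluster * per_cluster
    return range(start, start + per_cluster)


print(list(cluster_to_lbas(0)))                  # [0, 1, 2, 3, 4, 5, 6, 7]
print(list(cluster_to_lbas(2, first_lba=2048)))  # [2064, ..., 2071]
```

So with a 4K cluster on a device presenting 512 byte blocks, the file system tracks one cluster number while each cluster access spans eight consecutive LBAs on the device.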

In general, using an allocation unit that's different from default (which in turn may or may NOT be identical to the block size of the underlying storage device(s)) is something to be done only when you expertly know why. For example, a 4K allocation unit (cluster size) may be beneficial even with HDDs presenting 512 bytes/block when the HDD is used in certain patterns but not in others.

Midship answered 20/4 at 22:48 Comment(0)
