I was going through the Google File System (GFS) paper. It mentions that GFS uses lazy space allocation to reduce internal fragmentation.
Can someone explain how lazy space allocation reduces internal fragmentation?
With lazy space allocation, the physical allocation of space is delayed as long as possible, until data amounting to the chunk size (in GFS's case, 64 MB according to the 2003 paper) has accumulated. In other words, the decision to allocate a new chunk on disk is heavily influenced by the size of the data that is to be written. This preference for waiting, instead of allocating more chunks based on some other characteristic, minimizes the chance of internal fragmentation (i.e. unused portions of the 64 MB chunk).
In the Google paper, it also says: "Most chunks are full because most files contain many chunks, only the last of which may be partially filled." So, the same approach is applied to file creation.
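To make the quoted sentence concrete, here is a small sketch of the arithmetic. The function names and the 200 MB file size are made up for illustration, and the "lazy" figure assumes, as I read the paper, that a partially filled chunk only consumes as much disk space as has actually been written to it.

```python
MB = 1024 * 1024
CHUNK_SIZE = 64 * MB  # chunk size stated in the 2003 GFS paper


def chunk_layout(file_size, chunk_size=CHUNK_SIZE):
    """Return (number of full chunks, bytes in the partially filled last chunk)."""
    return divmod(file_size, chunk_size)


def eager_usage(file_size, chunk_size=CHUNK_SIZE):
    """Disk space used if every chunk were allocated at full size up front."""
    full, tail = chunk_layout(file_size, chunk_size)
    return (full + (1 if tail else 0)) * chunk_size


def lazy_usage(file_size):
    """Disk space used if the last chunk only grows as data is written (assumption)."""
    return file_size


size = 200 * MB  # hypothetical file
full, tail = chunk_layout(size)
print(full, tail // MB)                               # full chunks, MB in last chunk
print((eager_usage(size) - lazy_usage(size)) // MB)   # MB saved by not pre-allocating
```

For this hypothetical 200 MB file, three chunks are full and only the fourth is partial, so the wasted space is confined to that one chunk, and lazy allocation avoids even that.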
It is analogous to this: http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory
I have not read the entire paper, but I hope the following helps you in a small way.
The first question I would ask is: what is the effect of having large block sizes in a file system? Let us say that the FS block size is 64MB. The good news is that we write in large contiguous chunks to hard disks (more data written per seek), there is less metadata to keep in indirect blocks, etc. The bad news is internal fragmentation: if the file is 1MB but the minimum block size is 64MB, there is internal fragmentation of 63MB. So, how do we get the good news and avoid the bad news?
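The 63MB figure can be checked with a few lines of Python. This is only a sketch of the arithmetic; `internal_fragmentation` is a name made up for this example.

```python
MB = 1024 * 1024


def internal_fragmentation(file_size, block_size):
    """Bytes allocated but unused when a file is stored in fixed-size blocks."""
    if file_size == 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size - file_size


# 1MB file on a file system with 64MB blocks: 63MB is wasted.
print(internal_fragmentation(1 * MB, 64 * MB) // MB)
# Same file with 1MB blocks: nothing is wasted.
print(internal_fragmentation(1 * MB, 1 * MB) // MB)
```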
One way is to do lazy space allocation, or delayed space allocation. Here, we keep the block size small (say 1MB), but we write a big chunk of data, i.e. many 1MB blocks together, when we write to disk. This way, we get the goodness of large block writes. Note that this means that we write to an in-core buffer but tell the write() sys call that it is done writing to disk, just like writing to the buffer cache.
NOTE: When the "time" has come to do the real block allocation, we need to be guaranteed space on disk. So, delayed block allocation => space reservation is done at the time of write, but space allocation is done at a later time when enough data blocks have accumulated in-core.
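The split between reservation and allocation can be sketched as a toy model. Everything here (the `DelayedAllocDisk` class, its fields, the bump allocator, the flush threshold) is hypothetical and only illustrates the idea; it is not how GFS or any real file system implements it.

```python
class DelayedAllocDisk:
    """Toy disk: write() only reserves space, flush() does the real allocation."""

    def __init__(self, total_blocks, flush_threshold):
        self.total_blocks = total_blocks
        self.flush_threshold = flush_threshold
        self.next_free = 0   # next unallocated block (simple bump allocator)
        self.reserved = 0    # blocks promised to buffered writes
        self.buffer = []     # data blocks waiting in memory
        self.extents = []    # (start_block, length) of each on-disk allocation

    def free_blocks(self):
        # Reserved blocks count as used, so a later flush cannot run out of space.
        return self.total_blocks - self.next_free - self.reserved

    def write(self, block):
        if self.free_blocks() < 1:
            raise OSError("no space left on device")
        self.reserved += 1            # reservation happens at write time
        self.buffer.append(block)     # data stays in memory for now
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        n = len(self.buffer)
        # Allocation happens here, as one contiguous extent for all buffered blocks.
        self.extents.append((self.next_free, n))
        self.next_free += n
        self.reserved -= n
        self.buffer.clear()


disk = DelayedAllocDisk(total_blocks=8, flush_threshold=4)
for i in range(3):
    disk.write(b"data")
print(disk.extents, disk.reserved)   # nothing allocated yet, 3 blocks reserved
disk.write(b"data")
print(disk.extents, disk.reserved)   # one extent of 4 blocks, reservation released
```

Because the reservation is taken in write(), the out-of-space error shows up at write time rather than later at flush time, which is the guarantee the note above asks for.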
Data is first written into a buffer. So, instead of allocating space the moment the file is created, the file system waits until the actual write occurs, as in XFS: http://en.wikipedia.org/wiki/XFS#Delayed_allocation
You don't have to fix the file size when creating the file, and you can append to it later to make it larger. You can reference this.
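The same behaviour is easy to see with an ordinary local file. This sketch uses a temporary file on the local file system rather than GFS, and `sizes_while_appending` is a name made up for the example: the file starts at size 0 and only grows as data is appended.

```python
import os
import tempfile


def sizes_while_appending(write_sizes):
    """Create an empty file, append writes of the given sizes, record the size after each step."""
    sizes = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chunk.dat")
        open(path, "wb").close()              # create the file; no size is fixed here
        sizes.append(os.path.getsize(path))
        for n in write_sizes:
            with open(path, "ab") as f:       # append mode extends the file as needed
                f.write(b"x" * n)
            sizes.append(os.path.getsize(path))
    return sizes


print(sizes_while_appending([4096, 1000]))
```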