File System Implementation and Memory Allocation

The main challenge in implementing file storage is remembering which disk blocks correspond to which file. Different operating systems employ different techniques. 


Contiguous Allocation


The most straightforward allocation strategy is to store each file as a sequence of consecutive disk blocks. On a disk with 1-KB blocks, a 50-KB file would therefore be allotted 50 consecutive blocks; with 2-KB blocks, it would be given 25 consecutive blocks.

The first 40 disk blocks are displayed in the accompanying image, starting with block 0 on the left. 

The disk was initially empty. Then, starting with block 0, a file A of length four blocks was written to disk. Following that, a six-block file named B was written, beginning immediately after the end of file A.

Because each file starts at the beginning of a new block, if file A were actually 3½ blocks long, some space would be wasted at the end of its last block. Seven files are represented in the figure, each of which begins at the block that follows the end of the previous one.

Shading is used merely to make it easier to tell the files apart.

Contiguous disk space allocation has two significant advantages. 

  • First, it is simple to implement because keeping track of where a file’s blocks are is reduced to remembering two numbers: the disk address of the first block and the number of blocks in the file (a small sketch of such a directory entry follows this list).
  • Second, the read performance is excellent because the entire file can be read from the disk in a single operation. 
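
To make the bookkeeping concrete, here is a minimal sketch in Python (the class and field names are invented for illustration, not taken from any real operating system) of a directory entry for a contiguously allocated file and the constant-time offset-to-block calculation it allows:

BLOCK_SIZE = 1024  # assume 1-KB disk blocks, as in the example above

class ContiguousDirEntry:
    """Directory entry for contiguous allocation: two numbers are enough."""
    def __init__(self, first_block, block_count):
        self.first_block = first_block    # disk address of the first block
        self.block_count = block_count    # number of blocks in the file

    def block_for_offset(self, byte_offset):
        """Disk block holding the given byte offset; no chain of pointers to follow."""
        index = byte_offset // BLOCK_SIZE
        if index >= self.block_count:
            raise ValueError("offset is past the end of the file")
        return self.first_block + index

# File A from the figure: four blocks starting at block 0.
file_a = ContiguousDirEntry(first_block=0, block_count=4)
print(file_a.block_for_offset(3000))   # prints 2: the offset falls in the file's third block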

Unfortunately, contiguous allocation also has a significant drawback: 

  • When a file is removed, its blocks are freed, leaving a run of free blocks on the disk. The disk is not compacted on the spot to squeeze out the hole, since that would require copying all the blocks that follow it, which may amount to millions of blocks. As a result, the disk ultimately consists of files and holes, as shown in the figure.

Initially, this fragmentation is not a problem, since each new file can be written at the end of the disk, following the previous one. Eventually, however, the disk will fill up and it will become necessary either to compact the disk, which is prohibitively expensive, or to reuse the free space in the holes. Reusing the space requires maintaining a list of holes, which is doable. However, when a new file is to be created, its final size must be known in advance in order to choose a hole of the right size to place it in.
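
As a rough sketch of what reusing the holes might involve (the hole positions below are invented, and first fit is only one of several possible strategies), a free list of holes and a routine that picks one for a new file of known size could look like this:

# Free list of holes, each recorded as (start_block, length_in_blocks).
holes = [(4, 3), (12, 10), (30, 6)]

def first_fit(holes, blocks_needed):
    """Pick the first hole large enough for the new file; shrink or remove it."""
    for i, (start, length) in enumerate(holes):
        if length >= blocks_needed:
            if length == blocks_needed:
                del holes[i]                        # the hole is used up completely
            else:
                holes[i] = (start + blocks_needed, length - blocks_needed)
            return start                            # first block of the new file
    return None                                     # no hole is big enough: allocation fails

print(first_fit(holes, 5))   # prints 12; the 10-block hole shrinks to (17, 5)
print(holes)                 # prints [(4, 3), (17, 5), (30, 6)]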

Imagine the consequences of such a design. The user starts a text editor or word processor in order to type a document. The first thing the program asks is how many bytes the final document will be. The question must be answered or the program will not continue. If the number given ultimately proves too small, the program has to terminate prematurely because the disk hole is full and there is no place to put the rest of the file. If the user tries to avoid this problem by giving an unrealistically large number as the final size, say, 100 MB, the editor may be unable to find such a large hole and announce that the file cannot be created. Of course, the user would be free to start the program again and say 50 MB this time, and so on until a suitable hole was located. Still, this scheme is not likely to lead to happy users.

However, there is one situation in which contiguous allocation is feasible and, in fact, widely used: on CD-ROMs. Here all the file sizes are known in advance and will never change during subsequent use of the CD-ROM file system. 

Contiguous allocation was actually used on magnetic disk file systems years ago due to its simplicity and high performance (user-friendliness did not count for much then). Then the idea was dropped due to the nuisance of having to specify the final file size at file creation time. But with the advent of CD-ROMs, DVDs, and other write-once optical media, suddenly contiguous files are a good idea again. It is thus important to study old systems and ideas that were conceptually clean and simple, because they may be applicable to future systems in surprising ways.


Linked List Allocation


The second method for storing files is to keep each one as a linked list of disk blocks, as shown in the following figure. The first word of each block is used as a pointer to the next one. The rest of the block is for data.


Unlike contiguous allocation, every disk block can be used in this method. No space is lost to disk fragmentation (except for internal fragmentation in the last block). Also, it is sufficient for the directory entry to merely store the disk address of the first block. The rest can be found starting there.

On the other hand, although reading a file sequentially is straightforward, random access is extremely slow. To get to block n, the operating system has to start at the beginning and read the n – 1 blocks prior to it, one at a time. Clearly, doing so many reads will be painfully slow.
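
The following sketch simulates this with a Python dictionary standing in for the disk, using file A's block chain from the figure (4, 7, 2, 10, 12); the data and the -1 end-of-file marker are invented for the illustration. Reaching the n-th block requires reading every block before it, because each pointer is only discovered by reading the block that holds it:

# Simulated disk: block number -> (pointer_to_next_block, data).
disk = {
    4:  (7,  b"part 1"),
    7:  (2,  b"part 2"),
    2:  (10, b"part 3"),
    10: (12, b"part 4"),
    12: (-1, b"part 5"),   # -1 marks the last block of the file
}

def read_nth_block(first_block, n):
    """Return the data of the file's n-th block (counting from 0) by walking the chain."""
    current = first_block
    reads = 0
    for _ in range(n):          # every earlier block must be read just to find the next pointer
        current, _data = disk[current]
        reads += 1
        if current == -1:
            raise ValueError("file is shorter than n blocks")
    _next, data = disk[current]
    return data, reads + 1      # data plus the total number of disk reads needed

print(read_nth_block(4, 4))     # prints (b'part 5', 5): five reads just to reach the fifth block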

Also, the amount of data stored in a block is no longer a power of two because the pointer takes up a few bytes. While not fatal, having a peculiar size is less efficient because many programs read and write in blocks whose size is a power of two. With the first few bytes of each block occupied by a pointer to the next block, reads of a full block's worth of data require acquiring and concatenating information from two disk blocks, which generates extra overhead due to the copying.


Linked List Allocation Using a Table in Memory


Both disadvantages of the linked list allocation can be eliminated by taking the pointer word from each disk block and putting it in a table in memory. The following figure shows what the table looks like for the example of the previous figure. In both figures, we have two files. File A uses disk blocks 4, 7, 2, 10, and 12, in that order, and file B uses disk blocks 6, 3, 11, and 14, in that order. Using the table, we can start with block 4 and follow the chain all the way to the end. The same can be done starting with block 6. Both chains are terminated with a special marker (e.g., –1) that is not a valid block number. Such a table in the main memory is called a FAT (File Allocation Table).

Using this organization, the entire block is available for data. Furthermore, random access is much easier. Although the chain must still be followed to find a given offset within the file, the chain is entirely in memory, so it can be followed without making any disk references. Like the previous method, it is sufficient for the directory entry to keep a single integer (the starting block number) and still be able to locate all the blocks, no matter how large the file is.
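
A minimal sketch of the same two files expressed as an in-memory FAT (the block numbers come from the example above; the table size of 16 entries and the use of None for free blocks are arbitrary choices for the illustration):

FREE = None   # block not in use
EOF  = -1     # chain terminator; not a valid block number

# In-memory FAT indexed by block number: fat[b] holds the block that follows b.
fat = [FREE] * 16
fat[4], fat[7], fat[2], fat[10], fat[12] = 7, 2, 10, 12, EOF   # file A: 4 -> 7 -> 2 -> 10 -> 12
fat[6], fat[3], fat[11], fat[14] = 3, 11, 14, EOF              # file B: 6 -> 3 -> 11 -> 14

def blocks_of(fat, first_block):
    """Follow the chain entirely in memory; no disk reads are needed to find the blocks."""
    chain = []
    block = first_block
    while block != EOF:
        chain.append(block)
        block = fat[block]
    return chain

print(blocks_of(fat, 4))   # prints [4, 7, 2, 10, 12]  (file A)
print(blocks_of(fat, 6))   # prints [6, 3, 11, 14]     (file B)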

The primary disadvantage of this method is that the entire table must be in memory all the time to make it work. With a 20-GB disk and a 1-KB block size, the table needs 20 million entries, one for each of the 20 million disk blocks. Each entry has to be a minimum of 3 bytes. For speed in lookup, they should be 4 bytes. Thus the table will take up 60 MB or 80 MB of main memory all the time, depending on whether the system is optimized for space or time. Conceivably the table could be put in pageable memory, but it would still occupy a great deal of virtual memory and disk space as well as generate extra paging traffic.

I-nodes


Our last method for keeping track of which blocks belong to which file is to associate with each file a data structure called an i-node (index-node), which lists the attributes and disk addresses of the file blocks. A simple example is depicted in the following figure. Given the i-node, it is then possible to find all the blocks of the file. The big advantage of this scheme over linked files using an in-memory table is that the i-node need only be in memory when the corresponding file is open. If each i-node occupies n bytes and a maximum of k files may be open at once, the total memory occupied by the array holding the i-nodes for the open files is only kn bytes. Only this much space need be reserved in advance.

This array is usually far smaller than the space occupied by the file table described in the previous section. The reason is simple. The table for holding the linked list of all disk blocks is proportional in size to the disk itself. If the disk has n blocks, the table needs n entries. As disks grow larger, this table grows linearly with them. In contrast, the i-node scheme requires an array in memory whose size is proportional to the maximum number of files that may be open at once. It does not matter if the disk is 1 GB or 10 GB or 100 GB.

One problem with i-nodes is that if each one has room for a fixed number of disk addresses, what happens when a file grows beyond this limit? One solution is to reserve the last disk address not for a data block, but instead for the address of a block containing more disk block addresses, as shown in the figure. Even more advanced would be two or more such blocks containing disk addresses, or even disk blocks pointing to other disk blocks full of addresses. We will come back to i-nodes when studying UNIX later.
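
The lookup an i-node makes possible can be sketched as follows. This is a simplified model with invented names and sizes (ten direct addresses and one single-indirect block), roughly in the spirit of the UNIX scheme mentioned above rather than an exact copy of it:

NUM_DIRECT = 10   # direct disk addresses stored in the i-node itself (an arbitrary figure)

class INode:
    def __init__(self, direct, indirect_block=None):
        self.direct = direct                    # up to NUM_DIRECT disk addresses of data blocks
        self.indirect_block = indirect_block    # disk address of a block holding further addresses

def block_for_index(inode, index, read_block):
    """Map a logical block index within the file to a disk block address.

    read_block(addr) stands in for a disk read that returns the list of
    addresses stored in an indirect block.
    """
    if index < NUM_DIRECT:
        return inode.direct[index]                  # found directly in the i-node
    if inode.indirect_block is None:
        raise ValueError("file has no indirect block")
    addresses = read_block(inode.indirect_block)    # one extra disk read
    return addresses[index - NUM_DIRECT]

# Tiny usage example with an invented "disk" holding one indirect block at address 99.
fake_disk = {99: [200, 201, 202]}
inode = INode(direct=list(range(50, 60)), indirect_block=99)
print(block_for_index(inode, 3,  fake_disk.get))    # prints 53  (a direct block)
print(block_for_index(inode, 11, fake_disk.get))    # prints 201 (found via the indirect block)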

Happy Exploring!
