Data Structures for Databases:Data Structures for Disk Space Management

Data Structures for Disk Space Management

Placing data items on disc is usually performed at different logical granularities. The most basic items found in relational or object-oriented database systems are the values of attributes. They consist of one or several bytes and are represented by fields. Fields, in turn, are put together in collections called records, which correspond to tuples or objects. Records need to be stored in physical blocks (see Section 60.3). A collection of records that forms a relation or the extent of a class is stored in some useful way as a collection of blocks, called a file.

Record Organizations

A collection of field names and their corresponding data types constitute a record format or record type. The data type of a field is usually one of the standard data types (e.g., integer, float, bool, date, time). If all records in a file have the same size in bytes, we call them fixed-length records. The fields of a record all have a fixed length and are stored consecutively. If the base address, i.e., the start position, of a record is given, the address of a specific field can be calculated as the sum of the lengths of the preceding fields. The sum assigned to each field is called the offset of this field. Record and field information are stored in the data dictionary. Figure 60.11 illustrates this record organization.


Fixed-length records are easy to manage and allow the use of efficient search methods. But this implies that all fields have a size so that all data items that potentially are to be stored may find space. This can lead to a waste of disk space and to more unfavorable access times.

If we assume that each record of a file has the same, fixed number of fields, a variable- length record can only be formed if some fields have a variable length. For example, a string representing the name of an employee can have a varying length in different records. Different data structures exist for implementing variable-length records. A first possible organization amounts to a consecutive sequence of fields which are interrupted by separators (such as ? or % or $). Separators are special symbols that do not occur in data items. A special terminator symbol indicates the end of the record. But this organization requires a pass (scan) of the record to be able to find a field of interest (Figure 60.12).

Instead of separators, each field of variable length can also start with a counter that specifies the needed number of bytes of a field value.

Another alternative is that a header precedes the record. A header represents the “ad- ministrative” part of the record and can include information about integer offsets of the


beginnings of the field values (Figure 60.12). The ith integer number is then the start address of the ith field value relatively to the beginning of the record. Also for the end of the record we must store an offset in order to know the end of the last field value. This alternative is usually the better one. Costs arise due to the header in terms of storage; the benefit is direct field access. Problems arise with changes. An update can let a field value grow which necessitates a “shift” of all consecutive fields. Besides, it can happen that a modified record does not fit any more on the page assigned to it and has to be moved to another page. If record identifiers contain a page number, on this page the new page number has to be left behind pointing to the new location of the record.

A further problem of variable-length records arises if such a record grows to such an extent that it does not fit on a page any more. For example, field values storing image data in various formats (e.g., GIF or JPEG), movies in formats such as MPEG, or spatial objects such as polygons can extend from the order of many kilobytes to the order of many megabytes or even gigabytes. Such truly large values for records or field values of records are called large objects (lobs) with the distinction of binary large objects (blobs ) for large byte sequences and character large objects!character (clobs ) for large strings.

Since, in general, lobs exceed page borders, only the non-lob fields are stored on the original page. Different data structures are conceivable for representing lobs. They all have in common that a lob is subdivided into a collection of linked pages. This organization is also called spanned , because records can span more than one page, in contrast to the unspanned organization where records are not allowed to cross page borders. The first alternative is to keep a pointer instead of the lob on the original page as attribute value. This pointer (also called page reference) points to the start of a linked page or block list keeping the lob (Figure 60.13(a)).

Insertions, deletions, and modifications are simple but direct access to pages is impossible. The second alternative is to store a lob directory as attribute value (Figure 60.13(b)). Instead of a pointer, a directory is stored which includes the lob size, further administrative data, and a page reference list pointing to the single pages or blocks on a disk. The main benefit of this structure is the direct and sequential access to pages. The main drawback is the fixed and limited size of the lob directory and thus the lob. A lob directory can grow so much that it needs itself a lob for its storage.

The third alternative is the usage of positional B+ –trees (Figure 60.14).

Such a B-tree

variant stores relative byte positions in its inner nodes as separators. Its leaf nodes keep the actual data pages of the lob. The original page only stores as the field value a pointer to the root of the tree.

Page Organizations

Records are positioned on pages (or blocks). In order to reference a record, often a pointer to it suffices. Due to different requirements for storing records, the structure of pointers can vary. The most obvious pointer type is the physical address of a record on disk or in a virtual storage and can easily be used to compute the page to be read. The main advantage


is a direct access to the searched record. But it is impossible to move a record within a page, because this requires the locating and changing of all pointers to this record. We call these pointers physical pointers. Due to this drawback, a pointer is often described as a pair (p, n) where p is the number of the page where the record can be found and where n is a number indicating the location of the record on the page. The parameter n can be interpreted differently, e.g., as a relative byte address on a page, as a number of a slot, or as an index of a directory in the page header . The entry at this index position yields the relative position of the record on the page. All pointers (s, p) remain unchanged and are named page-related pointers. Pointers that are completely stable against movements in main memory can be achieved if a record is associated with a logical address that reveals nothing about its storage. The record can be moved freely in a file without changing any pointers. This can be realized by indirect addressing. If a record is moved, only the respective entry in a translation table has to be changed. All pointers remain unchanged, and we call them logical pointers . The main drawback is that each access to a record needs an additional access to the translation table. Further, the table can become so large that it does not fit completely in main memory.

A page can be considered as a collection of slots. Each slot can capture exactly one record. If all records have the same length, all slots have the same size and can be allocated consecutively on the page. Hence, a page contains so many records as slots fit on a page plus page information like directories and pointers to other pages. A first alternative for arranging a set of N fixed-length records is to place them in the first N slots (see Figure 60.15). If a record is deleted in slot i < N , the last record on the page in slot N is moved to the free slot i. However, this causes problems if the record to be moved is pinnedand the slot number has to be changed. Hence, this “packed” organization is problematic, although it allows one to easily compute the location of the ith record. A second alternative is to manage deletions of records on each page and thus information about free slots by means of a directory represented as a bitmap. The retrieval of the ith record as well as finding the next free slot on a page require a traversal of the directory. The search for the next free slot can be sped up if an additional, special field stores a pointer on the first slot whose deletion flag is set. The slot itself then contains a pointer to the next free slot so that a chaining of free slots is achieved.


Also variable-length records can be positioned consecutively on a page. But deletions of records must be handled differently now because slots cannot be reused in a simple manner any more. If a new record is to be inserted, first a free slot of “the right size” has to be found. If the slot is too large, storage is wasted. If it is too small, it cannot be used. In any case, unused storage areas (fragmentation) at the end of slots can only be avoided if records on a page are moved and condensed. This leads to a connected, free storage area. If the records of a page are unpinned, the “packed” representation for fixed-length records can be adapted. Either a special terminator symbol marks the end of the record, or a field at the beginning of the record keeps its length. In the general case, indirect addressing is needed which permits record movements without negative effects and without further access costs. The most flexible organization of variable-length records is provided by the tuple identifier (TID ) concept (Figure 60.16).

Each record is assigned a unique, stable pointer consisting of a page number and an index into a page-internal directory. The entry at index i contains the relative position, i.e., the offset, of slot i and hence a pointer to record i on the page.

The length information of a record is stored either in the directory entry or at the beginning of the slot (Li in Figure 60.16). Records which grow or shrink can be moved on the page without being forced to modify their TIDs. If a record is deleted, this is registered in the corresponding directory entry by means of a deletion flag.

Since a page cannot be subdivided into predefined slots, some kind of free disk space management is needed on each page. A pointer to the beginning of the free storage space on the page can be kept in the page header. If a record does not fit into the currently


available free disk space, the page is compressed (i.e., defragmented) and all records are placed consecutively without gaps. The effect is that the maximally available free space is obtained and is located after the record representations.

If, despite defragmentation, a record does still not fit into the available free space, the record must be moved from its “home page” to an “overflow page”. The respective TID can be kept stable by storing a “proxy TID” instead of the record on the home page. This proxy TID points to the record having been moved to the overflow page. An overflow record is not allowed to be moved to another, second overflow page. If an overflow record has to leave its overflow page, its placement on the home page is attempted. If this fails due to a lack of space, a new overflow page is determined and the overflow pointer is placed on the home page. This procedure assures that each record can be retrieved with a maximum of two page accesses.

If a record is deleted, we can only replace the corresponding entry in the directory by a deletion flag. But we cannot compress the directory since the indexes of the directory are used to identify records. If we deleted an entry and compress, the indexes of the subsequent slots in the directory would be decremented so that TIDs would point to wrong slots and thus wrong records. If a new record is inserted, the first entry of the directory containing a deletion flag is selected for determining the new TID and pointing to the new record.

If a record represents a large object, i.e., it does not fit on a single page but requires a collection of linked pages, the different data structures for blobs can be employed.

File Organization

A file (segment ) can be viewed as a sequence of blocks (pages ). Four fundamental file organizations can be distinguished, namely files of unordered records (heap files), files of ordered records (sorted files), files with dispersed records (hash files), and tree-based files (index structures).

Heap files are the simplest file organization. Records are inserted and stored in their un- ordered, chronological sequence. For each heap file we have to manage their assigned pages (blocks) to support scans as well as the pages containing free space to perform insertions efficiently. Doubly-linked lists of pages or directories of pages using both page numbers for page addressing are possible alternatives. For the first alternative, the DBMS uses a header page which is the first page of a heap file, contains the address of the first data page, and information about available free space on the pages. For the second alternative, the DBMS must keep the first page of the heap file in mind. The directory itself represents a collection of pages and can be organized as a linked list. Each directory entry points to a page of the heap file. The free space on each page is recorded by a counter associated with each directory entry. If a record is to be inserted, its length can be compared to the number of free bytes on a page.

Sorted files physically order their records based on the values of one (or several) of their fields, called the ordering field (s). If the ordering field is also a key field of the file, i.e., a field guaranteed to have a unique value in each record, then the field is called the ordering key for the file. If all records have the same fixed length, binary search on the ordering key can be employed resulting in faster access to records.

Hash files are a file organization based on hashing and representing an important indexing technique. They provide very fast access to records on certain search conditions. Internal hashing techniques have been discussed in different chapters of this book; here we are dealing with their external variants and will only explain their essential features. The fundamental idea of hash files is the distribution of the records of a file into so-called buckets, which are organized as heaps. The distribution is performed depending on the value of the search key.

The direct assignment of a record to a bucket is computed by a hash function. Each bucket consists of one or several pages of records. A bucket directory is used for the management of the buckets, which is an array of pointers. The entry for index i points to the first page of bucket i. All pages for bucket i are organized as a linked list. If a record has to be inserted into a bucket, this is usually done on its last page since only there space can be found. Hence, a pointer to the last page of a bucket is used to accelerate the access to this page and to avoid traversing all the pages of the bucket. If there is no space left on the last page, overflow pages are provided. This is called a static hash file. Unfortunately, this strategy can cause long chains of overflow pages. Dynamic hash files deal with this problem by allowing a variable number of buckets. Extensible hash files employ a directory structure in order to support insertion and deletion efficiently without the employment of overflow pages. Linear hash files apply an intelligent strategy to create new buckets. Insertion and deletion are efficiently realized without using a directory structure.

Index structures are a fundamental and predominantly tree-based file organization based on the search key property of values and aiming at speeding up the access to records. They have a paramount importance in query processing. Many examples of index structures are already described in detail in this handbook, e.g., B-trees and variants, quad-trees and octtrees, R-trees and variants, and other multidimensional data structures. We will not discuss them further here. Instead, we mention some basic and general organization forms for index structures that can also be combined. An index structure is called a primary organization if it contains search key information together with an embedding of the respective records; it is named a secondary organization if it includes besides search key information only TIDs or TID lists to records in separate file structures (e.g., heap files or sorted files). An index is called a dense index if it contains (at least) one index entry for each search key value which is part of a record of the indexed file; it is named a sparse index (Figure 60.17) if it only contains an entry for each page of records of the indexed file. An index is called a clustered index (Figure 60.17) if the logical order of records is equal or almost equal to their physical order, i.e., records belonging logically together are physically stored on neighbored pages. Otherwise, the index is named non-clustered . An index is called a one-dimensional index if a linear order is defined on the set of search key values used for organizing the index entries. Such an order cannot be imposed on a multi-dimensional index where the organization of index entries is based on spatial relationships. An index is called a single-level index if the index only consists of a single file; otherwise, if the index is composed of several files, it is named a multi-level index .


Related posts:

Leave a comment

Your email address will not be published. Required fields are marked *