Placing data items on disc is usually performed at diﬀerent logical granularities. The most basic items found in relational or object-oriented database systems are the values of attributes. They consist of one or several bytes and are represented by ﬁelds. Fields, in turn, are put together in collections called records, which correspond to tuples or objects. Records need to be stored in physical blocks (see Section 60.3). A collection of records that forms a relation or the extent of a class is stored in some useful way as a collection of blocks, called a ﬁle.
A collection of ﬁeld names and their corresponding data types constitute a record format or record type. The data type of a ﬁeld is usually one of the standard data types (e.g., integer, ﬂoat, bool, date, time). If all records in a ﬁle have the same size in bytes, we call them ﬁxed-length records. The ﬁelds of a record all have a ﬁxed length and are stored consecutively. If the base address, i.e., the start position, of a record is given, the address of a speciﬁc ﬁeld can be calculated as the sum of the lengths of the preceding ﬁelds. The sum assigned to each ﬁeld is called the oﬀset of this ﬁeld. Record and ﬁeld information are stored in the data dictionary. Figure 60.11 illustrates this record organization.
Fixed-length records are easy to manage and allow the use of eﬃcient search methods. But this implies that all ﬁelds have a size so that all data items that potentially are to be stored may ﬁnd space. This can lead to a waste of disk space and to more unfavorable access times.
If we assume that each record of a ﬁle has the same, ﬁxed number of ﬁelds, a variable- length record can only be formed if some ﬁelds have a variable length. For example, a string representing the name of an employee can have a varying length in diﬀerent records. Diﬀerent data structures exist for implementing variable-length records. A ﬁrst possible organization amounts to a consecutive sequence of ﬁelds which are interrupted by separators (such as ? or % or $). Separators are special symbols that do not occur in data items. A special terminator symbol indicates the end of the record. But this organization requires a pass (scan) of the record to be able to ﬁnd a ﬁeld of interest (Figure 60.12).
Instead of separators, each ﬁeld of variable length can also start with a counter that speciﬁes the needed number of bytes of a ﬁeld value.
Another alternative is that a header precedes the record. A header represents the “ad- ministrative” part of the record and can include information about integer oﬀsets of the
beginnings of the ﬁeld values (Figure 60.12). The ith integer number is then the start address of the ith ﬁeld value relatively to the beginning of the record. Also for the end of the record we must store an oﬀset in order to know the end of the last ﬁeld value. This alternative is usually the better one. Costs arise due to the header in terms of storage; the beneﬁt is direct ﬁeld access. Problems arise with changes. An update can let a ﬁeld value grow which necessitates a “shift” of all consecutive ﬁelds. Besides, it can happen that a modiﬁed record does not ﬁt any more on the page assigned to it and has to be moved to another page. If record identiﬁers contain a page number, on this page the new page number has to be left behind pointing to the new location of the record.
A further problem of variable-length records arises if such a record grows to such an extent that it does not ﬁt on a page any more. For example, ﬁeld values storing image data in various formats (e.g., GIF or JPEG), movies in formats such as MPEG, or spatial objects such as polygons can extend from the order of many kilobytes to the order of many megabytes or even gigabytes. Such truly large values for records or ﬁeld values of records are called large objects (lobs) with the distinction of binary large objects (blobs ) for large byte sequences and character large objects!character (clobs ) for large strings.
Since, in general, lobs exceed page borders, only the non-lob ﬁelds are stored on the original page. Diﬀerent data structures are conceivable for representing lobs. They all have in common that a lob is subdivided into a collection of linked pages. This organization is also called spanned , because records can span more than one page, in contrast to the unspanned organization where records are not allowed to cross page borders. The ﬁrst alternative is to keep a pointer instead of the lob on the original page as attribute value. This pointer (also called page reference) points to the start of a linked page or block list keeping the lob (Figure 60.13(a)).
Insertions, deletions, and modiﬁcations are simple but direct access to pages is impossible. The second alternative is to store a lob directory as attribute value (Figure 60.13(b)). Instead of a pointer, a directory is stored which includes the lob size, further administrative data, and a page reference list pointing to the single pages or blocks on a disk. The main beneﬁt of this structure is the direct and sequential access to pages. The main drawback is the ﬁxed and limited size of the lob directory and thus the lob. A lob directory can grow so much that it needs itself a lob for its storage.
The third alternative is the usage of positional B+ –trees (Figure 60.14).
Such a B-tree
variant stores relative byte positions in its inner nodes as separators. Its leaf nodes keep the actual data pages of the lob. The original page only stores as the ﬁeld value a pointer to the root of the tree.
Records are positioned on pages (or blocks). In order to reference a record, often a pointer to it suﬃces. Due to diﬀerent requirements for storing records, the structure of pointers can vary. The most obvious pointer type is the physical address of a record on disk or in a virtual storage and can easily be used to compute the page to be read. The main advantage
is a direct access to the searched record. But it is impossible to move a record within a page, because this requires the locating and changing of all pointers to this record. We call these pointers physical pointers. Due to this drawback, a pointer is often described as a pair (p, n) where p is the number of the page where the record can be found and where n is a number indicating the location of the record on the page. The parameter n can be interpreted diﬀerently, e.g., as a relative byte address on a page, as a number of a slot, or as an index of a directory in the page header . The entry at this index position yields the relative position of the record on the page. All pointers (s, p) remain unchanged and are named page-related pointers. Pointers that are completely stable against movements in main memory can be achieved if a record is associated with a logical address that reveals nothing about its storage. The record can be moved freely in a ﬁle without changing any pointers. This can be realized by indirect addressing. If a record is moved, only the respective entry in a translation table has to be changed. All pointers remain unchanged, and we call them logical pointers . The main drawback is that each access to a record needs an additional access to the translation table. Further, the table can become so large that it does not ﬁt completely in main memory.
A page can be considered as a collection of slots. Each slot can capture exactly one record. If all records have the same length, all slots have the same size and can be allocated consecutively on the page. Hence, a page contains so many records as slots ﬁt on a page plus page information like directories and pointers to other pages. A ﬁrst alternative for arranging a set of N ﬁxed-length records is to place them in the ﬁrst N slots (see Figure 60.15). If a record is deleted in slot i < N , the last record on the page in slot N is moved to the free slot i. However, this causes problems if the record to be moved is pinned‡ and the slot number has to be changed. Hence, this “packed” organization is problematic, although it allows one to easily compute the location of the ith record. A second alternative is to manage deletions of records on each page and thus information about free slots by means of a directory represented as a bitmap. The retrieval of the ith record as well as ﬁnding the next free slot on a page require a traversal of the directory. The search for the next free slot can be sped up if an additional, special ﬁeld stores a pointer on the ﬁrst slot whose deletion ﬂag is set. The slot itself then contains a pointer to the next free slot so that a chaining of free slots is achieved.
Also variable-length records can be positioned consecutively on a page. But deletions of records must be handled diﬀerently now because slots cannot be reused in a simple manner any more. If a new record is to be inserted, ﬁrst a free slot of “the right size” has to be found. If the slot is too large, storage is wasted. If it is too small, it cannot be used. In any case, unused storage areas (fragmentation) at the end of slots can only be avoided if records on a page are moved and condensed. This leads to a connected, free storage area. If the records of a page are unpinned, the “packed” representation for ﬁxed-length records can be adapted. Either a special terminator symbol marks the end of the record, or a ﬁeld at the beginning of the record keeps its length. In the general case, indirect addressing is needed which permits record movements without negative eﬀects and without further access costs. The most ﬂexible organization of variable-length records is provided by the tuple identiﬁer (TID ) concept (Figure 60.16).
Each record is assigned a unique, stable pointer consisting of a page number and an index into a page-internal directory. The entry at index i contains the relative position, i.e., the oﬀset, of slot i and hence a pointer to record i on the page.
The length information of a record is stored either in the directory entry or at the beginning of the slot (Li in Figure 60.16). Records which grow or shrink can be moved on the page without being forced to modify their TIDs. If a record is deleted, this is registered in the corresponding directory entry by means of a deletion ﬂag.
Since a page cannot be subdivided into predeﬁned slots, some kind of free disk space management is needed on each page. A pointer to the beginning of the free storage space on the page can be kept in the page header. If a record does not ﬁt into the currently
available free disk space, the page is compressed (i.e., defragmented) and all records are placed consecutively without gaps. The eﬀect is that the maximally available free space is obtained and is located after the record representations.
If, despite defragmentation, a record does still not ﬁt into the available free space, the record must be moved from its “home page” to an “overﬂow page”. The respective TID can be kept stable by storing a “proxy TID” instead of the record on the home page. This proxy TID points to the record having been moved to the overﬂow page. An overﬂow record is not allowed to be moved to another, second overﬂow page. If an overﬂow record has to leave its overﬂow page, its placement on the home page is attempted. If this fails due to a lack of space, a new overﬂow page is determined and the overﬂow pointer is placed on the home page. This procedure assures that each record can be retrieved with a maximum of two page accesses.
If a record is deleted, we can only replace the corresponding entry in the directory by a deletion ﬂag. But we cannot compress the directory since the indexes of the directory are used to identify records. If we deleted an entry and compress, the indexes of the subsequent slots in the directory would be decremented so that TIDs would point to wrong slots and thus wrong records. If a new record is inserted, the ﬁrst entry of the directory containing a deletion ﬂag is selected for determining the new TID and pointing to the new record.
If a record represents a large object, i.e., it does not ﬁt on a single page but requires a collection of linked pages, the diﬀerent data structures for blobs can be employed.
A ﬁle (segment ) can be viewed as a sequence of blocks (pages ). Four fundamental ﬁle organizations can be distinguished, namely ﬁles of unordered records (heap ﬁles), ﬁles of ordered records (sorted ﬁles), ﬁles with dispersed records (hash ﬁles), and tree-based ﬁles (index structures).
Heap ﬁles are the simplest ﬁle organization. Records are inserted and stored in their un- ordered, chronological sequence. For each heap ﬁle we have to manage their assigned pages (blocks) to support scans as well as the pages containing free space to perform insertions eﬃciently. Doubly-linked lists of pages or directories of pages using both page numbers for page addressing are possible alternatives. For the ﬁrst alternative, the DBMS uses a header page which is the ﬁrst page of a heap ﬁle, contains the address of the ﬁrst data page, and information about available free space on the pages. For the second alternative, the DBMS must keep the ﬁrst page of the heap ﬁle in mind. The directory itself represents a collection of pages and can be organized as a linked list. Each directory entry points to a page of the heap ﬁle. The free space on each page is recorded by a counter associated with each directory entry. If a record is to be inserted, its length can be compared to the number of free bytes on a page.
Sorted ﬁles physically order their records based on the values of one (or several) of their ﬁelds, called the ordering ﬁeld (s). If the ordering ﬁeld is also a key ﬁeld of the ﬁle, i.e., a ﬁeld guaranteed to have a unique value in each record, then the ﬁeld is called the ordering key for the ﬁle. If all records have the same ﬁxed length, binary search on the ordering key can be employed resulting in faster access to records.
Hash ﬁles are a ﬁle organization based on hashing and representing an important indexing technique. They provide very fast access to records on certain search conditions. Internal hashing techniques have been discussed in diﬀerent chapters of this book; here we are dealing with their external variants and will only explain their essential features. The fundamental idea of hash ﬁles is the distribution of the records of a ﬁle into so-called buckets, which are organized as heaps. The distribution is performed depending on the value of the search key.
The direct assignment of a record to a bucket is computed by a hash function. Each bucket consists of one or several pages of records. A bucket directory is used for the management of the buckets, which is an array of pointers. The entry for index i points to the ﬁrst page of bucket i. All pages for bucket i are organized as a linked list. If a record has to be inserted into a bucket, this is usually done on its last page since only there space can be found. Hence, a pointer to the last page of a bucket is used to accelerate the access to this page and to avoid traversing all the pages of the bucket. If there is no space left on the last page, overﬂow pages are provided. This is called a static hash ﬁle. Unfortunately, this strategy can cause long chains of overﬂow pages. Dynamic hash ﬁles deal with this problem by allowing a variable number of buckets. Extensible hash ﬁles employ a directory structure in order to support insertion and deletion eﬃciently without the employment of overﬂow pages. Linear hash ﬁles apply an intelligent strategy to create new buckets. Insertion and deletion are eﬃciently realized without using a directory structure.
Index structures are a fundamental and predominantly tree-based ﬁle organization based on the search key property of values and aiming at speeding up the access to records. They have a paramount importance in query processing. Many examples of index structures are already described in detail in this handbook, e.g., B-trees and variants, quad-trees and octtrees, R-trees and variants, and other multidimensional data structures. We will not discuss them further here. Instead, we mention some basic and general organization forms for index structures that can also be combined. An index structure is called a primary organization if it contains search key information together with an embedding of the respective records; it is named a secondary organization if it includes besides search key information only TIDs or TID lists to records in separate ﬁle structures (e.g., heap ﬁles or sorted ﬁles). An index is called a dense index if it contains (at least) one index entry for each search key value which is part of a record of the indexed ﬁle; it is named a sparse index (Figure 60.17) if it only contains an entry for each page of records of the indexed ﬁle. An index is called a clustered index (Figure 60.17) if the logical order of records is equal or almost equal to their physical order, i.e., records belonging logically together are physically stored on neighbored pages. Otherwise, the index is named non-clustered . An index is called a one-dimensional index if a linear order is deﬁned on the set of search key values used for organizing the index entries. Such an order cannot be imposed on a multi-dimensional index where the organization of index entries is based on spatial relationships. An index is called a single-level index if the index only consists of a single ﬁle; otherwise, if the index is composed of several ﬁles, it is named a multi-level index .