__Hash Tables__

A typical extensible hash table is a resizable array of buckets, each holding an expected constant number of elements, and thus requiring on average a constant time for insert, delete, and search operations [24]. The principal cost of *resizing*, the redistribution of items between old and new buckets, is amortized over all table operations, thus keeping the cost of operations constant on average. Here resizing means extending the table, as it has been shown that, as a practical matter, hash tables need only increase in size [58]. See Chapter 9 for more on hash tables.
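The amortization argument above can be made concrete with a minimal sequential sketch. The class and parameter names below (`ChainedHashTable`, `max_load`) are illustrative, not from any cited algorithm: each resize costs time linear in the number of items, but since a table of size n absorbs on the order of n insertions before the next doubling, the per-operation cost stays constant on average.

```python
# A minimal sequential chained hash table with amortized doubling.
# Illustrative sketch only; not one of the cited concurrent algorithms.
class ChainedHashTable:
    def __init__(self, capacity=4, max_load=1.0):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            return False
        bucket.append(key)
        self.size += 1
        if self.size > self.max_load * len(self.buckets):
            self._resize()
        return True

    def contains(self, key):
        return key in self.buckets[self._index(key)]

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            bucket.remove(key)
            self.size -= 1
            return True
        return False

    def _resize(self):
        # Redistribute every item into a table twice as large. This O(n)
        # pass is the "principal cost of resizing" amortized over the ~n
        # insertions that preceded it.
        new = [[] for _ in range(2 * len(self.buckets))]
        for bucket in self.buckets:
            for key in bucket:
                new[hash(key) % len(new)].append(key)
        self.buckets = new
```

Note that the table only ever grows, matching the observation in [58] that shrinking is unnecessary in practice.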

Michael [99] shows that a concurrent non-extensible hash table can be achieved by placing a read-write lock on every bucket in the table. However, to guarantee good performance as the number of elements grows, hash tables must be extensible [30].

In the eighties, Ellis [29] and others [58, 77] extended the work of Fagin et al. [30] by designing extensible concurrent hash tables for distributed databases based on two-level locking schemes. A more recent extensible hash algorithm by Lea [88], based on a version of Litwin's sequential linear hashing algorithm [91], is known to be highly efficient in non-multiprogrammed environments [125]. It uses a locking scheme that involves a small number of high-level locks rather than a lock per bucket, and allows concurrent searches while resizing the table, but not concurrent inserts or deletes. Resizing is performed as a global restructuring of all buckets when the table size needs to be doubled.
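The linear hashing idea underlying Lea's algorithm can be illustrated sequentially. The sketch below is my simplification of Litwin-style linear hashing, not Lea's concurrent version (all names, such as `LinearHashTable` and `next_split`, are illustrative): the table grows one bucket at a time, so two hash functions coexist during a round, and each split redistributes only one bucket's items rather than restructuring the whole table.

```python
# Sequential sketch of Litwin-style linear hashing (illustrative only;
# omits Lea's locking and all concurrency control).
class LinearHashTable:
    def __init__(self):
        self.initial = 2        # base number of buckets
        self.level = 0          # current round: base size is initial * 2**level
        self.next_split = 0     # next bucket to split in this round
        self.buckets = [[] for _ in range(self.initial)]

    def _address(self, key):
        n = self.initial * (2 ** self.level)
        a = hash(key) % n
        if a < self.next_split:       # this bucket was already split,
            a = hash(key) % (2 * n)   # so use the finer hash function
        return a

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self._split_one()

    def contains(self, key):
        return key in self.buckets[self._address(key)]

    def _split_one(self):
        # Split exactly one bucket per insert: local work only,
        # never a global rehash of all buckets.
        n = self.initial * (2 ** self.level)
        old = self.buckets[self.next_split]
        self.buckets.append([])       # new bucket at index next_split + n
        keep, moved = [], self.buckets[-1]
        for key in old:
            (keep if hash(key) % (2 * n) == self.next_split else moved).append(key)
        self.buckets[self.next_split] = keep
        self.next_split += 1
        if self.next_split == n:      # round complete: table has doubled
            self.level += 1
            self.next_split = 0
```

Because exactly one bucket is appended per insert, the table's size tracks the element count, and each split touches only the items of a single bucket.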

Lock-based extensible hash-table algorithms suffer from all of the typical drawbacks of blocking synchronization, as discussed earlier. These problems become more acute because of the elaborate “global” process of redistributing the elements in all the hash table’s buckets among newly added buckets. Lock-free extensible hash tables are thus a matter of both practical and theoretical interest.

As described in Section 47.5, Michael [99] builds on the work of Harris [42] to provide an effective CAS-based lock-free linked-list implementation. He then uses this list as the basis for a lock-free hash structure that performs well in multiprogrammed environments: a fixed-size array of hash buckets, each implemented as a lock-free list. However, there is a difficulty in making such a lock-free array of lists extensible, since it is not obvious how to redistribute items in a lock-free manner when the bucket array grows: moving an item between two bucket lists seemingly requires two CAS operations to be performed together atomically, which is not possible on current architectures.
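The shape of this structure, an array of buckets where each bucket is a sorted linked list manipulated through CAS on next pointers, can be sketched as follows. Python has no hardware CAS, so the `cas` helper below merely simulates one (atomic only under the interpreter lock) to illustrate the pointer discipline; the class and helper names are mine, not Michael's.

```python
# Sketch of a fixed-size array of buckets, each a sorted linked list in
# the style Michael describes. The simulated `cas` stands in for a real
# hardware compare-and-swap; this is an illustration, not a lock-free
# implementation.
class Node:
    __slots__ = ("key", "next")
    def __init__(self, key, next=None):
        self.key, self.next = key, next

def cas(obj, attr, expected, new):
    # Simulated compare-and-swap on an attribute (illustration only).
    if getattr(obj, attr) is expected:
        setattr(obj, attr, new)
        return True
    return False

class FixedHashSet:
    def __init__(self, nbuckets=16):
        # Each bucket starts with a sentinel head node.
        self.heads = [Node(None) for _ in range(nbuckets)]

    def _find(self, head, key):
        # Return (pred, curr) with curr the first node whose key >= key.
        pred, curr = head, head.next
        while curr is not None and curr.key < key:
            pred, curr = curr, curr.next
        return pred, curr

    def insert(self, key):
        head = self.heads[hash(key) % len(self.heads)]
        while True:
            pred, curr = self._find(head, key)
            if curr is not None and curr.key == key:
                return False
            node = Node(key, curr)
            # A single CAS suffices to link a node into ONE list; moving a
            # node between two bucket lists would need two such CASes to
            # succeed atomically together, which is the obstacle to growth.
            if cas(pred, "next", curr, node):
                return True

    def contains(self, key):
        head = self.heads[hash(key) % len(self.heads)]
        _, curr = self._find(head, key)
        return curr is not None and curr.key == key
```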

Greenwald [39] shows how to implement an extensible hash table using his *two-handed emulation* technique. However, this technique employs a DCAS synchronization operation, which is not available on current architectures, and introduces excessive amounts of work during global resizing operations.

Shalev and Shavit [125] introduce a lock-free extensible hash table that works on current architectures. Their key idea is to keep the items in a single lock-free linked list instead of a list per bucket. To allow operations fast access to the appropriate part of the list, the Shalev–Shavit algorithm maintains a resizable array of “hints” (pointers into the list); operations use the hints to find a point in the list that is close to the relevant position, and then follow list pointers to find the position. To ensure a constant number of steps per operation on average, finer-grained hints must be added as the number of elements in the list grows. To allow these hints to be installed simply and efficiently, the list is maintained in a special *recursive split-ordering*. This technique allows new hints to be installed incrementally, thereby eliminating the need for complicated mechanisms for atomically moving items between buckets or reordering the list.
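The key property of recursive split-ordering can be demonstrated in a few lines. In the sketch below (a self-contained demonstration with an illustrative fixed 8-bit word size, not the authors' code), items are kept sorted by the bit-reversal of their keys; when the bucket count doubles from n to 2n, the items of bucket b split, in place and in order, into buckets b and b+n, so no item ever moves in the list.

```python
# Demonstration of the recursive split-order property: sorting by
# bit-reversed keys lets a bucket split without moving any item.
BITS = 8  # illustrative fixed word size

def reverse_bits(x, bits=BITS):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

keys = list(range(40))
# The single shared list of all items, sorted by split-order.
split_ordered = sorted(keys, key=reverse_bits)

def bucket_items(n, b):
    # The items of bucket b in a table of n buckets, in list order.
    # A hint into the list would point at the first of these.
    return [k for k in split_ordered if k % n == b]

# Doubling from n to 2n splits bucket b into buckets b and b + n: the two
# new buckets are exactly the front and back of the old bucket's segment
# of the list, so installing one new hint suffices and nothing is moved.
n = 4
for b in range(n):
    assert bucket_items(2 * n, b) + bucket_items(2 * n, b + n) == bucket_items(n, b)
```

The assertion holds because, within bucket b (which fixes the low i bits of the key, where n = 2^i), the bit-reversal makes the next key bit, bit i, the deciding bit: items destined for bucket b precede all items destined for bucket b + n.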