Hash algorithms - wqfhenanxc_cu - ChinaUnix blog

Although searching for an element in a hash table can take as long as searching for an element in a linked list (Θ(n) time in the worst case), in practice hashing performs extremely well. Under reasonable assumptions, the expected time to search for an element in a hash table is O(1).

Direct addressing is a simple technique that works well when the universe U of keys is reasonably small. It is applicable when we can afford to allocate an array that has one position for every possible key.
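To make direct addressing concrete, here is a minimal sketch. It assumes the keys are integers in the range 0..universe_size-1; the class name DirectAddressTable and its method names are chosen for illustration only.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in {0, ..., universe_size - 1}."""

    def __init__(self, universe_size):
        # One array position per possible key; None marks an empty position.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        # The key itself is the array index, so no hash function is needed.
        self.slots[key] = value

    def search(self, key):
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
found = table.search(3)      # the value stored under key 3
missing = table.search(7)    # None, because nothing was stored under key 7
table.delete(3)
```

Every operation is a single array access, but the array needs one position for every possible key, which is why the technique only suits a small universe.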

When the number of keys actually stored is small relative to the total number of possible keys, hash tables become an effective alternative to directly addressing an array, since a hash table typically uses an array of size proportional to the number of keys actually stored.

The point of the hash function is to reduce the range of array indices that need to be handled.
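As an illustration, the sketch below uses the remainder of the key modulo m as the hash function. This is an assumption made for the example (the text above does not commit to a particular hash function), and the sample keys are arbitrary.

```python
def h(key, m):
    """Map an integer key from a large universe to a slot index in 0..m-1."""
    return key % m


m = 8
keys = [15, 1_000_003, 2**40 + 5]

# However large the keys are, only indices 0..m-1 ever need to be handled.
indices = [h(k, m) for k in keys]

# Two different keys can land on the same index: 15 and 23 both map to slot 7.
collide = h(15, m) == h(23, m)
```

The table therefore needs only m positions, at the price that distinct keys may map to the same index.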

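A collision occurs when two keys hash to the same slot. We might try to avoid collisions by choosing a suitable hash function h, but when the universe of keys is larger than the number of slots, at least two keys must share a hash value, so collisions cannot be avoided altogether. While a well-designed, "random"-looking hash function can minimize the number of collisions, we still need a method for resolving the collisions that do occur.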

Collisions can be resolved by chaining. How well does hashing with chaining perform?

Given a hash table T with m slots that stores n elements, we define the load factor α for T as n/m, that is, the average number of elements stored in a chain. Our analysis will be in terms of α, which can be less than, equal to, or greater than 1.
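Chaining and the load factor can be sketched as follows. This is only one possible layout: it assumes Python lists stand in for the linked lists, uses the built-in hash() reduced modulo m as the hash function, and the class name ChainedHashTable is hypothetical. Note that insert scans the chain so that an existing key is overwritten, which is a convenience of this sketch rather than part of the analysis.

```python
class ChainedHashTable:
    """Hash table with m slots; each slot holds a chain of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]
        self.n = 0  # number of stored elements

    def _h(self, key):
        # Assumed hash function: built-in hash reduced to a slot index.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:            # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Only the chain of the key's own slot is scanned.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        # alpha = n / m, the average chain length.
        return self.n / self.m


t = ChainedHashTable(m=4)
t.insert(1, "one")
t.insert(5, "five")   # 5 % 4 == 1, so this collides with key 1
t.insert(9, "nine")   # 9 % 4 == 1 as well
alpha = t.load_factor()
```

In this usage all three keys share slot 1, so that chain has length 3 while the average chain length over the 4 slots, α, is 3/4.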

The worst-case behavior of hashing with chaining is terrible: all n keys hash to the same slot, creating a list of length n. The worst-case time for searching is thus Θ(n) plus the time to compute the hash function, which is no better than if we used one linked list for all the elements. Clearly, hash tables are not used for their worst-case performance.
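The worst case is easy to provoke on purpose. The sketch below assumes the hash function is key % m (an illustrative choice) and compares keys that are all multiples of m with keys that are consecutive integers; the helper chain_lengths is hypothetical.

```python
def chain_lengths(keys, m, h):
    """Return the length of each of the m chains after inserting the keys."""
    lengths = [0] * m
    for key in keys:
        lengths[h(key, m)] += 1
    return lengths


def mod_hash(key, m):
    return key % m


m = 8
n = 100

# Adversarial input: every key is a multiple of m, so every key lands in slot 0.
bad = chain_lengths([m * i for i in range(n)], m, mod_hash)

# Friendly input: consecutive integers spread almost evenly over the slots.
good = chain_lengths(range(n), m, mod_hash)

longest_bad = max(bad)     # equals n: one chain holds everything
longest_good = max(good)   # close to n / m
```

With the adversarial keys a search may have to walk a chain of all n elements, whereas the evenly spread keys leave no chain much longer than n/m.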

The average performance of hashing depends on how well the hash function h distributes the set of keys to be stored among the m slots, on the average.

Simple uniform hashing is the assumption that any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed to.

Suppose the slots are numbered 0, 1, 2, ..., m-1, and let n_j denote the length of the list T[j], so that n = n_0 + n_1 + n_2 + ... + n_{m-1}. The expected value of n_j is E[n_j] = α = n/m.
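The expected chain length follows from linearity of expectation under simple uniform hashing. The derivation below introduces notation that is not in the text above: k_1, ..., k_n for the stored keys and X_{ij} for the indicator that key k_i hashes to slot j.

```latex
% Assumed notation: k_1, ..., k_n are the stored keys,
% X_{ij} indicates that key k_i hashes to slot j.
\begin{align*}
X_{ij} &= \mathbf{1}\{h(k_i) = j\}, &
\mathrm{E}[X_{ij}] &= \Pr\{h(k_i) = j\} = \frac{1}{m}, \\
n_j &= \sum_{i=1}^{n} X_{ij}, &
\mathrm{E}[n_j] &= \sum_{i=1}^{n} \mathrm{E}[X_{ij}] = \frac{n}{m} = \alpha .
\end{align*}
```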

In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time Θ(1 + α), and a successful search also takes expected time Θ(1 + α), under the assumption of simple uniform hashing.
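A small simulation can make these bounds plausible. The sketch below stands in for simple uniform hashing by assigning each element to a uniformly random slot, and it counts only the elements examined in a chain (the "1" in 1 + α, the cost of computing the hash, is left out). The function name simulate, the seed, and the sizes are all illustrative assumptions.

```python
import random


def simulate(n, m, seed=0):
    """Return (avg elements examined, unsuccessful), (same, successful)."""
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        # Simple uniform hashing: each element picks a slot uniformly at random.
        lengths[rng.randrange(m)] += 1

    # Unsuccessful search: the key hashes to a uniformly random slot and the
    # whole chain is scanned, so the average cost is the average chain length.
    unsuccessful = sum(lengths) / m

    # Successful search: the element at position p of its chain costs p
    # examinations; average this over all n stored elements.
    successful = sum(length * (length + 1) // 2 for length in lengths) / n
    return unsuccessful, successful


n, m = 2000, 1000          # load factor alpha = 2
unsuccessful, successful = simulate(n, m)
alpha = n / m
```

Here the unsuccessful figure equals α, and the successful figure should come out near 1 + α/2; both grow linearly in α, which is consistent with the Θ(1 + α) bounds.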

What does this analysis mean? If the number of hash-table slots is at least proportional to the number of elements in the table, we have n = O(m) and, consequently, α = n/m = O(m)/m = O(1). Thus, searching takes constant time on average. Since insertion takes O(1) worst-case time and deletion takes O(1) worst-case time when the lists are doubly linked, all dictionary operations can be supported in O(1) time on average.
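The reason doubly linked lists give O(1) deletion is that a node can be unlinked using only its own prev and next pointers, with no need to walk the chain to find its predecessor. The sketch below is a minimal illustration for one chain; the Node and DoublyLinkedList names are hypothetical, and deletion assumes the caller already holds the node to remove.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class DoublyLinkedList:
    """One chain of a hash table, with O(1) insert and O(1) delete."""

    def __init__(self):
        self.head = None

    def insert(self, node):
        # O(1): splice the node in at the head of the list.
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node

    def delete(self, node):
        # O(1): relink the neighbours around the node; no traversal needed.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def keys(self):
        out = []
        cur = self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out


chain = DoublyLinkedList()
a, b, c = Node(1), Node(2), Node(3)
for node in (a, b, c):
    chain.insert(node)
before = chain.keys()   # keys in head-to-tail order
chain.delete(b)         # unlink the middle node directly
after = chain.keys()
```

With a singly linked list, delete would first have to find the predecessor of the node, which costs as much as a search.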

A good hash function satisfies (approximately) the assumption of simple uniform hashing.