
Likelihood of collisions -- the load factor of a hash table

One might be tempted to assume that collisions do not occur very often if only a small subset of the set of possible keys is actually used, but this assumption is mistaken.

Assume we have a hash table of size $m$ that currently holds $n$ entries. Then we call $\lambda = n/m$ the load factor of the hash table. The load factor describes how full the table currently is: a hash table with load factor $0.25$ is 25% full, one with load factor $0.50$ is 50% full, and so forth. If a hash table has load factor $\lambda$, then the probability that inserting the next key causes a collision is $\lambda$. This assumes that each key from the key space is equally likely, and that the hash function $h$ spreads the key space evenly over the set of indices of our array. If these optimistic assumptions fail, then the probability can be higher.
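
The claim that the collision probability equals $\lambda$ can be checked by direct counting. The following Python sketch is only an illustration under stated assumptions: the hash function `h(k) = k % m`, the small key space, the particular set of occupied slots, and the name `collision_probability` are all hypothetical choices made for this example, and each of the $n$ entries is assumed to occupy its own slot. Because the size of the key space is a multiple of $m$, this hash function spreads the keys evenly, which is exactly the optimistic assumption made above.

```python
from fractions import Fraction


def collision_probability(m, occupied, key_space_size):
    """Exact probability that a uniformly chosen key hashes to an occupied slot.

    Assumptions (for illustration only): the hash function is h(k) = k % m,
    and the key space is {0, ..., key_space_size - 1} with a size that is a
    multiple of m, so every slot receives the same number of keys.
    """
    if key_space_size % m != 0:
        raise ValueError("key_space_size must be a multiple of m")
    # Count the keys whose slot already holds an entry.
    colliding = sum(1 for k in range(key_space_size) if k % m in occupied)
    return Fraction(colliding, key_space_size)


m = 8                   # table size
occupied = {1, 4, 6}    # n = 3 slots already hold an entry
load_factor = Fraction(len(occupied), m)
p = collision_probability(m, occupied, key_space_size=80)
print(load_factor, p)   # prints: 3/8 3/8
```

With a hash function that favours some slots over others, the same count would depend on which slots are occupied, and it can come out higher than $\lambda$.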

Therefore, to minimize collisions, it is prudent to keep the load factor low; fifty percent is an often-quoted figure. We will see later what effect the table's load factor has on the speed of the operations we are interested in.


Martin Escardo 2005-01-11