One might be tempted to assume that collisions do not occur very often if only a small subset of the set of possible keys is chosen, but this assumption is mistaken.
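To get a feel for how quickly collisions appear, the following minimal sketch computes the probability that at least two keys share a slot. It assumes that every key is hashed uniformly and independently to one of the slots, which is an idealisation. The function name `prob_at_least_one_collision` and its parameters are hypothetical and chosen only for this illustration.

```python
def prob_at_least_one_collision(num_keys: int, table_size: int) -> float:
    """Probability that num_keys keys do not all land in distinct slots.

    Assumption: each key is hashed uniformly and independently to one of
    table_size slots.
    """
    if num_keys > table_size:
        # More keys than slots: a collision is unavoidable.
        return 1.0
    p_all_distinct = 1.0
    for already_placed in range(num_keys):
        # The next key must avoid the slots taken by the earlier keys.
        p_all_distinct *= (table_size - already_placed) / table_size
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for keys in (5, 10, 23, 50):
        p = prob_at_least_one_collision(keys, 365)
        print(f"{keys:3d} keys, 365 slots: P(collision) = {p:.3f}")
```

Under these assumptions, 23 keys in a table with 365 slots already give a collision probability of just over one half, even though the table is less than 7% full.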
Assume we have a hash table of size $m$, and that it currently has $n$
entries. Then we call $\lambda = n/m$ the load factor of the
hash table. The load factor can be seen as describing how full the
table currently is: a hash table with load factor $0.25$ is 25% full,
one with load factor $0.5$ is 50% full, and so forth. If we have a
hash table with load factor $\lambda$, then the probability that a
collision occurs for the next key we wish to insert is $\lambda$. This
assumes that each key from the key space is equally likely, and that
the hash function $h$ spreads the key space evenly over the set of
indices of our array. If these optimistic assumptions fail, then the
probability can be higher.
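The relationship between load factor and collision probability can be checked with a small simulation. This is a sketch under the same optimistic assumptions as the text: the existing entries occupy distinct slots, and the next key is hashed to a uniformly random index. The names `load_factor` and `estimate_next_collision` are hypothetical helpers, not part of any library.

```python
import random


def load_factor(num_entries: int, table_size: int) -> float:
    """Fraction of the table that is currently in use."""
    return num_entries / table_size


def estimate_next_collision(num_entries: int, table_size: int,
                            trials: int = 20000, seed: int = 0) -> float:
    """Estimate the chance that the next inserted key hits an occupied slot.

    Assumptions: the existing entries sit in distinct slots, and the next
    key's index is drawn uniformly at random from the table.
    """
    rng = random.Random(seed)
    # Pick which slots are already occupied.
    occupied = set(rng.sample(range(table_size), num_entries))
    # Draw a random index per trial and count how often it is taken.
    hits = sum(rng.randrange(table_size) in occupied for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    size = 100
    for entries in (25, 50, 75):
        expected = load_factor(entries, size)
        observed = estimate_next_collision(entries, size)
        print(f"load factor {expected:.2f}: estimated collision rate {observed:.3f}")
```

The estimated rate should stay close to the load factor. A hash function that spreads keys unevenly would break the uniform-index assumption used here, and the real collision rate could then be higher.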
Therefore, to minimize collisions, it is prudent to keep the load factor low; fifty percent is an often-quoted figure. We will see later what effect the table's load factor has on the speed of the operations we are interested in.