
A different idea--Heapsort

Let us consider another way of implementing a selection sorting algorithm. The underlying idea is that it would help if we could pre-arrange the data so that selecting the smallest or biggest entry becomes easier. For that, recall the idea of a priority queue discussed earlier. We can take the value of an item to be its priority. Then, if we remove the item with the highest priority at each step, we can fill an array `from the rear', starting with the biggest item.

Now priority queues can be implemented in different ways, and we discussed an implementation using heap-trees. Another way of implementing them would be to use a sorted array, so that the entry with the highest priority appears in data[size]. Removing this item would be very simple, but inserting a new one would in general involve shifting a number of items to the right to make room for it. For example, inserting 3 into the array 1, 2, 4 proceeds as follows:

n        0  1  2  3  4  5
data[n]  1  2  4

n        0  1  2  3  4  5
data[n]  1  2     4

n        0  1  2  3  4  5
data[n]  1  2  3  4
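
The shifting step can be sketched in Python as follows. This is only an illustration, not code from these notes: the name insert_sorted and the use of None for empty slots are assumptions, and indices start at 0 as in the tables.

```python
def insert_sorted(data, size, item):
    """Insert item into the sorted prefix data[0..size-1]; return the new size.

    Assumes data has at least one free slot after the prefix.
    """
    pos = size
    # Shift every larger item one place to the right to make room.
    while pos > 0 and data[pos - 1] > item:
        data[pos] = data[pos - 1]
        pos -= 1
    data[pos] = item
    return size + 1


data = [1, 2, 4, None, None, None]   # None marks an empty slot (assumption)
size = insert_sorted(data, 3, 3)
print(data[:size])                   # [1, 2, 3, 4]
```

In the worst case, when the new item is smaller than all existing ones, every item has to be shifted.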

A third way would be to use an unsorted array: A new item would be inserted by just putting it into data[size+1], but to delete the entry with the highest priority one would have to find it first. After that, the items with a higher index would have to be `shifted down'.
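
For comparison, deletion from an unsorted array might look like the sketch below. The name delete_max, the 0-based indexing and the None marker for a freed slot are assumptions made for this example only.

```python
def delete_max(data, size):
    """Remove the largest of data[0..size-1]; return (item, new size).

    Assumes size >= 1.
    """
    best = 0
    # Find the entry with the highest priority by scanning all items.
    for i in range(1, size):
        if data[i] > data[best]:
            best = i
    item = data[best]
    # Shift the items with a higher index down by one place.
    for i in range(best, size - 1):
        data[i] = data[i + 1]
    data[size - 1] = None            # the last slot is now free
    return item, size - 1


data = [4, 1, 3, 2, None, None]
item, size = delete_max(data, 4)
print(item, data[:size])             # 4 [1, 3, 2]
```

Here insertion is cheap, but every deletion scans the whole array, so both array representations make one of the two operations expensive.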

Of those three representations, only one is of use in carrying out the above idea: an unsorted array is what we started from, so that is no help, and a sorted array is what we are trying to achieve. That leaves heap-trees.

To make use of heap-trees, we first of all have to think of a way of taking an unsorted array and re-arranging it in such a way that it becomes a heap-tree. One possibility would be to insert the items one by one, using the insert algorithm discussed earlier. It turns out, however, that this can be done more efficiently. First of all note that if we have $n$ items in the array data in positions 1, ..., n, then all the items with an index greater than $\lfloor n/2 \rfloor$ will be leaves, since the children of the item at position $i$ are at positions $2i$ and $2i+1$. Therefore if we `trickle down' all the items data[n/2], ..., data[1], in that order, by exchanging them with the larger of their children until they either are positioned at a leaf, or until their children are both smaller, we obtain a heap-tree.

As an example, consider the following array of 9 items:
5 8 3 9 1 4 7 6 2
We know that the last 5 entries (those with indices greater than $\lfloor 9/2 \rfloor = 4$) are leaves of the tree (see the picture).
[Figure: the array 5 8 3 9 1 4 7 6 2 drawn as a binary tree]

So the algorithm starts by trickling down 9, which turns out not to be necessary, so the array remains the same. Next 3 is trickled down, giving:
5 8 7 9 1 4 3 6 2

Next 8 is trickled down, giving:
5 9 7 8 1 4 3 6 2

Finally, 5 is trickled down to give first
9 5 7 8 1 4 3 6 2

then
9 8 7 5 1 4 3 6 2

and finally
9 8 7 6 1 4 3 5 2
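
The trace above can be reproduced with a short Python sketch. It is not code from these notes: the names trickle_down and heapify, the trace flag, and the unused placeholder in data[0] (which keeps the indices 1, ..., n of the text) are all assumptions made for illustration.

```python
def trickle_down(data, i, n):
    """Trickle data[i] down within the tree stored in data[1..n]."""
    while 2 * i <= n:                       # data[i] is not a leaf
        child = 2 * i                       # left child
        if child + 1 <= n and data[child + 1] > data[child]:
            child += 1                      # right child is the larger one
        if data[i] >= data[child]:
            break                           # no child is larger: stop
        data[i], data[child] = data[child], data[i]
        i = child


def heapify(data, n, trace=False):
    """Turn data[1..n] into a heap-tree by trickling down data[n//2], ..., data[1]."""
    for i in range(n // 2, 0, -1):
        trickle_down(data, i, n)
        if trace:
            print(i, data[1:n + 1])         # array after trickling down position i


data = [None, 5, 8, 3, 9, 1, 4, 7, 6, 2]    # data[0] is an unused placeholder
heapify(data, 9, trace=True)
# 4 [5, 8, 3, 9, 1, 4, 7, 6, 2]
# 3 [5, 8, 7, 9, 1, 4, 3, 6, 2]
# 2 [5, 9, 7, 8, 1, 4, 3, 6, 2]
# 1 [9, 8, 7, 6, 1, 4, 3, 5, 2]
```

Each printed line shows the array after the item at the given index has been trickled down; the intermediate swaps of the last step are not printed separately.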




The time complexity of this algorithm is as follows: it trickles down $\lfloor n/2\rfloor$ items, those with indices 1, ..., $\lfloor n/2\rfloor$. Each of those trickle operations involves two comparisons at each stage: one between the two children, and one between the item and the larger child. Now an item with index $i$ will be on level $\log i$, which means that there are $\log n - \log i = \log(n/i)$ steps until a leaf is reached, so that the trickle process for the item at position $i$ may stop. Hence the total number of comparisons carried out to trickle data[i] into position is at most $2\log(n/i)$. So the number of comparisons involved is at most

$\displaystyle 2\Big(\log(n/(n/2)) + \log(n/((n/2)-1)) + \cdots + \log(n/1)\Big) =
2(n/2)\log n - 2\log((n/2)!).
$

This can be shown to be smaller than $ 2.5n$.
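
A quick numerical check of this bound, rather than a proof, can be done as follows. The function name heapify_bound and the sample values of n are assumptions chosen for this sketch; logarithms are taken to base 2.

```python
import math


def heapify_bound(n):
    """Evaluate 2 * sum of log2(n/i) for i = 1, ..., floor(n/2)."""
    return 2 * sum(math.log2(n / i) for i in range(1, n // 2 + 1))


# Compare the bound on the number of comparisons with 2.5n for a few sizes.
for n in (10, 100, 1000, 10000):
    print(n, round(heapify_bound(n), 1), 2.5 * n)
```

For each of these sizes the second printed number stays below the third, which is consistent with the claim that building the heap-tree takes time proportional to $n$.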

Now that we have a heap-tree, we want to get a sorted array out of it. In the heap-tree, the item with the highest priority, that is the largest item, is in data[1]. In a sorted array, it should be in position data[size]. We therefore swap the two, which is almost the same as removing the root of the heap-tree, since data[size] is precisely the item that would be moved into the root position at the next step. Since data[size] now contains the correct item, we will never have to look at it again. Instead, we take the items data[1], ..., data[size-1] and rearrange them into a heap-tree by trickling the new root down, which we know to have complexity $O(\log n)$.

Now the second largest item is in position data[1], and we know its final position will be data[size-1], so we now swap these two items. Then we rearrange data[1], ..., data[size-2] back into a heap-tree using the trickle procedure.

When the $i$th step has been completed, the items data[n-i+1], ..., data[n] will have the correct entries, and there will be a heap-tree in the items data[1], ..., data[n-i]. (Note that the size, and therefore the height, of the heap-tree decreases.) As a part of the $i$th step, we will have to trickle the new root down. This will take at most twice as many comparisons as the height of the heap-tree at the time; that height is the logarithm (to base 2) of the number of items in the heap-tree at the time, which is $n-i$. So the $i$th step takes at most $2\log(n-i)$ comparisons.
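
Putting both phases together gives the following Python sketch of the whole algorithm. As before, this is an illustration under stated assumptions, not the code of these notes: the names trickle_down and heapsort are made up, and data[0] is an unused placeholder so that the items sit in data[1], ..., data[n].

```python
def trickle_down(data, i, n):
    """Trickle data[i] down within the heap-tree stored in data[1..n]."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and data[child + 1] > data[child]:
            child += 1                      # pick the larger child
        if data[i] >= data[child]:
            break
        data[i], data[child] = data[child], data[i]
        i = child


def heapsort(data):
    """Sort data[1..n] in place, where n = len(data) - 1."""
    n = len(data) - 1
    # Phase 1: rearrange the unsorted array into a heap-tree.
    for i in range(n // 2, 0, -1):
        trickle_down(data, i, n)
    # Phase 2: move the largest remaining item to the rear, then
    # restore the heap-tree in the shortened range data[1..size-1].
    for size in range(n, 1, -1):
        data[1], data[size] = data[size], data[1]
        trickle_down(data, 1, size - 1)


data = [None, 5, 8, 3, 9, 1, 4, 7, 6, 2]
heapsort(data)
print(data[1:])                             # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The sort is done in place: the sorted part grows from the rear of the array while the heap-tree shrinks at the front, so no second array is needed.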

Hence the complexity function for this phase of the algorithm will be at most

$\displaystyle 2\Big(\log(n-1) + \log(n-2) + \cdots + \log2 + \log1\Big) =
2\log((n-1)!).
$

This function can be shown to be smaller than $ 2n\log n$.
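
One way to see this, as a short sketch for $n \ge 2$: each of the $n-1$ terms $\log k$ in the sum is at most $\log n$, so

```latex
2\log((n-1)!) \;=\; 2\sum_{k=1}^{n-1} \log k \;\le\; 2(n-1)\log n \;<\; 2n\log n .
```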

So the worst-case complexity of the entire sorting algorithm is given by the sum of the two complexity functions: first rearranging the (unsorted) array into a heap-tree, which is proportional to $n$, and secondly making a sorted array out of the heap-tree, which is proportional to $n \log n$. Since the term $n \log n$ grows faster than $n$, we can simplify $n + n \log n$ to $n \log n$, so Heapsort has worst-case complexity $O(n \log n)$.


Martin Escardo 2005-01-11