In order to solve this problem, we need a strategy for exploring
graphs which guarantees that we don't miss an edge or vertex.
Because, unlike trees, graphs don't have a root vertex, there is no
natural place to start a traversal, and therefore we assume that we
are given a starting vertex. There are two strategies for
performing this task.
The first is known as breadth-first traversal: We start with
the given vertex. Then we visit its neighbours one by one (which
must be possible no matter which implementation we use), placing them
in an initially empty queue. We then remove the first vertex from the
queue and one by one put its neighbours at the end of the queue. We
then visit the next vertex in the queue and again put its neighbours
at the end of the queue. We do this until the queue is empty.
However, as described, this algorithm is not guaranteed to terminate.
If there is a cycle in the graph, like A, B, C in one of the above
graphs, we would enqueue a vertex we have already visited, and
thus we would run into an infinite loop (visiting A's neighbours puts
B onto the queue, visiting that (eventually) gives us C, and once we
reach C in the queue, we get A again). To avoid this we create a
second array done of booleans, where done[j] is true
if we have already visited the vertex with number j, and false
otherwise. In the above algorithm, we mark the starting vertex as done
at the outset, and we only enqueue a vertex j if done[j] is false,
marking it as done by
setting done[j] = true. This way, we won't enqueue any vertex
more than once, and for a finite graph, our algorithm is bound to
terminate. In the example we are discussing, breadth-first
search starting at A might yield: A, B, D, C, E.
To see why this is called breadth-first search, imagine a binary tree implemented in this way, where the starting vertex is the root. We would then first follow all the edges emanating from the root, leading to all the vertices on level 1, then find all the vertices on the level below, and so on, until we find all the vertices on the 'lowest' level.
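The breadth-first procedure can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the graph is stored as adjacency lists indexed by vertex number, the function name breadth_first is made up for this sketch, and the small directed graph below is hypothetical rather than the one from the figure.

```python
from collections import deque


def breadth_first(adj, start):
    """Return the vertices reachable from start in breadth-first order.

    adj[v] is the list of neighbours of vertex v (an assumed representation).
    """
    done = [False] * len(adj)  # done[j] is True once vertex j has been enqueued
    done[start] = True
    queue = deque([start])
    order = []
    while queue:
        v = queue.popleft()  # remove the first vertex from the queue
        order.append(v)
        for j in adj[v]:
            if not done[j]:  # only enqueue vertices not yet marked
                done[j] = True
                queue.append(j)
    return order


# Hypothetical directed graph containing the cycle 0 -> 1 -> 2 -> 0.
adj = [
    [1, 3],  # neighbours of 0
    [2],     # neighbours of 1
    [0, 4],  # neighbours of 2
    [],      # neighbours of 3
    [],      # neighbours of 4
]
print(breadth_first(adj, 0))  # prints [0, 1, 3, 2, 4]
```

Without the done array, vertex 0 would be enqueued again when vertex 2 is processed, and the loop would never end.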
The second is known as depth-first traversal: Given a vertex
to start from, we now put it on a stack rather than a queue (recall
that in a stack, the only item that can be removed at any time is the
last one to be put on the stack), and mark it as done as for
breadth-first traversal. We then pop it from the stack, look up its
neighbours one after the other,
mark them as done and put them onto the stack. We then pop the next
vertex from the stack and visit its neighbours in turn, provided they
have not been marked as done, just as we did for breadth-first
traversal. We do this until the stack is empty. For the example discussed above, we might get (again
starting from A): A, B, C, E, D.
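The stack-based procedure can be sketched in the same style. Again this is only an illustration under assumptions of our own: the adjacency-list representation, the function name depth_first and the small directed graph are all hypothetical. Neighbours are pushed in reverse order purely so that the first-listed neighbour is popped first; this is a choice made for the sketch, not something the description above prescribes.

```python
def depth_first(adj, start):
    """Return the vertices reachable from start, using a stack.

    adj[v] is the list of neighbours of vertex v (an assumed representation).
    Vertices are marked as done when they are pushed, as in the text.
    """
    done = [False] * len(adj)
    done[start] = True
    stack = [start]
    order = []
    while stack:
        v = stack.pop()  # remove the vertex that was pushed last
        order.append(v)
        # Push in reverse so that the first-listed neighbour is popped first.
        for j in reversed(adj[v]):
            if not done[j]:
                done[j] = True
                stack.append(j)
    return order


# Hypothetical directed graph containing the cycle 0 -> 1 -> 2 -> 0.
adj = [
    [1, 3],  # neighbours of 0
    [2],     # neighbours of 1
    [0, 4],  # neighbours of 2
    [],      # neighbours of 3
    [],      # neighbours of 4
]
print(depth_first(adj, 0))  # prints [0, 1, 2, 4, 3]
```

Because vertices are marked when they are pushed rather than when they are popped, on some graphs the resulting order can differ from the one a recursive depth-first search would produce.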
Note that in both cases, the order of vertices depends on the implementation. There's no reason why A's neighbour B should be visited before D in the example. So it is better to speak of a result of depth-first or breadth-first traversal. Also note that the only vertices that will be listed are those in the same connected component as A. If we have to ensure that all vertices are visited, we may need to start the process over with a different starting vertex: choose one that has not been marked as done when the algorithm terminates.
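Restarting from an unmarked vertex can be sketched as follows. This is an illustration under assumptions of our own: the adjacency-list representation, the function name traverse_all and the example graph are hypothetical, and breadth-first traversal is used for the inner loop although a stack would serve equally well. The point is that a single done array is shared across all restarts.

```python
from collections import deque


def traverse_all(adj):
    """Visit every vertex, restarting from each vertex not yet marked as done."""
    done = [False] * len(adj)  # shared across all restarts
    order = []
    for start in range(len(adj)):
        if done[start]:
            continue  # already reached from an earlier starting vertex
        done[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for j in adj[v]:
                if not done[j]:
                    done[j] = True
                    queue.append(j)
    return order


# Hypothetical undirected graph with edges 0-2 and 1-4; vertex 3 is isolated.
adj = [[2], [4], [0], [], [1]]
print(traverse_all(adj))  # prints [0, 2, 1, 4, 3]
```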