Sorting a list of data is a very common operation. Experimental physicists and astronomers routinely have to order large amounts of data. Much less appreciated is the need in some numerical schemes to order data along the way: molecular dynamics, for instance, may require ordering the atoms in a simulation by distance in order to build a distance classification tree.
The algorithms to perform ordering have been very well studied. They range widely in performance, from the typical O( N^2 ) to an efficient O( N log N ), and they vary just as widely in implementation complexity. The best algorithm for a particular need may depend heavily on the size of the list to order, whether it holds a mere 1,000 or a full 1,000,000 data points.
Numerical Recipes describes many of the most common algorithms.
Sorting lists of numbers in parallel using MPI definitely adds to the difficulty of sorting data. This is the subject of this section.
By far the simplest sorting algorithm is the Bubble Sort. It consists of scanning the list over and over again, exchanging adjacent pairs of data points that are out of increasing (or decreasing) order. The sorting is complete when a pass through the list no longer requires any exchange.
Watch out: Bubble Sort is an algorithm of order O( N^2 ) according to Numerical Recipes, which suggests never using it in production codes.
Nevertheless, here is a simple implementation of the Bubble Sort algorithm. The code BubbleSort.c generates a list of random numbers which are then ordered. Bubble Sort is certainly the easiest algorithm to implement, and therefore the easiest to parallelize.
Numerical Recipes says that the Quicksort algorithm is the fastest sorting algorithm, ranging from order O( N log N ) to O( N^2 ) depending on the degree of partial ordering in the list, from totally random to totally ordered.
The Quicksort algorithm picks a pivot element, partitions the list into the elements smaller and larger than the pivot, and recursively sorts each part. QuickSort.c implements this algorithm.
A way to implement the Bubble Sort in parallel is to divide the domain of the list (more or less) equally among nodes 1 to (N-1) of an N-node parallel machine, keeping node 0 to administer the calculation. Each of nodes 1 to (N-1) then sorts its partial list and sends it back to node 0 for a final global merge.
The merge component of this parallel implementation is the hardest part to code. The data set multi_lists, containing three short lists of data, can be used to test it.
The implementation of this code is left to the reader.
In its simplest form, the parallel implementation of the Quicksort Algorithm can be similar to that of the Bubble Sort.
A more efficient implementation could take advantage of the relative ordering of the left and right sub-lists produced by the algorithm, making the merging step more efficient. The number of nodes would then have to be the master node plus a power-of-two number of worker nodes.
Note that there is another sorting algorithm that parallelizes efficiently: the Bitonic Merge Sort algorithm. References can be found on the web.
Time can also be saved by using an "inverted" binary tree to control the order in which the sub-lists of ordered numbers are merged. The point is to save time by performing the various sub-merges simultaneously.
A last major item to implement is the collection of the data residing in the blocks.
Sub-lists lead to a parallelization of the merging of data. The skeleton codes encountered in this section can be used to guide the implementation of this algorithm. This requires the following steps: