In this laboratory assignment we will consider the following:
A. Simple Sort
B. Bubble Sort
C. Quick Sort
D. Efficiency of Sorting Algorithms
To sort means to arrange in "order." There are two basic linear orders: increasing (ascending) and decreasing (descending). If the list to be sorted contains duplicate copies of some element, then strictly increasing (or decreasing) order is impossible, so we use the terms non-decreasing and non-increasing to cover such cases. In this lab, we assume we sort a list in non-decreasing order.
Sorting is a frequent activity in computing and it is important to develop efficient algorithms. The first sort algorithm that most of us develop will probably be simple but not efficient! We call this algorithm the Simple Sort. Assume an array a[0..n-1] is to be sorted and swap(x,y) is a function that exchanges two elements stored in x and y.
for (i = 0; i < n-1; i++)
    for (j = i+1; j < n; j++)
        if (a[i] > a[j]) swap(a[i], a[j])
The Simple Sort is not "good." We can do better!
We studied the Bubble Sort algorithm earlier. It is better in the best case and, in practice, in many cases. Shown below is pseudocode for the Bubble Sort.
1.  k = n                    {Sort the first n elements}
2.  sorted = false
3.  while NOT sorted and we haven't exceeded the number of necessary passes
4.  {
5.      k = k - 1            {Reduce the size of subarray to sort}
6.      sorted = true        {See if this pass makes any changes}
7.      for (i = 0; i < k; i++)
8.          if (a[i] > a[i+1])
9.          {
10.             swap(a[i], a[i+1])
11.             sorted = false           {Wasn't sorted ...}
12.         }
13. }
It requires a bit more thinking to calculate the complexity (Big-O) of this algorithm. In the "best case," where the array a[0..n-1] is already sorted, the while loop runs only once and the for loop iterates n-1 times, so the complexity is roughly n-1 = O(n). In the worst case, the while loop runs n-1 times and the for loop runs n-1, n-2, ..., 1 times respectively. The total number of comparisons (line 8) is

    (n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n²)

This is also the average time. So, theoretically, in the average and worst cases, the bubble sort is no "better" than the simple sort. Though its magnitude (Big-O) is the same, it has a better run time in general for small and reasonable-size lists (as we will see).
There are more efficient sorting algorithms. We will discuss an algorithm called quick sort that is better in the average case than the previous two routines. The quick sort has an average-case complexity of O(n log₂ n). For small n there is no great difference between O(n log₂ n) and O(n²), but for larger values of n, say n > 1000, the difference is significant:
For example, let n = 1000:

    n log₂ n ≈ 1000 * 10 = 10,000

and

    n² = 1000 * 1000 = 1,000,000.
The basic idea behind the Quick Sort is as follows: choose one element of the list as the pivot, partition the list so that every element less than the pivot comes before it and every element greater than or equal to the pivot comes after it (the pivot is then in its final sorted position), and recursively quick sort the two sublists on either side of the pivot.
The image of an array below should help you to visualize this process.
|   < pivot   |  pivot  |   ≥ pivot   |
 ^first         ^pivotIndex      ^last
Shown below is code to perform the Quick Sort. Note that it uses a recursive function.
//This function sorts the items in an array into ascending order
void quickSort(DataType a[], int first, int last)   //sort a[first..last]
{
    int pivotIndex;   //index of pivot position after partitioning
    if (first < last)
    {
        //create the partition by placing the first
        //array element exactly where it should stay
        partition(a, first, last, pivotIndex);

        //sort the two partitions
        quickSort(a, first, pivotIndex - 1);
        quickSort(a, pivotIndex + 1, last);
    }
}

//Rearrange elements in a[first..last] so that a[first] is in
//its final sorted position at pivotIndex, all elements less than a[first]
//are in positions less than pivotIndex, and all elements greater than
//a[first] are in positions greater than pivotIndex.
void partition(DataType a[], int first, int last, int& pivotIndex)
{
    DataType pivot = a[first];   //the pivot is the first element
    int lastRegion1 = first;     //index of last item in region 1

    //move one item at a time until unknown region is empty
    for (int firstUnknown = first + 1; firstUnknown <= last; ++firstUnknown)
    {
        //move item from unknown to proper region
        if (a[firstUnknown] < pivot)
        {
            //item from unknown belongs in region 1
            ++lastRegion1;
            swap(a[firstUnknown], a[lastRegion1]);
        }
    }

    //place the pivot in the proper position and mark its position
    swap(a[first], a[lastRegion1]);
    pivotIndex = lastRegion1;
}
Like all divide-and-conquer routines, quick sort works best when the two sublists are roughly the same size. For that to happen, the pivot element separating them must belong near the middle of the sorted list. Of course, if we knew how to pick such an element we would not need a sort routine. We usually choose the left-most element of the list as the pivot; this is the approach in the above algorithm, though better techniques for selecting the pivot do exist. The quick sort is an O(n log₂ n) algorithm in the average case, but it is O(n²) in the worst case. Interestingly, the worst case occurs when the array is already sorted: a[first] will already be in its final sorted position, so one of the two sublists will be empty while the other will have only one less element than the original.

Here is an example of how the partition() function works. The array a contains:
index:   0   1   2   3   4   5
value:  40  50  20  60  30  70

When the partition() function is activated the first time, first = 0 and
last = 5. Then pivot = a[0] = 40 and lastRegion1 = 0. The for loop is
started:

    firstUnknown = 1:  a[1] < pivot is false
    firstUnknown = 2:  a[2] < pivot is true
        lastRegion1 = 1; swap a[2] with a[1]

The table then becomes:

index:   0   1   2   3   4   5
value:  40  20  50  60  30  70

    firstUnknown = 3:  a[3] < pivot is false

index:   0   1   2   3   4   5
value:  40  20  50  60  30  70

    firstUnknown = 4:  a[4] < pivot is true
        lastRegion1 = 2; swap a[4] with a[2]

The table then becomes:

index:   0   1   2   3   4   5
value:  40  20  30  60  50  70

    firstUnknown = 5:  a[5] < pivot is false
        the loop is exited
    swap a[0] with a[lastRegion1], i.e., a[2]
    pivotIndex = lastRegion1 = 2

Partition is finished and the pivot is in its correct position. All of the
items to its left are less than the pivot and all of the items to its right
are greater than the pivot:

index:   0   1   2   3   4   5
value:  30  20  40  60  50  70

After the return from partition(), pivotIndex contains 2 and two recursive
calls are made to the quick sort routine: one to sort a[0..1] and another to
sort the list a[3..5].
To provide you some relative evaluation of these three sort algorithms, we have implemented each in the file inlabC.cc. C++ has a library function called clock(). This function returns the amount of processor time the program has used, measured in clock ticks. To determine the time in seconds, the value returned must be divided by CLOCKS_PER_SEC, which is defined along with clock() in the ctime library.
Each sort is timed on lists of n = 100 integers, n = 1000 integers, and n = 10,000 integers.
We have included a routine to generate a specified quantity of random integers. Only the quick sort is done recursively, and recursion takes more CPU time, so this puts quick sort at a slight disadvantage, but it should win anyway! A non-recursive version of quick sort does exist and would make a fairer comparison.
The clock() function requires the ctime header file to be included and a variable of type clock_t to be declared.
Read the code to see how clock() is used, then run the program for n = 100 integers, n = 1000 integers, and n = 10,000 integers.
Draw a graph of the results from the above tests using the X-axis for the values of n and the Y-axis for the CPU time.