In this laboratory assignment we will consider the following:
A. Simple Sort
B. Bubble Sort
C. Quick Sort
D. Efficiency of Sorting Algorithms
To sort means to arrange in "order." There are two basic linear orders: increasing (ascending) and decreasing (descending). If the list to be sorted contains duplicate copies of some element, then strictly increasing (or decreasing) order is impossible, so we use the terms non-decreasing and non-increasing to cover such cases. In this lab, we assume we sort a list in non-decreasing order.
Sorting is a frequent activity in computing and it is important to develop efficient algorithms. The first sort algorithm that most of us develop will probably be simple but not efficient! We call this algorithm the Simple Sort. Assume an array a[0..n-1] is to be sorted and swap(x,y) is a function that exchanges two elements stored in x and y.
for (i = 0; i < n-1; i++)
    for (j = i+1; j < n; j++)
        if (a[i] > a[j]) swap(a[i], a[j])
The Simple Sort is not "good." We can do better!
We studied the Bubble Sort algorithm earlier. It is better in the best case and, in practice, in many cases. Shown below is pseudocode for the Bubble Sort.
1.  k = n                    {Sort the first n elements}
2.  sorted = false
3.  while NOT sorted and we haven't exceeded the number of necessary passes
4.  {
5.      k = k - 1            {Reduce the size of subarray to sort}
6.      sorted = true        {See if this pass makes any changes}
7.      for (i = 0; i < k; i++)
8.          if (a[i] > a[i+1])
9.          {
10.             swap(a[i], a[i+1])
11.             sorted = false           {Wasn't sorted ...}
12.         }
13. }
It requires a bit more thinking to calculate the complexity (Big-O) of this algorithm. In the "best case," where the array a[0..n-1] is already sorted, the while loop runs only once and the for loop iterates n-1 times, so the complexity is roughly n-1 = O(n). In the worst case, the while loop runs n-1 times and the for loop runs n-1, n-2, ..., 1 times respectively. The total number of comparisons (line 8) is

    (n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n²)

This is also the average time. So, theoretically, in the average and worst cases, the bubble sort is no "better" than the simple sort. Though its magnitude (Big-O) is the same, it has a better run time in general for small and reasonable-size lists (as we will see).
There are more efficient sorting algorithms. We will discuss an algorithm called quick sort that is better in the average case than the previous two routines. The quick sort has an average-case complexity of O(n log₂ n). For small n there is no great difference between O(n log₂ n) and O(n²), but for larger values of n, say n > 1000, the difference is significant:
For example, let n = 1000:

    n log₂ n ≈ 1000 * 10 = 10,000

and

    n² = 1000 * 1000 = 1,000,000.
The basic idea behind the Quick Sort is as follows: choose one element of the list as the pivot, partition the list so that every element less than the pivot comes before it and every element greater than or equal to the pivot comes after it (the pivot is then in its final sorted position), and recursively quick sort the two sublists on either side of the pivot.
The image of an array below should help you to visualize this process.
|   < pivot   |  pivot  |   ≥ pivot   |
 ^first         ^pivotIndex      ^last
Shown below is code to perform the Quick Sort. Note that it uses a recursive function.
//This function sorts the items in an array into ascending order
void quickSort(DataType a[], int first, int last)   //sort a[first..last]
{
    int pivotIndex;   //index of pivot position after partitioning
    if (first < last)
    {
        //create the partition by placing the first
        //array element exactly where it should stay
        partition(a, first, last, pivotIndex);

        //sort the two partitions
        quickSort(a, first, pivotIndex - 1);
        quickSort(a, pivotIndex + 1, last);
    }
}

//Rearrange elements in a[first..last] so that a[first] is in
//its final sorted position at pivotIndex, all elements less than a[first]
//are in positions less than pivotIndex, and all elements greater than
//a[first] are in positions greater than pivotIndex.
void partition(DataType a[], int first, int last, int& pivotIndex)
{
    DataType pivot = a[first];   //the pivot is the first element
    int lastRegion1 = first;     //index of last item in region 1

    //move one item at a time until unknown region is empty
    for (int firstUnknown = first + 1; firstUnknown <= last; ++firstUnknown)
    {
        //move item from unknown to proper region
        if (a[firstUnknown] < pivot)
        {
            //item from unknown belongs in region 1
            ++lastRegion1;
            swap(a[firstUnknown], a[lastRegion1]);
        }
    }

    //place the pivot in the proper position and mark its position
    swap(a[first], a[lastRegion1]);
    pivotIndex = lastRegion1;
}
Like all divide-and-conquer routines, quick sort works best when the two sublists are roughly the same size. For that to happen, the pivot element separating them must belong near the middle of the sorted list. Of course, if we knew how to pick such an element we would not need a sort routine. We usually choose the left-most element of the list as the pivot; this is the approach in the above algorithm, though better techniques for selecting the pivot do exist. The quick sort is an O(n log₂ n) algorithm in the average case, but it is O(n²) in the worst case. Interestingly, the worst case occurs when the array is already sorted: a[first] will already be in its final sorted position, so one of the two sublists will be empty while the other will have only one less element than the original.

Here is an example of how the partition() function works. The array a contains:
index:   0   1   2   3   4   5
value:  40  50  20  60  30  70

When the partition() function is activated the first time, first = 0 and
last = 5. Then pivot = a[0] = 40 and lastRegion1 = 0. The for loop is
started:

    firstUnknown = 1:  a[1] < pivot is false
    firstUnknown = 2:  a[2] < pivot is true
        lastRegion1 = 1; swap a[2] with a[1]

The table then becomes:

index:   0   1   2   3   4   5
value:  40  20  50  60  30  70

    firstUnknown = 3:  a[3] < pivot is false

index:   0   1   2   3   4   5
value:  40  20  50  60  30  70

    firstUnknown = 4:  a[4] < pivot is true
        lastRegion1 = 2; swap a[4] with a[2]

The table then becomes:

index:   0   1   2   3   4   5
value:  40  20  30  60  50  70

    firstUnknown = 5:  a[5] < pivot is false
        the loop is exited
    swap a[0] with a[lastRegion1], i.e., a[2]
    pivotIndex = lastRegion1 = 2

Partition is finished and the pivot is in its correct position. All of the
items to its left are less than the pivot and all of the items to its right
are greater than the pivot:

index:   0   1   2   3   4   5
value:  30  20  40  60  50  70

After the return from partition(), pivotIndex contains 2 and two recursive
calls are made to the quick sort routine: one to sort a[0..1] and another to
sort the list a[3..5].
To provide you some relative evaluation of these three sort algorithms, we have implemented each in the file inlabC.cc. C++ has a library function called clock(). This function returns the amount of processor time the program has used, measured in clock ticks. To determine the time in seconds, the value returned must be divided by CLOCKS_PER_SEC, which is defined along with clock() in the ctime library.
Each sort is timed on lists of n = 100 integers, n = 1000 integers, and n = 10,000 integers.
We have included a routine to generate a specified quantity of random integers. Only the quick sort is done recursively, and recursion takes more CPU time, so this puts quick sort at a slight disadvantage, but it should win anyway! A non-recursive version of quick sort does exist and would make a fairer comparison.
The clock() function requires the ctime header file to be included and a variable of type clock_t to be declared.
Read the code to see how clock() is used, then run the program for n = 100 integers, n = 1000 integers, and n = 10,000 integers.
Draw a graph of the results from the above tests using the X-axis for the values of n and the Y-axis for the CPU time.