Quicksort is an efficient in-place sorting algorithm that is often faster in practice in comparing to other sorting algorithms. The algorithm is based on the divide-and-conquer paradigm.
In quicksort, the steps performed are the following:
- pick an element, called a pivot, from the array
- reorder the array so that all values smaller than the pivot are moved before it and all values larger than the pivot are moved after it, with values equaling the pivot going either way
- recursively sort the subarrays of lesser elements and the subarray of greater elements
The quicksort can be implemented as a recursive or iterative algorithm. We will consider only the recursive version here.
The time complexity is O(n log n) in the average case and O(n^2) in the worst case, but fortunately, it often works like in average. We will consider some bad cases for the algorithm later.
Note, there are a large number of modifications which make the algorithm more efficient. The pivot selection and partitioning steps can be implemented in different ways. The choice of specific implementation strategy greatly affects the algorithm's performance.
Choosing a pivot
The strategy to choose a pivot strongly affects the sorting time of quicksort. It's difficult to determine a good pivot for all arrays.
The best case pivot would divide the array into two equal parts, which would halve the problem size. However, this means that the pivot is the median of the elements, and in order to find the median, we would need an already sorted array or use a more complex approach to find the median.
Here are some used methods to choose the pivot:
- always pick the leftmost or rightmost element as the pivot;
- pick the middle element as the pivot;
- pick a random element as the pivot;
- take the first, middle and last value of the array and choose the median of those three numbers as the pivot.
In this topic, we will always pick the rightmost element as the pivot. After, you will be able to learn the more complex strategies for choosing the pivot.
How it works by an example
Suppose we'd like to sort an array of ints using quicksort algorithm. As the pivot, it always will pick the rightmost element.
The input array contains 8 ints which have indexes from 0 to 7. The following image illustrates how quicksort works, see some explanations below.
1) Let's pick the rightmost element 14 as the pivot in the original array and then reorder the array so that all values smaller than the pivot 11, 10, 13
are moved before it and all values larger than the pivot 17, 25, 16, 22
are moved after it. After the rearranging, the pivot has the index 3.
2) Divide the array into two subarrays: the left one { 11, 10, 13 }
and the right one { 17, 25, 16, 22 }
. The pivot is excluded from both subarrays.
3) Consider the left subarray. The pivot is 13. After rearranging we obtain the same order of elements in the array { 11, 10, 13 }
.
4) Divide the subarray { 11, 10, 13 }
into two smaller subarrays: the left one { 11, 10 }
and the empty subarray.
5) Consider the subarray { 11, 10 }
. The pivot is 10. After rearranging, we obtain the order { 10, 11 }
.
6) Divide the subarray { 10, 11 }
into two smaller subarrays: the empty array and { 11 }
. Both new arrays are already sorted. The pivot (10) is excluded.
Note, we consider empty and single-element arrays as already sorted and do not process them.
7) We perform the same actions for the right subarray { 17, 25, 16, 22 }
of the original array until all subarrays are empty or consist of a single element.
After it, the whole array is sorted: { 10, 11, 13, 14, 16, 17, 22, 25 }
.
Note, when we have chosen a pivot and rearranged an array, the pivot is in its final position. All left elements are less than the pivot and all right elements are greater than the pivot. But they may not be ordered relative to each other at the current step.
If you would like to see a visualization, click Quick Sort here.
Problems and possible modifications
1) Unfortunately, the Lomuto partition scheme causes worst-case behaviour O(n^2) on already sorted arrays. The problem can be solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by Sedgewick).
The "median-of-three" rule counters the case of sorted (or reverse-sorted) input and gives a better estimate of the optimal pivot (the true median) than selecting any single element when no information about the ordering of the input is known.
2) When all the input elements are equal, one subarray is always empty while another has only decreased by one element (the pivot is removed). It causes the worst-case behavior.
To avoid the problem, we can separate the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. The values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted.
So, it's possible to improve the worst case of quicksort to make it faster on many data sets.