SELECT with groups of 3
Exercise 9.3-1 asks you to show that the $SELECT$ algorithm still runs in linear time if the elements are divided into groups of 7. This problem asks about dividing into groups of 3.
- Show that $SELECT$ runs in linear time if you divide the elements into groups whose size is any odd constant greater than 3.
- Show that $SELECT$ runs in $O(n\; lg\; n)$ time if you divide the elements into groups of size 3
Because the bound in part (b) is just an upper bound, we do not know whether the groups-of-3 strategy actually runs in $O(n)$ time. But by repeating the groups-of-3 idea on the middle groups of medians, we can pick a pivot that guarantees $O(n)$ time. The $SELECT3$ algorithm on the next page determines the $i^{\;th}$ smallest of an input array of $n \gt 1$ distinct elements.
- Describe in English how the $SELECT3$ algorithm works.Include in your description one or more suitable diagrams.
- Show that $SELECT3$ runs in $O(n)$ time in the worst case. [1]
The worst-case linear-time median-finding algorithm was devised by Blum, Floyd, Pratt, Rivest and Tarjan [2]. This exercise improves the original algorithm [2] to use groups of 3 without deteriorating its performance. The constant factors of this new algorithm $SELECT3$ outperforms that of its counterpart $SELECT$. Therefore, it pays to understand the relative merits of $SELECT3$. What follows is the $SELECT3$ algorithm.
What follows are our answers to the problem.
- Let $k$ be the group size and $n$ be the number of elements.
Number of groups $g = \frac{n}{k}$ for some odd constant $k$
Number of elements $\lt$ pivot (or $\gt$ pivot) $\ge \lceil \frac{g}{2}\rceil \times \lceil \frac{k}{2}\rceil = \lceil\frac{\frac{n}{k}}{2}\rceil \times \lceil \frac{k}{2}\rceil$
Number of elements in low or high side of the partition around x $\ge \frac{n}{2k} \times \frac{k}{2} = \frac{n}{4}$
Number of elements in play after partition around x $\le n - \frac{n}{4} = \frac{3n}{4}$
$T(n) \le T(\frac{n}{k}) + T(\frac{3n}{4}) + \Theta(n)$.
This runs in $\Theta(n)$ time if and only if: $\frac{n}{k} + \frac{3n}{4} \lt n$.
And this holds for any odd constant $k \gt 4$, thus, proving our claim.
- For groups of size $3$,
number of 3 element groups $ = g = \frac{n}{3}$, where n denotes the total number of elements in the input array.
Number of group medians in the blue region $ = \lceil \frac{g}{2} \rceil$
For each group median, one additional element is less than it.
So, the number of elements in the blue region $ \ge \lceil \frac{g}{2} \rceil \times 2 \ge \frac{g}{2} \times 2 = g$
Thus, total number of elements in play after partition around $x \le n - g = n - \frac{n}{3} = \frac{2n}{3}$
What follows is our recurrence for groups of size 3 elements: $T(n) \le T(\frac{n}{3})\; +\; T(\frac{2n}{3}) + \Theta(n)$
We will show that, $T(n) = O(n\; log\;n)$ by substitution.
We'll prove that $T(n) \le c \cdot n\; \cdot log\;n$ for suitably large constant $c \gt 0$ and all $n \gt 0$
$T(n) \le c \cdot \frac{n}{3} \cdot log(\frac{n}{3}) + c \cdot \frac{2n}{3} \cdot log(\frac{2n}{3}) + \Theta(n)$
$T(n) \lt c \cdot \frac{n}{3} \cdot log(n) + c \cdot \frac{2n}{3} \cdot log(n) + \Theta(n)$
$T(n) \lt c \cdot n \cdot log(n) + \Theta(n)$
$T(n) = O(n \; log\; n).$
Thus, running time of $SELECT$ for groups of 3 elements is: $O(n \cdot log\; n)$
- In the while loop, from lines 1 to 10, in each iteration we keep the smallest element in A[p], and then find the (i - 1)st smallest element in the subarray A[p + 1: r]. We do it < 9 iterations. The while loop terminates when we have the remaining number of elements which is divisible by 9. Then, we divide these remaining elements into groups of 3 elements each, and sort them in place. Now, all group medians lie in the middle third of A[p:r]. Then, we take the group medians (middle third), and further divide them into 3 element subgroups, sorting them in place as before. All subgroup medians now lie in the middle 9th of A[p:r]. Then we find the pivot x as the median of subgroup medians recursively. The rest is trivial, thus it is not described here. Please see the figure for more details.
- Number of elements $\lt x$ (or $\gt x$) $ = \lceil \frac{g}{2} \rceil \times 2 \ge g = \frac{n}{3}$
Number of elements in the blue region $ \ge \frac{n}{3}$
Thus, total number of elements in play after partition around $x \le n - \frac{n}{3} = \frac{2n}{3}$
The recurrence is therefore given by $T(n) \le T(\frac{n}{9})\; +\; T(\frac{2n}{3})\; +\; \Theta(n)$
Our inductive hypothesis is: $T(n) \le cn$ for some suitably large constant $c \gt 0$ and for all $n \gt 0$
Substituting this in to the right hand side of the recurrence, we then obtain: $T(n) \le \frac{cn}{9} \; +\; \frac{2cn}{3}\; + \Theta(n)$
$T(n) \le \frac{7cn}{9} \;+\; \Theta(n)$
$T(n) \le cn - \frac{2cn}{9} \;+\; \Theta(n)$
$T(n) \le cn$
and this holds if c is chosen large enough that $\frac{2c}{9}$ constant dominates the upper bound constant hidden by the term $\Theta(n)$, thus proving our claim.
Thus $SELECT3$ runs in $O(n)$ time in the worst case.
Comments
Post a Comment