SELECT with groups of 3

Exercise 9.3-1 asks you to show that the $SELECT$ algorithm still runs in linear time if the elements are divided into groups of 7. This problem asks about dividing into groups of 3.

  • Show that $SELECT$ runs in linear time if you divide the elements into groups whose size is any odd constant greater than 3.
  • Show that $SELECT$ runs in $O(n\; lg\; n)$ time if you divide the elements into groups of size 3

Because the bound in part (b) is just an upper bound, we do not know whether the groups-of-3 strategy actually runs in $O(n)$ time. But by repeating the groups-of-3 idea on the middle groups of medians, we can pick a pivot that guarantees $O(n)$ time. The $SELECT3$ algorithm on the next page determines the $i^{\;th}$ smallest of an input array of $n \gt 1$ distinct elements.

  • Describe in English how the $SELECT3$ algorithm works.Include in your description one or more suitable diagrams.
  • Show that $SELECT3$ runs in $O(n)$ time in the worst case. [1]

 

The worst-case linear-time median-finding algorithm was devised by Blum, Floyd, Pratt, Rivest and Tarjan [2]. This exercise improves the original algorithm [2] to use groups of 3 without deteriorating its performance. The constant factors of this new algorithm $SELECT3$ outperforms that of its counterpart $SELECT$. Therefore, it pays to understand the relative merits of $SELECT3$. What follows is the $SELECT3$ algorithm.

$SELECT3(A, p, r, i)$  
    $\phantom{{}++{}}$ `while` $(r - p + 1)\; mod\; 9 \neq 0$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$  `for` $j = p + 1$ to $r$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$ $\phantom{{}++{}}$ `if` $A[p] \gt A[j]$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$ $\phantom{{}++{}}$ $\phantom{{}++{}}$ exchange $A[p]$ with $A[j]$  

    $\phantom{{}++{}}$ `if` $i == 1$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$ `return` $A[p]$  
    $\phantom{{}++{}}$ $p = p + 1$  
    $\phantom{{}++{}}$ $i = i - 1$  
    $\phantom{{}++{}}$ $g = \frac{(r - p + 1)}{3}$  
    $\phantom{{}++{}}$  `for` $j = p$ to $p + g - 1$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$ sort $\langle A[j],\; A[j + g],\; A[j + 2g]\rangle$  in place  
    $\phantom{{}++{}}$ $g^\prime = \frac{g}{3}$  
    $\phantom{{}++{}}$ `for` $j = p + g$ to $p + g + g^\prime - 1$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$ sort $\langle A[j],\; A[j + g^\prime],\; A[j + 2g^\prime]\rangle$  in place  
    $\phantom{{}++{}}$ $x = SELECT3(A, p + 4g^\prime, p + 5g^\prime - 1, \lceil \frac{g^\prime}{2} \rceil)$  
    $\phantom{{}++{}}$ $q = PARTITION-AROUND(A, p, r, x)$  
    $\phantom{{}++{}}$ $k = q - p + 1$    
    $\phantom{{}++{}}$ `if` $i == k$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$ `return` $A[q]$  
    $\phantom{{}++{}}$ `elseif` $i < k$  
    $\phantom{{}++{}}$ $\phantom{{}++{}}$  `return` $SELECT3(A, p, q - 1, i)$  
    $\phantom{{}++{}}$ `else` `return` $SELECT3(A, q + 1, r, i - k)$   

What follows are our answers to the problem.

  • Let $k$ be the group size and $n$ be the number of elements. 
    Number of groups $g = \frac{n}{k}$ for some odd constant $k$
    Number of elements $\lt$ pivot (or $\gt$ pivot) $\ge \lceil \frac{g}{2}\rceil \times \lceil \frac{k}{2}\rceil = \lceil\frac{\frac{n}{k}}{2}\rceil \times \lceil \frac{k}{2}\rceil$
    Number of elements in low or high side of the partition around x $\ge \frac{n}{2k} \times \frac{k}{2} = \frac{n}{4}$
    Number of elements in play after partition around x $\le n - \frac{n}{4} = \frac{3n}{4}$
    $T(n) \le T(\frac{n}{k}) + T(\frac{3n}{4}) + \Theta(n)$.
    This runs in $\Theta(n)$ time if and only if: $\frac{n}{k} + \frac{3n}{4} \lt n$.
    And this holds for any odd constant $k \gt 4$, thus, proving our claim.

  • For groups of size $3$,
    number of 3 element groups $ = g = \frac{n}{3}$, where n denotes the total number of elements in the input array.
    Number of group medians in the blue region $ = \lceil \frac{g}{2} \rceil$
    For each group median, one additional element is less than it.
    So, the number of elements in the blue region $ \ge \lceil \frac{g}{2} \rceil \times 2 \ge \frac{g}{2} \times 2 = g$
    Thus, total number of elements in play after partition around $x \le n - g = n - \frac{n}{3} = \frac{2n}{3}$   
    What follows is our recurrence for groups of size 3 elements:  $T(n) \le T(\frac{n}{3})\; +\; T(\frac{2n}{3}) + \Theta(n)$
    We will show that, $T(n) = O(n\; log\;n)$ by substitution.
    We'll prove that  $T(n) \le c \cdot n\; \cdot log\;n$ for suitably large constant $c \gt 0$ and all $n \gt 0$
    $T(n) \le c \cdot \frac{n}{3} \cdot log(\frac{n}{3}) + c \cdot \frac{2n}{3} \cdot log(\frac{2n}{3}) + \Theta(n)$
    $T(n) \lt c \cdot \frac{n}{3} \cdot log(n) + c \cdot \frac{2n}{3} \cdot log(n) + \Theta(n)$   
    $T(n) \lt c \cdot n \cdot log(n) + \Theta(n)$
    $T(n) = O(n \; log\; n).$
    Thus, running time of $SELECT$ for groups of 3 elements is: $O(n \cdot log\; n)$

  • In the while loop, from lines 1 to 10, in each iteration we keep the smallest element in A[p], and then find the (i - 1)st smallest element in the subarray A[p + 1: r]. We do it < 9 iterations. The while loop terminates when we have the remaining number of elements which is divisible by 9. Then, we divide these remaining elements into groups of 3 elements each, and sort them in place. Now, all group medians lie in the middle third of A[p:r]. Then, we take the group medians (middle third), and further divide them into 3 element subgroups, sorting them in place as before. All subgroup medians now lie in the middle 9th of A[p:r]. Then we find the pivot x as the median of subgroup medians recursively. The rest is trivial, thus it is not described here. Please see the figure for more details.


  • Number of elements $\lt x$ (or $\gt x$) $ = \lceil \frac{g}{2} \rceil \times 2 \ge g = \frac{n}{3}$
    Number of elements in the blue region $ \ge \frac{n}{3}$
    Thus, total number of elements in play after partition around $x \le n - \frac{n}{3} = \frac{2n}{3}$ 
    The recurrence is therefore given by $T(n) \le T(\frac{n}{9})\; +\; T(\frac{2n}{3})\; +\; \Theta(n)$ 
    Our inductive hypothesis is: $T(n) \le cn$ for some suitably large constant $c \gt 0$ and for all $n \gt 0$ 
    Substituting this in to the right hand side of the recurrence, we then obtain: $T(n) \le \frac{cn}{9} \; +\; \frac{2cn}{3}\; + \Theta(n)$ 
    $T(n) \le \frac{7cn}{9} \;+\; \Theta(n)$ 
    $T(n) \le cn - \frac{2cn}{9} \;+\; \Theta(n)$ 
    $T(n) \le cn$ 
    and this holds if c is chosen large enough that $\frac{2c}{9}$ constant dominates the upper bound constant hidden by the term $\Theta(n)$, thus proving our claim. 
    Thus $SELECT3$ runs in $O(n)$ time in the worst case. 

Comments

Popular posts from this blog

Introducing Java Reactive Extentions in to a SpringBoot Micro Service

Optimal binary search trees

Edit distance