SELECT with groups of 3

Exercise 9.3-1 asks you to show that the SELECT algorithm still runs in linear time if the elements are divided into groups of 7. This problem asks about dividing into groups of 3.

  • Show that SELECT runs in linear time if you divide the elements into groups whose size is any odd constant greater than 3.
  • Show that SELECT runs in O(nlgn) time if you divide the elements into groups of size 3

Because the bound in part (b) is just an upper bound, we do not know whether the groups-of-3 strategy actually runs in O(n) time. But by repeating the groups-of-3 idea on the middle groups of medians, we can pick a pivot that guarantees O(n) time. The SELECT3 algorithm on the next page determines the ith smallest of an input array of n>1 distinct elements.

  • Describe in English how the SELECT3 algorithm works.Include in your description one or more suitable diagrams.
  • Show that SELECT3 runs in O(n) time in the worst case. [1]

 

The worst-case linear-time median-finding algorithm was devised by Blum, Floyd, Pratt, Rivest and Tarjan [2]. This exercise improves the original algorithm [2] to use groups of 3 without deteriorating its performance. The constant factors of this new algorithm SELECT3 outperforms that of its counterpart SELECT. Therefore, it pays to understand the relative merits of SELECT3. What follows is the SELECT3 algorithm.

SELECT3(A,p,r,i)  
    ++ `while` (rp+1)mod90  
    ++ ++  `for` j=p+1 to r  
    ++ ++ ++ `if` A[p]>A[j]  
    ++ ++ ++ ++ exchange A[p] with A[j]  

    ++ `if` i==1  
    ++ ++ `return` A[p]  
    ++ p=p+1  
    ++ i=i1  
    ++ g=(rp+1)3  
    ++  `for` j=p to p+g1  
    ++ ++ sort A[j],A[j+g],A[j+2g]  in place  
    ++ g=g3  
    ++ `for` j=p+g to p+g+g1  
    ++ ++ sort A[j],A[j+g],A[j+2g]  in place  
    ++ x=SELECT3(A,p+4g,p+5g1,g2)  
    ++ q=PARTITIONAROUND(A,p,r,x)  
    ++ k=qp+1    
    ++ `if` i==k  
    ++ ++ `return` A[q]  
    ++ `elseif` i<k  
    ++ ++  `return` SELECT3(A,p,q1,i)  
    ++ `else` `return` SELECT3(A,q+1,r,ik)   

What follows are our answers to the problem.

  • Let k be the group size and n be the number of elements. 
    Number of groups g=nk for some odd constant k
    Number of elements < pivot (or > pivot) g2×k2=nk2×k2
    Number of elements in low or high side of the partition around x n2k×k2=n4
    Number of elements in play after partition around x nn4=3n4
    T(n)T(nk)+T(3n4)+Θ(n).
    This runs in Θ(n) time if and only if: nk+3n4<n.
    And this holds for any odd constant k>4, thus, proving our claim.

  • For groups of size 3,
    number of 3 element groups =g=n3, where n denotes the total number of elements in the input array.
    Number of group medians in the blue region =g2
    For each group median, one additional element is less than it.
    So, the number of elements in the blue region g2×2g2×2=g
    Thus, total number of elements in play after partition around xng=nn3=2n3   
    What follows is our recurrence for groups of size 3 elements:  T(n)T(n3)+T(2n3)+Θ(n)
    We will show that, T(n)=O(nlogn) by substitution.
    We'll prove that  T(n)cnlogn for suitably large constant c>0 and all n>0
    T(n)cn3log(n3)+c2n3log(2n3)+Θ(n)
    T(n)<cn3log(n)+c2n3log(n)+Θ(n)   
    T(n)<cnlog(n)+Θ(n)
    T(n)=O(nlogn).
    Thus, running time of SELECT for groups of 3 elements is: O(nlogn)

  • In the while loop, from lines 1 to 10, in each iteration we keep the smallest element in A[p], and then find the (i - 1)st smallest element in the subarray A[p + 1: r]. We do it < 9 iterations. The while loop terminates when we have the remaining number of elements which is divisible by 9. Then, we divide these remaining elements into groups of 3 elements each, and sort them in place. Now, all group medians lie in the middle third of A[p:r]. Then, we take the group medians (middle third), and further divide them into 3 element subgroups, sorting them in place as before. All subgroup medians now lie in the middle 9th of A[p:r]. Then we find the pivot x as the median of subgroup medians recursively. The rest is trivial, thus it is not described here. Please see the figure for more details.


  • Number of elements <x (or >x) =g2×2g=n3
    Number of elements in the blue region n3
    Thus, total number of elements in play after partition around xnn3=2n3 
    The recurrence is therefore given by T(n)T(n9)+T(2n3)+Θ(n) 
    Our inductive hypothesis is: T(n)cn for some suitably large constant c>0 and for all n>0 
    Substituting this in to the right hand side of the recurrence, we then obtain: T(n)cn9+2cn3+Θ(n) 
    T(n)7cn9+Θ(n) 
    T(n)cn2cn9+Θ(n) 
    T(n)cn 
    and this holds if c is chosen large enough that 2c9 constant dominates the upper bound constant hidden by the term Θ(n), thus proving our claim. 
    Thus SELECT3 runs in O(n) time in the worst case. 

Comments

Popular posts from this blog

Introducing Java Reactive Extentions in to a SpringBoot Micro Service

Combining the emissions of multiple Observables together using RxJava Zip operator in a Spring Boot Micro service

RabbitMQ Transport in WSO2 ESB