kth quantiles of an n-element set

Problem statement

Exercise 9.3-8

The kth quantiles of an n-element set are the k - 1 order statistics that divide the sorted set into k equal sized sets (to within 1). Give an O(nlgk)-time algorithm to list the kth quantiles of a set. [1]


Definitions

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. [2]



Approach

Let's write a divide and conquer algorithm using selection to solve this problem. What if we find the l=nk2k th smallest number  and partition the array around it. Then, we can find the first k2 cut points in the first half l and the remaining k2 cut points in the second half nl. At each step, we keep on dividing k by two until it reaches k = 1 which is the trivial case of our recursion. We first solve the left subproblem recursively, and then print the current solution before solving the right subproblem recursively. What follows is our pseudocode for finding the kth quantiles of an n-element set. A complete treatment of the procedure RANDOMIZED-SELECT used in our algorithm and proof of it's running time is beyond the scope of this article, but the interested reader is directed to the CLRS book [1].


FINDQUANTILES(s,k)
++ let ns.length
++ FINDQUANTILESAUX(s,1,n,k)


FINDQUANTILESAUX(s,i,j,k)
++ // trivial case of recursion
++ `if` k==1
++ ++  `return`

++ let nji+1
++ let k1k2
++ let lnk1k
++ let pRANDOMIZEDSELECT(s,i,j,l)
++ FINDQUANTILESAUX(s,i,i+l1,k1)
++ `print` p
++ FINDQUANTILESAUX(s,i+l,j,kk1)


Sample Implementation

Here's a sample Java implementation of the algorithms discussed above along with a driver program. 

class FindQuantiles {
FindQuantiles() {
throw new AssertionError();
}
public static void main(String[] args) {
final int[] a = { 5, 13, 6, 14, 4, 9, 19, 11, 12, 15, 20, 1, 8, 18, 10, 16, 22, 7, 3, 2, 17, 23, 21 };
final int k = 7;
findQuantiles(a, k);
}
static void findQuantiles(int[] s, int k) {
final int n = s.length;
findQuantilesAux(s, 0, n - 1, k);
}
static void findQuantilesAux(int[] s, int i, int j, int k) {
if (k == 1)
return;
final int n = j - i + 1;
final int k1 = k / 2;
final int l = n * k1 / k;
final int p = randomizedSelect(s, i, j, l);
findQuantilesAux(s, i, i + l - 1, k1);
System.out.println(p);
findQuantilesAux(s, i + l, j, k - k1);
}
}

Informal Proof for running time

We have log2k+1 levels in our tree, each level except the leaf costs us O(n) time for partitioning all the subarrays at that level. At the leaf level, no partitioning is performed and we merely return when k = 1, which is the base case of our recursion, doing some constant amount of work for each subproblem of size nk or nk. Thus, the total running time of this approach is nlog2k+knlog2k.


Formal Proof for running time

Let's write the recurrence first. 

T(n,k)=T(n1,k2)+T(n2,k2)+Θ(n)

We let T(n,k)cnlog2k for some constant c>0

Let's solve the recurrence using the substitution method.

T(n,k)cn1log2(k2)+cn2log2(k2)+Θ(n)

T(n,k)cn1log2(k2)+cn2log2(k2)+Θ(n)

T(n,k)clog2(k2)(n1+n2)+Θ(n)

T(n,k)=cnlog2(k2)+Θ(n)

T(n,k)=cnlog2kcn+Θ(n)

T(n,k)cnlog2k

where the last step holds when we have some constant c>0 for which the term cn dominates the anonymous function hidden behind the term Θ(n). Thus, the running time is given by

T(n,k)=O(nlog2k)

and that concludes the proof.


References

[1] https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X

[2] https://en.wikipedia.org/wiki/Quantile

Comments

Popular posts from this blog

Introducing Java Reactive Extentions in to a SpringBoot Micro Service

Combining the emissions of multiple Observables together using RxJava Zip operator in a Spring Boot Micro service

RabbitMQ Transport in WSO2 ESB