kth quantiles of an n-element set
Problem statement
Exercise 9.3-8
The kth quantiles of an n-element set are the k - 1 order statistics that divide the sorted set into k equal sized sets (to within 1). Give an O(n⋅lgk)-time algorithm to list the kth quantiles of a set. [1]
Definitions
In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. [2]
Approach
Let's write a divide and conquer algorithm using selection to solve this problem. What if we find the l=⌊n⋅⌊k2⌋k⌋ th smallest number and partition the array around it. Then, we can find the first ⌊k2⌋ cut points in the first half l and the remaining ⌈k2⌉ cut points in the second half n−l. At each step, we keep on dividing k by two until it reaches k = 1 which is the trivial case of our recursion. We first solve the left subproblem recursively, and then print the current solution before solving the right subproblem recursively. What follows is our pseudocode for finding the kth quantiles of an n-element set. A complete treatment of the procedure RANDOMIZED-SELECT used in our algorithm and proof of it's running time is beyond the scope of this article, but the interested reader is directed to the CLRS book [1].
Sample Implementation
Here's a sample Java implementation of the algorithms discussed above along with a driver program.
class FindQuantiles { | |
FindQuantiles() { | |
throw new AssertionError(); | |
} | |
public static void main(String[] args) { | |
final int[] a = { 5, 13, 6, 14, 4, 9, 19, 11, 12, 15, 20, 1, 8, 18, 10, 16, 22, 7, 3, 2, 17, 23, 21 }; | |
final int k = 7; | |
findQuantiles(a, k); | |
} | |
static void findQuantiles(int[] s, int k) { | |
final int n = s.length; | |
findQuantilesAux(s, 0, n - 1, k); | |
} | |
static void findQuantilesAux(int[] s, int i, int j, int k) { | |
if (k == 1) | |
return; | |
final int n = j - i + 1; | |
final int k1 = k / 2; | |
final int l = n * k1 / k; | |
final int p = randomizedSelect(s, i, j, l); | |
findQuantilesAux(s, i, i + l - 1, k1); | |
System.out.println(p); | |
findQuantilesAux(s, i + l, j, k - k1); | |
} | |
} |
Informal Proof for running time
We have log2k+1 levels in our tree, each level except the leaf costs us O(n) time for partitioning all the subarrays at that level. At the leaf level, no partitioning is performed and we merely return when k = 1, which is the base case of our recursion, doing some constant amount of work for each subproblem of size ⌊nk⌋ or ⌈nk⌉. Thus, the total running time of this approach is n⋅log2k+k≈n⋅log2k.
Formal Proof for running time
Let's write the recurrence first.
T(n,k)=T(n1,⌊k2⌋)+T(n2,⌈k2⌉)+Θ(n)
We let T(n,k)≤cn⋅log2k for some constant c>0
Let's solve the recurrence using the substitution method.
T(n,k)≤cn1⋅log2(⌊k2⌋)+cn2⋅log2(⌈k2⌉)+Θ(n)
T(n,k)≤cn1⋅log2(k2)+cn2⋅log2(k2)+Θ(n)
T(n,k)≤c⋅log2(k2)⋅(n1+n2)+Θ(n)
T(n,k)=cn⋅log2(k2)+Θ(n)
T(n,k)=cn⋅log2k−cn+Θ(n)
T(n,k)≤cn⋅log2k
where the last step holds when we have some constant c>0 for which the term cn dominates the anonymous function hidden behind the term Θ(n). Thus, the running time is given by
T(n,k)=O(n⋅log2k)
and that concludes the proof.
References
[1] https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X
Comments
Post a Comment