Understand the concept of proportional sampling and implement proportional sampling using Python
Imagine you are given a set of elements and asked to pick a sample of them, such that the probability of picking an element is proportional to the value of the element.
didn’t understand? let’s go with an example.
let’s say, you are given a set of 6 elements, lets call this array as d.
17, 22.49, 30, 31, 47, 99.38
if someone is asking you to pick a random element from the above set, then all elements are equiprobable. What if I wanted the fifth element is much more likely to be picked up than the fourth element. Similarly, the Probability of picking up the fourth element is much more likely to be picked up than the third element, and so on…
Step 1: Compute the sum of all elements:
Step 2: Divide each element with s, i.e. normalizing each element using the sum of all elements in the set.
such for all elements it will be
So, the values I get after normalized values are
Normalized values have nice property such as their sum equal to 1, and they are between 0 and 1.
Step 3: Compute Cumulative sum:
we will be using these cumulative normalized values while achieving the proportional sampling.
Step 4: let's pick any value between 0 and 1, preferably from the uniform distribution which lies in between 0 and 1, call it r with a value of 0.6
we apply a logic such as
in our case as the value of r is 0.6, we pick the last element i.e 6th element. you must be wondering why is this working, if we deep-dive, we can say that probability of picking up the 4th element is nothing but, probability of r lying between d3t(Tilda) and d4t, which is proportional to d4', which is directly proportional to d4.
Watch out this space for python implementation.