## Data structure for weighted sampling

code, ramblings, March 9th, 2009

I was asked the following question: Design a data structure that supports 2 operations:

- insert(element, weight)
- sample()

When you insert an element into the data structure, you specify a weight. The probability that an item is returned when sample() is invoked is its weight divided by the total of all weights in the data structure, its own included. You should design your data structure with both speed and space in mind.
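In symbols, the intended distribution can be written as below. The names $x_i$ and $w_i$ (the $i$-th inserted element and its weight) and $n$ (the number of inserted elements) are notation introduced here for illustration, not part of the original question.

$$P(\text{sample}() = x_i) = \frac{w_i}{\sum_{j=1}^{n} w_j}$$

For example, after insert(a, 1), insert(b, 2) and insert(c, 7), the total weight is 10, so sample() should return a, b and c with probabilities 0.1, 0.2 and 0.7.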

Here is a good solution. It stores the cumulative weights, so the insert() operation requires amortized O(1) time and the sample() operation requires O(lg n) time because it binary-searches those cumulative weights. The space requirement is O(n).

```java
import java.util.ArrayList;
import java.util.List;

public final class BinarySearchWeightedSampler<E> {
    private final List<E> elements;
    // weights.get(i) holds the cumulative weight of elements 0..i.
    private final List<Double> weights;

    public BinarySearchWeightedSampler() {
        this.elements = new ArrayList<E>();
        this.weights = new ArrayList<Double>();
    }

    public void insert(E element, double weight) {
        elements.add(element);
        if (weights.isEmpty())
            weights.add(weight);
        else
            weights.add(weight + weights.get(weights.size() - 1));
    }

    public E sample() {
        return elements.get(
            binarySearch(Math.random(), 0, weights.size() - 1,
                         weights.get(weights.size() - 1)));
    }

    private int binarySearch(double val, int low, int high, double totalWeight) {
        if (high < low)
            return low;
        int mid = low + ((high - low) / 2);
        // Cumulative weight at mid, normalized to the range [0, 1].
        double midValue = weights.get(mid) / totalWeight;
        if (midValue > val)
            return binarySearch(val, low, mid - 1, totalWeight);
        else if (midValue < val)
            return binarySearch(val, mid + 1, high, totalWeight);
        else
            return mid;
    }
}
```
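To make the lookup concrete, here is a minimal sketch that traces the same cumulative-weight idea on a fixed set of weights. The class name `PrefixSumLookupDemo` and the helpers `cumulative` and `pickIndex` are hypothetical names chosen for illustration. The random draw is passed in as a parameter so the result is reproducible, the search is iterative rather than recursive, and the draw is scaled by the total instead of dividing each cumulative weight. It assumes at least one weight is positive.

```java
import java.util.Arrays;

final class PrefixSumLookupDemo {

    // Builds running totals: weights {1, 2, 7} become {1, 3, 10}.
    static double[] cumulative(double[] weights) {
        double[] sums = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            sums[i] = running;
        }
        return sums;
    }

    // Returns the first index whose running total exceeds u * total,
    // where u is a draw in [0, 1). Assumes at least one positive weight.
    static int pickIndex(double[] sums, double u) {
        double target = u * sums[sums.length - 1];
        int low = 0;
        int high = sums.length - 1;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (sums[mid] > target) {
                high = mid;      // mid may be the answer, so keep it in range
            } else {
                low = mid + 1;   // the answer lies strictly to the right
            }
        }
        return low;
    }

    public static void main(String[] args) {
        double[] sums = cumulative(new double[] {1, 2, 7});
        System.out.println(Arrays.toString(sums));   // [1.0, 3.0, 10.0]
        System.out.println(pickIndex(sums, 0.05));   // 0
        System.out.println(pickIndex(sums, 0.25));   // 1
        System.out.println(pickIndex(sums, 0.95));   // 2
    }
}
```

With weights 1, 2 and 7, draws below 0.1 land on index 0, draws from 0.1 up to 0.3 land on index 1, and everything else lands on index 2, which matches the probabilities 0.1, 0.2 and 0.7.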

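If we ignore the space requirements and assume the weights have finite precision (I chose integers for clarity, but any finite-precision number can be made to work), we can sample in O(1) time. The trade-off is that each element is stored once per unit of weight, so insert() takes time proportional to the weight and the space grows with the sum of all weights: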

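```java
import java.util.ArrayList;
import java.util.List;

public final class DiscreteWeightedSampler<E> {
    private final List<E> elements;

    public DiscreteWeightedSampler() {
        this.elements = new ArrayList<E>();
    }

    public void insert(E element, int weight) {
        // Store one copy of the element per unit of weight.
        for (int i = 0; i < weight; ++i) {
            elements.add(element);
        }
    }

    public E sample() {
        // Every slot is equally likely, so a uniform index is enough.
        return elements.get((int) (Math.random() * elements.size()));
    }
}
```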
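One way the finite-precision remark could be realized is to scale decimal weights up to integers before inserting. The sketch below is an assumption about how that might look, not part of the original solution: `WeightScalingDemo`, `toCopyCounts` and the `decimals` parameter are hypothetical names, the weights are assumed to be non-negative with at most `decimals` fractional digits, and dividing by the greatest common divisor is an optional extra that only reduces the number of stored copies.

```java
import java.util.Arrays;

final class WeightScalingDemo {

    // Greatest common divisor via Euclid's algorithm; gcd(0, b) is b.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Converts weights with at most `decimals` fractional digits into
    // integer copy counts that preserve the ratios between the weights.
    // Assumption: scaled weights are non-negative and fit in an int.
    static int[] toCopyCounts(double[] weights, int decimals) {
        double scale = Math.pow(10, decimals);
        long[] scaled = new long[weights.length];
        long divisor = 0;
        for (int i = 0; i < weights.length; i++) {
            scaled[i] = Math.round(weights[i] * scale);
            divisor = gcd(divisor, scaled[i]);
        }
        int[] counts = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            // If every weight is zero the divisor is zero, so emit zeros.
            counts[i] = (divisor == 0) ? 0 : (int) (scaled[i] / divisor);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = toCopyCounts(new double[] {0.5, 1.5, 3.0}, 1);
        System.out.println(Arrays.toString(counts));   // [1, 3, 6]
    }
}
```

Each resulting count could then be passed as the integer weight to insert(element, weight). The ratios 1 : 3 : 6 are the same as 0.5 : 1.5 : 3.0, so the sampling probabilities are unchanged.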
