digplanet beta 1: Athena
Share digplanet:

Agriculture

Applied sciences

Arts

Belief

Business

Chronology

Culture

Education

Environment

Geography

Health

History

Humanities

Language

Law

Life

Mathematics

Nature

People

Politics

Science

Society

Technology

Reservoir sampling is a family of randomized algorithms for randomly choosing k samples from a list S containing n items, where n is either a very large or unknown number. Typically n is large enough that the list doesn't fit into main memory. The most common example was labelled Algorithm R by Jeffrey Vitter in his paper on the subject.[1]

This simple O(n) algorithm as described in the Dictionary of Algorithms and Data Structures[2] consists of the following steps (assuming that the arrays are one-based, and that the number of items to select, k, is smaller than the size of the source array, S):

array R[k];    // result
integer i, j;

// fill the reservoir array
for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done

Contents

Relation to Fisher-Yates shuffle [edit]

Suppose one wanted to draw k random cards from a deck of playing cards (i.e., n=52). A natural approach would be to shuffle the deck and then take the top k cards. In the general case, the shuffle also needs to work even if the number of cards in the deck is not known in advance, a condition which is satisfied by the inside-out version of the Fisher-Yates shuffle:

To initialize an array a of n elements to a randomly shuffled copy of S, both 0-based:
   a[0] ← S[0]
   for i from 1 to n - 1 do
       rrandom (0 .. i)
       a[i] ← a[r]
       a[r] ← S[i]

Note that although the rest of the cards are shuffled, only the top k are important in the present context. Therefore, the array a need only track the cards in the top k positions while performing the shuffle, reducing the amount of memory needed. Truncating a to length k, the algorithm is modified accordingly:

To initialize an array a to k random elements of S (which is of length n), both 0-based:
   a[0] ← S[0]
   for i from 1 to k - 1 do
       rrandom (0 .. i)
       a[i] ← a[r]
       a[r] ← S[i]

   for i from k to n - 1 do
       rrandom (0 .. i)

       if (r < k) then a[r] ← S[i]

Since the order of the first k cards is immaterial, the first loop can be removed and a can be initialized to be the first k items of S. This yields Algorithm R.

Example implementation [edit]

The following is a simple implementation of the algorithm in Python that samples the set of English Wikipedia page titles:

import random
SAMPLE_COUNT = 10
 
# Force the value of the seed so the results are repeatable
random.seed(12345)
 
sample_titles = []
for index, line in enumerate(open("enwiki-20091103-all-titles-in-ns0")):
        # Generate the reservoir
        if index < SAMPLE_COUNT:
                sample_titles.append(line)
        else:                  
                # Randomly replace elements in the reservoir
                # with a decreasing probability.             
                # Choose an integer between 0 and index (inclusive)               
                r = random.randint(0, index)               
                if r < SAMPLE_COUNT:                       
                        sample_titles[r] = line
print sample_titles

See also [edit]

References [edit]


Original courtesy of Wikipedia: http://en.wikipedia.org/wiki/Reservoir_sampling — Please support Wikipedia.
A portion of the proceeds from advertising on Digplanet goes to supporting Wikipedia.
452 videos foundNext > 

Lec-1 Introduction to Linear Programming Formulations

Lecture Series on Fundamentals of Operations Research by Prof.G.Srinivasan, Department of Management Studies, IIT Madras. For more details on NPTEL visit htt...

Water Sampling at the Reservoir.mov

Water Sampling at Reservoir Fall 2011.

Sampling - Reservoir.mpg

A Sample Of Water Sampling

Ever wonder how we test our drinking water? Step on board the boat with United Water as we take samples from a New Jersey reservoir.

TVA Surveys Chickamauga Fish Populations

Fishing enthusiasts interested in the health of sport fish in the Tennessee Valley Authority€™s reservoirs along the Tennessee River are invited to observe t...

TRIBOMAR - TriboVAC Sampling Pump !

How to get a representative oil sample from your machinery / oil reservoir. This movie explains in an easy way the handling of a so called "vampire pump" to ...

Armada

visit us for more videos http://pet-oil.blogspot.com/search/label/Vedios tubing conveyed carrier for single phase downhole sampling one or more temperature g...

Suspended stream sediment sampling | Analysing the sediments

BGS Organic Geochemist, Dr Chris Vane, explains the next steps in analysing the sediments collected from our suspended stream sediment sampling device. Eight...

Sampling and harvesting the WHEATON CELLine flask

This video shows how to take a sample and collect the contents of the cell chamber in a WHEATON CELLine flask. The cell compartment is separated from the bul...

Ecology Fieldwork: Sampling the salinas in France, ECOSAL Atlantis research project (March 2011)

To characterise the invertebrate communities in salinas across the ECOSAL Atlantic area, fieldwork is being carried out at sites across all partner regions. ...

452 videos foundNext > 

We're sorry, but there's no news about "Reservoir sampling" right now.

Loading

Oops, we seem to be having trouble contacting Twitter

Talk About Reservoir sampling

You can talk about Reservoir sampling with people all over the world in our discussions.

Support Wikipedia

A portion of the proceeds from advertising on Digplanet goes to supporting Wikipedia. Please add your support for Wikipedia!