biom-format.org

biom.table.Table.subsample

Table.subsample(n, axis='sample', by_id=False)

Randomly subsample without replacement.

Parameters:

n : int

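Number of items to subsample. If by_id is False, this is the number of counts drawn from each vector; if by_id is True, it is the number of IDs to select.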

axis : {‘sample’, ‘observation’}, optional

The axis to sample over. Default is ‘sample’.

by_id : boolean, optional

If False, the subsampling is based on the counts contained in the matrix (e.g., rarefaction). If True, the subsampling is based on the IDs (e.g., fetch a random subset of samples). Default is False.

Returns:

biom.Table

A subsampled version of self

Raises:

ValueError

If n is less than zero.

Notes

Subsampling is performed without replacement. If n is greater than the sum of a given vector (a single sample or observation along the chosen axis), that vector is omitted from the result.

Adapted from skbio.math.subsample, see biom-format/licenses for more information about scikit-bio.

If by_id is False, the values in the table are assumed to be absolute abundances (raw counts), not relative abundances.
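The count-based behaviour described in these notes can be illustrated with a minimal standard-library sketch. It is not the biom or scikit-bio implementation; the helper name `subsample_counts` and the dictionary of sample vectors are hypothetical and exist only for this illustration. Each vector is expanded into one pool entry per counted item, n entries are drawn without replacement, and vectors holding fewer than n items are dropped.

```python
import random


def subsample_counts(counts, n, rng):
    """Draw n items without replacement from a vector of integer counts.

    Hypothetical helper for illustration only. Returns the subsampled
    counts, or None if the vector holds fewer than n items, in which
    case the caller is expected to omit that vector.
    """
    if n < 0:
        raise ValueError("n cannot be negative")
    if n > sum(counts):
        return None
    # One pool entry per counted item, e.g. [0, 2, 3] -> [1, 1, 2, 2, 2].
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    result = [0] * len(counts)
    for i in rng.sample(pool, n):
        result[i] += 1
    return result


# Three sample vectors over two observations (absolute counts).
samples = {"S1": [0, 1], "S2": [2, 0], "S3": [3, 2]}
rng = random.Random(0)  # fixed seed so the sketch is repeatable

rarefied = {}
for sample_id, counts in samples.items():
    sub = subsample_counts(counts, 2, rng)
    if sub is not None:  # S1 sums to 1 < 2, so it is omitted
        rarefied[sample_id] = sub

print(sorted(rarefied))
```

Because the draw operates on individual counted items, it only makes sense for raw counts; relative abundances would first have to be converted back to counts.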

Examples

>>> import numpy as np
>>> from biom.table import Table
>>> table = Table(np.array([[0, 2, 3], [1, 0, 2]]), ['O1', 'O2'],
...               ['S1', 'S2', 'S3'])

Subsample 1 item over the sample axis by value (e.g., rarefaction):

>>> print(table.subsample(1).sum(axis='sample'))
[ 1.  1.  1.]

Subsample 2 items over the sample axis. Note that ‘S1’ is filtered out because its total count (1) is less than 2:

>>> ss = table.subsample(2)
>>> print(ss.sum(axis='sample'))
[ 2.  2.]
>>> print(ss.ids())
['S2' 'S3']

Subsample by IDs over the sample axis. In this example, we randomly select 2 samples, repeat the selection 100 times, and print the set of ID combinations observed.

>>> ids = set([tuple(table.subsample(2, by_id=True).ids())
...            for i in range(100)])
>>> print(sorted(ids))
[('S1', 'S2'), ('S1', 'S3'), ('S2', 'S3')]
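The ID-based mode can be pictured with a small standard-library sketch. It is an illustration rather than the library's code, and the helper name `subsample_ids` is hypothetical. It assumes, as the output above suggests, that the selected IDs keep their original order.

```python
import random


def subsample_ids(ids, n, rng):
    """Pick n IDs at random without replacement, keeping original order.

    Hypothetical helper for illustration only.
    """
    chosen = set(rng.sample(ids, n))
    return [i for i in ids if i in chosen]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_ids = ["S1", "S2", "S3"]

# Repeat the draw 100 times and collect the distinct ID pairs seen.
observed = {tuple(subsample_ids(sample_ids, 2, rng)) for _ in range(100)}
print(sorted(observed))
```

Unlike count-based subsampling, this leaves the values of the selected samples untouched; only the set of IDs is reduced.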
