Fork me on GitHub

biom-format.org

biom.table.Table.from_hdf5

«  biom.table.Table.filter   ::   Contents   ::   biom.table.Table.from_json  »

biom.table.Table.from_hdf5

classmethod Table.from_hdf5(h5grp, ids=None, axis='sample', parse_fs=None)

Parse an HDF5 formatted BIOM table

If ids is provided, only the samples/observations listed in ids (depending on the value of axis) will be loaded

The expected structure of this group is below. A few basic definitions, N is the number of observations and M is the number of samples. Data are stored in both compressed sparse row (for observation oriented operations) and compressed sparse column (for sample oriented operations).

Parameters:

h5grp : a h5py Group or an open h5py File

ids : iterable

The sample/observation ids of the samples/observations that we need to retrieve from the hdf5 biom table

axis : {‘sample’, ‘observation’}, optional

The axis to subset on

parse_fs : dict, optional

Specify custom parsing functions for metadata fields. This dict is expected to be {‘metadata_field’: function}, where the function signature is (object) corresponding to a single row in the associated metadata dataset. The return from this function an object as well, and is the parsed representation of the metadata.

Returns:

biom.Table

A BIOM Table object

Raises:

ValueError

If ids are not a subset of the samples or observations ids present in the hdf5 biom table

See also

Table.to_hdf5

Notes

The expected HDF5 group structure is below. An example of an HDF5 file in DDL can be found here [R9].

  • ./id : str, an arbitrary ID
  • ./type : str, the table type (e.g, OTU table)
  • ./format-url : str, a URL that describes the format
  • ./format-version : two element tuple of int32, major and minor
  • ./generated-by : str, what generated this file
  • ./creation-date : str, ISO format
  • ./shape : two element tuple of int32, N by M
  • ./nnz : int32 or int64, number of non zero elems
  • ./observation : Group
  • ./observation/ids : (N,) dataset of str or vlen str
  • ./observation/matrix : Group
  • ./observation/matrix/data : (nnz,) dataset of float64
  • ./observation/matrix/indices : (nnz,) dataset of int32
  • ./observation/matrix/indptr : (M+1,) dataset of int32
  • ./observation/metadata : Group
  • [./observation/metadata/foo] : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
  • ./observation/group-metadata : Group
  • [./observation/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
  • [./observation/group-metadata/foo.attrs[‘data_type’]] : attribute of the foo dataset that describes contained type (e.g., newick)
  • ./sample : Group
  • ./sample/ids : (M,) dataset of str or vlen str
  • ./sample/matrix : Group
  • ./sample/matrix/data : (nnz,) dataset of float64
  • ./sample/matrix/indices : (nnz,) dataset of int32
  • ./sample/matrix/indptr : (N+1,) dataset of int32
  • ./sample/metadata : Group
  • [./sample/metadata/foo] : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
  • ./sample/group-metadata : Group
  • [./sample/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
  • [./sample/group-metadata/foo.attrs[‘data_type’]] : attribute of the foo dataset that describes contained type (e.g., newick)

The ‘?’ character on the dataset size means that it can be of arbitrary length.

The expected structure for each of the metadata datasets is a list of atomic type objects (int, float, str, ...), where the index order of the list corresponds to the index order of the relevant axis IDs. Special metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:

  • taxonomy: (N, ?) dataset of str or vlen str
  • KEGG_Pathways: (N, ?) dataset of str or vlen str
  • collapsed_ids: (N, ?) dataset of str or vlen str

References

[R7]http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csr_matrix.html
[R8]http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csc_matrix.html
[R9](1, 2) http://biom-format.org/documentation/format_versions/biom-2.0.html

Examples

>>> from biom.table import Table
>>> from biom.util import biom_open
>>> with biom_open('rich_sparse_otu_table_hdf5.biom') as f 
>>>     t = Table.from_hdf5(f) 

Parse a hdf5 biom table subsetting observations >>> from biom.util import biom_open # doctest: +SKIP >>> from biom.parse import parse_biom_table >>> with biom_open(‘rich_sparse_otu_table_hdf5.biom’) as f # doctest: +SKIP >>> t = Table.from_hdf5(f, ids=[“GG_OTU_1”], ... axis=’observation’) # doctest: +SKIP

«  biom.table.Table.filter   ::   Contents   ::   biom.table.Table.from_json  »