biom.table.Table.to_hdf5¶

Table.to_hdf5(h5grp, generated_by, compress=True, format_fs=None, creation_date=None)¶

Store CSC and CSR in place

The resulting structure of this group is below. A few basic definitions, N is the number of observations and M is the number of samples. Data are stored in both compressed sparse row [1] (CSR, for observation oriented operations) and compressed sparse column [2] (CSC, for sample oriented operations).

Parameters:

h5grph5py.Group or h5py.File: The HDF5 entity in which to write the BIOM formatted data.
generated_bystr: A description of what generated the table
compressbool, optional: Defaults to True means fields will be compressed with gzip, False means no compression
format_fsdict, optional: Specify custom formatting functions for metadata fields. This dict is expected to be {‘metadata_field’: function}, where the function signature is (h5py.Group, str, dict, bool) corresponding to the specific HDF5 group the metadata dataset will be associated with, the category being operated on, the metadata for the entire axis being operated on, and whether to enable compression on the dataset. Anything returned by this function is ignored.
creation_datedatetime, optional: If provided, use this specific datetime on write as the creation timestamp

See also

Table.from_hdf5

Notes

This method does not return anything and operates in place on h5grp.

The expected HDF5 group structure is below. An example of an HDF5 file in DDL can be found here [3].

./id : str, an arbitrary ID
./type : str, the table type (e.g, OTU table)
./format-url : str, a URL that describes the format
./format-version : two element tuple of int32, major and minor
./generated-by : str, what generated this file
./creation-date : str, ISO format
./shape : two element tuple of int32, N by M
./nnz : int32 or int64, number of non zero elems
./observation : Group
./observation/ids : (N,) dataset of str or vlen str
./observation/matrix : Group
./observation/matrix/data : (nnz,) dataset of float64
./observation/matrix/indices : (nnz,) dataset of int32
./observation/matrix/indptr : (M+1,) dataset of int32
./observation/metadata : Group
[./observation/metadata/foo] : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
./observation/group-metadata : Group
[./observation/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
[./observation/group-metadata/foo.attrs[‘data_type’]] : attribute of the foo dataset that describes contained type (e.g., newick)
./sample : Group
./sample/ids : (M,) dataset of str or vlen str
./sample/matrix : Group
./sample/matrix/data : (nnz,) dataset of float64
./sample/matrix/indices : (nnz,) dataset of int32
./sample/matrix/indptr : (N+1,) dataset of int32
./sample/metadata : Group
[./sample/metadata/foo] : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
./sample/group-metadata : Group
[./sample/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
[./sample/group-metadata/foo.attrs[‘data_type’]] : attribute of the foo dataset that describes contained type (e.g., newick)

The ‘?’ character on the dataset size means that it can be of arbitrary length.

The expected structure for each of the metadata datasets is a list of atomic type objects (int, float, str, …), where the index order of the list corresponds to the index order of the relevant axis IDs. Special metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:

taxonomy: (N, ?) dataset of str or vlen str
KEGG_Pathways: (N, ?) dataset of str or vlen str
collapsed_ids: (N, ?) dataset of str or vlen str

References

[1]

http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csr_matrix.html

[2]

http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csc_matrix.html

[3]

http://biom-format.org/documentation/format_versions/biom-2.1.html

Examples

>>> from biom.util import biom_open  
>>> from biom.table import Table
>>> from numpy import array
>>> t = Table(array([[1, 2], [3, 4]]), ['a', 'b'], ['x', 'y'])
>>> with biom_open('foo.biom', 'w') as f:  
...     t.to_hdf5(f, "example")