The Biological Observation Matrix (BIOM) format
The BIOM file format (canonically pronounced biome) is designed to be a general-use format for representing biological sample-by-observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomic Standards Consortium supported project.
The BIOM format is designed for general use in broad areas of comparative -omics. For example, in marker-gene surveys, the primary use of this format is to represent OTU tables: the observations in this case are OTUs and the matrix contains counts corresponding to the number of times each OTU is observed in each sample. With respect to metagenome data, this format would be used to represent metagenome tables: the observations in this case might correspond to SEED subsystems, and the matrix would contain counts corresponding to the number of times each subsystem is observed in each metagenome. Similarly, with respect to genome data, this format may be used to represent a set of genomes: the observations in this case again might correspond to SEED subsystems, and the counts would correspond to the number of times each subsystem is observed in each genome.
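The sample-by-observation layout described above can be sketched in plain Python. This is only an illustration of the table's shape, not the biom-format API: the sample IDs, OTU IDs, and the `build_contingency_table` helper are hypothetical names invented for this example.

```python
from collections import Counter

# Hypothetical input: one (sample_id, otu_id) pair per observed sequence.
observations = [
    ("sample_A", "OTU_1"),
    ("sample_A", "OTU_1"),
    ("sample_A", "OTU_2"),
    ("sample_B", "OTU_2"),
    ("sample_B", "OTU_3"),
    ("sample_B", "OTU_3"),
    ("sample_B", "OTU_3"),
]


def build_contingency_table(pairs):
    """Return (observation_ids, sample_ids, matrix) for the given pairs.

    matrix[i][j] is the number of times observation i was seen in sample j.
    """
    counts = Counter(pairs)
    sample_ids = sorted({sample for sample, _ in pairs})
    observation_ids = sorted({obs for _, obs in pairs})
    matrix = [
        [counts[(sample, obs)] for sample in sample_ids]
        for obs in observation_ids
    ]
    return observation_ids, sample_ids, matrix


observation_ids, sample_ids, matrix = build_contingency_table(observations)

# Print the table with observations as rows and samples as columns.
print("\t".join(["#OTU ID"] + sample_ids))
for obs_id, row in zip(observation_ids, matrix):
    print("\t".join([obs_id] + [str(value) for value in row]))
```

For a metagenome or genome table the same structure applies, with subsystem identifiers in place of OTU IDs and metagenomes or genomes in place of samples.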
The BIOM project consists of the following components:
- definition of the BIOM file format;
- command line interface (CLI) for working with BIOM files, including converting between file formats, adding metadata to BIOM files, and summarizing BIOM files (run biom to see the full list of commands);
- application programming interface (API) for working with BIOM files in multiple programming languages (including Python and R).
The biom-format package provides a command line interface and Python API for working with BIOM files. The rest of this site contains details about the BIOM file format (which is independent of the API) and the Python biom-format package. For more details about the R API, please see the CRAN biom package.
Projects using the BIOM format
- BIOM Documentation
- The biom file format
- Tips and FAQs regarding the BIOM file format
- Quick start
- BIOM Table (biom.table)
- Converting between file formats
- Adding sample and observation metadata to biom files
- Summarizing BIOM tables
- The BIOM Format License
The latest official version of the biom-format project is 2.1.5, and the latest version of the BIOM file format is 2.0. Details on the file format can be found here.
Installing the biom-format Python package
To install the latest release of the biom-format Python package:
pip install numpy
pip install biom-format
To work with BIOM 2.0+ files:
pip install h5py
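After running the commands above, a quick way to confirm that the packages are visible to the current Python interpreter is sketched below. It assumes the import names are `numpy`, `biom`, and `h5py`; the `check_modules` and `report` helpers are hypothetical names for this example and are not part of biom-format.

```python
import importlib.util

# Assumed import names for the packages installed with pip above:
# numpy -> "numpy", biom-format -> "biom", h5py -> "h5py".
REQUIRED = ("numpy", "biom")
OPTIONAL = ("h5py",)  # only needed for BIOM 2.0+ files


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report():
    """Print one line per module and return the two result dictionaries."""
    required = check_modules(REQUIRED)
    optional = check_modules(OPTIONAL)
    for name, found in {**required, **optional}.items():
        print(f"{name}: {'found' if found else 'missing'}")
    return required, optional


required, optional = report()
```

The check only locates the modules and does not import them, so it reports whether an installation is present without verifying that it works.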
To see a list of all biom commands, run:
biom
To enable Bash tab completion of biom commands, add the following line to $HOME/.bashrc (if on Linux) or $HOME/.bash_profile (if on Mac OS X):
eval "$(_BIOM_COMPLETE=source biom)"
Installing the biom R package
There is also a BIOM format package for R called biom. This package includes basic tools for reading biom-format files and for accessing and subsetting data tables from a biom object, as well as limited support for writing a biom object back to a biom-format file. The design of this API is intended to match the Python API and other tools included with the biom-format project, but with a decidedly “R flavor” that should be familiar to R users. This includes S4 classes and methods, as well as extensions of common core functions and methods.
To install the latest stable release of the biom package, enter the following command from within an R session:
install.packages("biom")
To install the latest development version of the biom package, enter the following lines in an R session:
install.packages("devtools") # if not already installed
library("devtools")
install_github("joey711/biom")
Please post any support or feature requests and bugs to the biom issue tracker.
See the biom project on GitHub for further details, or if you would like to contribute.
Note that the biom R package (GPL-2) and the biom-format Python package (Modified BSD) are released under different licenses.
Citing the BIOM project
You can cite the BIOM format as follows (link):