Metagenomics

Tools and methods for the analysis of metagenomics data and their integration with other -omic technologies. 

Microbes are by far the most common life form on Earth, colonizing every environment from thermal springs to soil, water, plants, animals, and the human body. Thanks to the advent of high-throughput sequencing technologies, we are now starting to characterize the enormous biodiversity of microbial communities (the microbiome) and to understand the profound influence that they have on virtually every ecosystem, with applications that range from agricultural practices to human health.

The emergence of other -omic technologies (metabolomics, metatranscriptomics, etc.) is continuously improving our understanding of the microbiome and of its interactions with other forms of life, while posing challenges of unprecedented complexity to statisticians, data analysts and computational scientists.

The Computational Biology Unit develops tools and methods for the analysis of metagenomics data and their integration with other -omic technologies. The main research areas are:

  • Amplicon sequencing. The micca software suite is a simple, self-contained set of tools for processing amplicon sequencing data from bacteria (16S) and fungi (ITS).
  • Strain-level analysis. We develop strain-level methods for the analysis of whole genome sequencing (WGS) data from microbial communities. Using the available genomic data, the StrainEst tool identifies and quantifies the different strains of bacterial species of interest.
  • Machine Learning approaches. PhyloRelief exploits the phylogenetic relationships amongst taxa to identify the portions of the tree of life that are differentially distributed in case-control studies.
  • Data integration. Multiple omics can provide a multifaceted view of microbial communities, characterizing their structure, function and metabolic activity at the same time. However, computational methods for integrating the different data types are still in their infancy. We are working to develop tools for data integration and for identifying correlations amongst the different datasets, such as the recent MICtools suite.
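
To make the idea of identifying correlations amongst different datasets more concrete, the following is a minimal sketch that computes pairwise associations between features of two omics tables measured on the same samples. It uses plain Spearman rank correlation as a simple stand-in and does not reproduce the MICtools method or its interface. The function names, feature names and toy values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cross_correlations(table_a, table_b):
    """Spearman correlation for every (feature in A, feature in B) pair.

    Each table maps a feature name to its values across the same
    samples, listed in the same order.
    """
    return {
        (fa, fb): spearman(table_a[fa], table_b[fb])
        for fa, fb in product(table_a, table_b)
    }


# Hypothetical toy data: five samples, values invented for illustration.
taxa = {
    "taxon_1": [0.10, 0.20, 0.30, 0.40, 0.50],
    "taxon_2": [0.50, 0.10, 0.40, 0.20, 0.30],
}
metabolites = {
    "metabolite_1": [1.0, 4.0, 9.0, 16.0, 25.0],
    "metabolite_2": [5.0, 4.0, 3.0, 2.0, 1.0],
}

correlations = cross_correlations(taxa, metabolites)
for pair, rho in sorted(correlations.items()):
    print(pair, round(rho, 3))
```

Rank-based correlation only captures monotonic relationships, and with many features the resulting values would need correction for multiple testing before any pair is reported as associated.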