Metagenome analysis of benthic marine invertebrates


We are carrying out deep sequencing of environmental DNA from benthic marine organisms that are important components of their communities but have not been extensively examined genomically. In these organisms, symbiotic bacteria are demonstrably critical to host survival. The metagenomes are extremely complex, yet robust assemblies can sometimes be achieved. These properties make benthic marine invertebrates excellent models for evaluating next-generation sequencing (NGS) technology. In this project, we will use FutureGrid resources to carry out de novo assembly of marine invertebrate metagenomic sequence data, a process that requires large amounts of memory and CPU power due to the volume of data.

Intellectual Merit

This work will help determine the potential utility in metagenomics of NGS technology, which produces large amounts of data but only as relatively short reads.

Broader Impact

In the course of our work, we will determine the practical aspects of processing large and complex Illumina sequencing data sets to obtain de novo genome assemblies of very low-abundance members of the metagenome. This will be of great use to the metagenomics community.

Use of FutureGrid

FutureGrid will be used for de novo assembly of metagenomic sequence data generated by Illumina technology. It will also be used for analysis of the assembled data, including automatic annotation and large-scale BLAST searches.

Scale Of Use

Assemblies using the program MetaVelvet require a single node with a large amount of memory (~150 GB). Ideally, we would be able to SSH into a single node to run the assembly. In the long term, we may explore more distributed workflows.
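
Because the assembly needs roughly 150 GB of memory on one node, a quick check before launching a long job can save a failed run. The following is only a minimal sketch, not part of the project's actual workflow: the function names are hypothetical, the check assumes a Linux or similar Unix node where os.sysconf reports physical memory, and 1 GB is taken as 10**9 bytes.

```python
import os

REQUIRED_GB = 150  # approximate figure quoted above; adjust per dataset


def physical_memory_bytes():
    """Return total physical RAM in bytes (Linux and most Unix systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_memory(required_gb, total_bytes=None):
    """True if the node's RAM meets the requirement (1 GB = 10**9 bytes here).

    total_bytes can be passed explicitly, e.g. to evaluate a node
    specification without logging in to that node.
    """
    if total_bytes is None:
        total_bytes = physical_memory_bytes()
    return total_bytes >= required_gb * 10**9


if __name__ == "__main__":
    total_gb = physical_memory_bytes() / 10**9
    verdict = "OK" if has_enough_memory(REQUIRED_GB) else "too small"
    print(f"{total_gb:.0f} GB RAM detected; need ~{REQUIRED_GB} GB: {verdict}")
```

This reports installed memory, not memory currently free, so other jobs running on the same node could still cause the assembly to run out of memory.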


We have successfully assembled the complete genome of a previously unknown endosymbiotic bacterium from metagenomic sequence data obtained from a marine invertebrate, even though the bacterium accounted for only ~0.6% of the data. The complete genome afforded many insights into the symbiotic relationship, which we have reported in a paper published in the Proceedings of the National Academy of Sciences. The insights gained in this effort have allowed us to develop new methods for data processing and assembly, which we are currently refining and which will be the subject of future publications. We will continue to use FutureGrid in these efforts to gain insight into other symbiotic systems. The broader scientific impact of this work is twofold. First, these symbiotic relationships are a key, yet poorly understood, aspect of coral reef biodiversity. Second, these symbioses lead to the production of bioactive small molecules. By understanding the origin of these compounds, we are developing new methods to tap biodiversity for potential applications in medicine, agriculture, and other areas.
Malcolm Zachariah
University of Utah

Project Members

Ashaimaa Moussa
Diarey Tianero
Earl Middlebrook
Jason Kwan
Malcolm Zachariah
Russell Green
Thomas Kakule
Thomas Waller
Zhejian Lin

FutureGrid Experts

Bingjing Zhang