The evolution of microbial symbionts and their biosynthetic pathways through shotgun metagenomics


Symbiotic relationships between bacteria and eukaryotes are prevalent in nature. The most fundamental of these symbiotic events resulted in modern eukaryotic cells containing mitochondria and chloroplasts descended from ancient endosymbionts. These organelles are the end-points of symbiont evolution: they have extremely reduced and degraded genomes, and many of their vital functions are controlled and orchestrated by the host. Although snapshots of intermediate stages of this reductive evolutionary process have been obtained through genomics of various insect symbionts, our picture of symbiont evolution is still incomplete. We will investigate symbiont evolution by examining various symbiotic systems from the marine environment, where symbionts are often implicated in the production of chemical defenses for sessile or otherwise vulnerable invertebrates. Because these symbionts are unculturable, their genomes will be assembled from environmental DNA obtained directly from their hosts. FutureGrid resources will be used to assemble such shotgun metagenomic sequence data. Because of the complexity of these samples (i.e., they contain genomic DNA from a large number of different species), large amounts of memory and CPU power are required for assembly.

Intellectual Merit

This work will give us access to genomic information on symbionts that would otherwise be inaccessible because of the compute power required for assembly, which will ultimately allow us to examine symbiont evolution by comparative genomics. Because we will study systems involved in the production of chemical defenses, we will gain an understanding of the evolution of natural product biosynthetic pathways in endosymbionts, as well as of symbiont evolution in general.

Broader Impact

During the course of our work, we will further refine methods for assembling and deconvoluting complex shotgun metagenomic datasets, which will be applicable to a number of scientific endeavors that increasingly use this type of data. We are collaborating with Grace Lim-Fong of Randolph-Macon College (RMC), Virginia, an undergraduate-only institution. During the course of this project, we will host undergraduates from RMC to do research projects in bioinformatics. The specialized and general skills they will learn in HPC, the Linux command line and bioinformatics will be excellent training for their subsequent careers in any science or engineering discipline that involves the manipulation of big data.

Use of FutureGrid

We will use FutureGrid to assemble large metagenomic DNA sequence datasets with Velvet. We will also do ancillary tasks in refining genome assemblies, such as carrying out large-scale BLAST searches.

Scale Of Use

Each assembly run will require a single node with a large amount of memory (~200 GB), and will typically take 3-8 hours. In the past I have done similar runs on delta and bravo in project 149. Large-scale BLAST searches will use multiple nodes, but will require less memory.
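
As a rough illustration of how the multi-node BLAST searches might be prepared, the sketch below splits a FASTA file of query sequences into a fixed number of chunks, so that each chunk could be searched on a separate node. This is only a minimal sketch under stated assumptions: the function names (read_fasta, split_fasta), the chunk file naming scheme and the round-robin distribution are hypothetical choices made for illustration, not a description of the project's actual pipeline.

```python
from itertools import cycle
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, n_chunks, out_dir):
    """Distribute FASTA records round-robin into n_chunks files.

    The chunk file names (chunk_000.fasta, ...) are a hypothetical
    convention chosen for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i:03d}.fasta" for i in range(n_chunks)]
    handles = [open(p, "w") for p in paths]
    try:
        for handle, (header, seq) in zip(cycle(handles), read_fasta(path)):
            handle.write(f"{header}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Round-robin assignment keeps the number of sequences per chunk nearly equal without reading the whole file into memory; if sequence lengths vary widely, balancing chunks by total bases rather than by record count may give more even run times per node.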


Jason Kwan
University of Wisconsin-Madison

Project Members

Ian Miller
Theodore Weyna

