De novo assembly of genomes and metagenomes from next-generation sequencing data


We will use the FutureGrid computing resource to assemble next-generation sequencing (NGS) reads from eukaryotic genome projects and metagenome projects, including the Human Microbiome Project and the Earth Microbiome Project. The massive volumes of sequencing data generated by NGS sequencers have revolutionized many fields of biology, but they require extensive computing resources to analyze. In particular, we would like to use the FutureGrid clusters with large amounts of RAM to test assembly algorithms we have developed for NGS data and to analyze large datasets from microbiome projects, which may lead to new findings.

Intellectual Merit

Because NGS datasets are so large, testing and improving assembly algorithms for them is very time consuming. FutureGrid resources provide a unique opportunity to test these algorithms on large, real datasets. The results will help the genomics community develop and improve assembly algorithms.

Broader Impact

NGS techniques have been applied in many different areas, ranging from biology to environmental science and new energy research.
The success of the proposed project would have a significant impact on these application areas.

Use of FutureGrid

We will run the assembly algorithms we have developed on large datasets from microbiome projects.

Scale Of Use

We need to use compute nodes with large amounts of RAM for one week.


Haixu Tang
Indiana University

Project Members

Gregory Zynda
Heewook Lee
Mina Rho
Mingjie Wang
Ram Podicheti

