Course: Big Data Open Source Software and Projects (Data Science Curriculum)


This course studies software used in many commercial activities to study Big Data. The backdrop for the course is the ~150 software subsystems illustrated at
We will describe the software architecture represented by this collection which we term HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack).

The course covers the following material:
a) The cloud computing architecture underlying ABDS and how it contrasts with HPC.
b) The software architecture with its different layers, covering broad functionality and the rationale for each layer.
c) We will give application examples.
d) Then we will go through selected software systems – about 10% of those in the Kaleidoscope – which have already been deployed on FutureGrid systems using OpenStack and Chef recipes.
e) Each student will choose one other open source member of the Kaleidoscope and deploy it as in d).
f) The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
g) Teams of up to 3 students can be formed, with a corresponding increase in the scope of activities e) and f).

Intellectual Merit

One of the main data science classes, offered for the first time in Fall 2014 with online and residential sections.

Broader Impact

Our MOOC style ensures broad impact

Use of FutureGrid

For student projects

Scale Of Use

Modest, as this is a class.


Geoffrey Fox
Sidd Maini
Indiana University

Project Members

Abhik Seal
Amritanshu Joshi
Anesu Chaora
Aravindh Varadharaju
Fazle Rabbi
Fugang Wang
Gregor von Laszewski
Harsh Seth
Hyungro Lee
Ian Wood
Karthik Mohandas Bangera
Naveen Madhire
Priyank Kabaria
Pushkar Raj
Rahul Singhania
Rakesh Menon
Satwik Narlanka
Scott McCaulay
Sidd Maini
Siddhardha Raju Mandapati
Sriram Pulipaka
William k.
Yukai Xiao

