Improve resource utilization in MapReduce


Hadoop partitions the physical resources of each slave node into conceptual map and reduce slots, which cap the number of tasks that can run concurrently on that node. We observed that this mechanism leads to low resource utilization when not all task slots on a node are in use. In this project, we propose a mechanism called resource stealing to increase resource utilization. In addition, Hadoop's default trigger for speculative execution can launch many non-beneficial speculative tasks that are killed before completion. We therefore propose Benefit Aware Speculative Execution (BASE), which reduces the number of non-beneficial speculative tasks without sacrificing performance.
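To make the under-utilization concrete, the following minimal sketch computes how much of a node sits idle when only some slots are occupied, and how much each running task could use if idle slots were lent out. The function names and the even-split policy are assumptions for illustration only; they are not the allocation policy described in the paper.

```python
def idle_slot_fraction(total_slots: int, running_tasks: int) -> float:
    """Fraction of a node's task slots that are currently unused."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    if not 0 <= running_tasks <= total_slots:
        raise ValueError("running_tasks must be between 0 and total_slots")
    return (total_slots - running_tasks) / total_slots


def slots_worth_per_task(total_slots: int, running_tasks: int) -> float:
    """Slots' worth of resources available to each running task.

    ASSUMPTION (hypothetical policy): the resources reserved for idle
    slots are split evenly among the tasks that are actually running.
    Without stealing, every task is limited to exactly one slot's worth.
    """
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    if not 0 <= running_tasks <= total_slots:
        raise ValueError("running_tasks must be between 0 and total_slots")
    if running_tasks == 0:
        return 0.0  # nothing is running, so nothing can use the resources
    return total_slots / running_tasks


if __name__ == "__main__":
    # A node configured with 4 map slots, only 1 of which is occupied.
    print(idle_slot_fraction(4, 1))
    print(slots_worth_per_task(4, 1))
```

Under these assumptions, a node with four slots and one running task leaves three quarters of its slots idle; lending those resources to the running task is the intuition behind resource stealing.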

Intellectual Merit

This project addresses two inefficiencies in Hadoop: task slots that sit idle and speculative tasks that are killed before they complete. Our proposed resource stealing increases resource utilization without interfering with normal Hadoop task scheduling. In addition, our proposed Benefit Aware Speculative Execution (BASE) can eliminate most of the non-beneficial speculative tasks without degrading performance.
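One way to picture a benefit-aware check is sketched below: a speculative copy is launched only if it is expected to finish before the original task does. This summary does not state the exact criterion BASE uses, so the rule, the linear progress-rate estimate, and all names here are assumptions rather than the published algorithm.

```python
import math


def estimated_remaining_time(progress: float, elapsed: float) -> float:
    """Estimate the seconds left for a task from its progress so far.

    ASSUMPTION: the task keeps advancing at its average rate to date.
    `progress` is a fraction in [0, 1]; `elapsed` is in seconds.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    if progress == 0.0:
        return math.inf  # no progress yet, so no finite estimate
    rate = progress / elapsed
    return (1.0 - progress) / rate


def is_speculation_beneficial(progress: float, elapsed: float,
                              est_new_task_duration: float) -> bool:
    """Hypothetical benefit test for launching a speculative copy.

    The copy is considered beneficial only if its estimated full run
    time is shorter than the original task's estimated remaining time.
    """
    return est_new_task_duration < estimated_remaining_time(progress, elapsed)


if __name__ == "__main__":
    # Slow task: half done after 100 s; a fresh copy would take about 60 s.
    print(is_speculation_beneficial(0.5, 100.0, 60.0))
    # Nearly finished task: a fresh copy would be killed before completing.
    print(is_speculation_beneficial(0.9, 90.0, 60.0))
```

In this sketch, a task that is 90% complete fails the check because a fresh copy could not overtake it, which is the kind of non-beneficial speculative task that would otherwise be launched and later killed.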

Broader Impact

MapReduce/Hadoop is used by both industry and academia to run large-scale data processing applications. The approaches evaluated in this project increase resource utilization, which can improve throughput. They enable users to run MapReduce jobs more efficiently and therefore reduce job run time. As a result, scientists become more productive because they get results faster and can tune their applications accordingly.

Use of FutureGrid

We used the High-Performance Computing (HPC) environments provided by FutureGrid to run experiments to evaluate our proposed approaches.

Scale Of Use

We used 20 to 40 bare-metal machines on a periodic basis.



We ran CPU-, I/O-, and network-intensive applications to evaluate our algorithms. The results show that resource stealing achieves higher resource utilization and thus reduces job run time. Our BASE optimization significantly reduces the number of non-beneficial speculative tasks without degrading performance.
The detailed results of this project are presented in our paper "Improving Resource Utilization in MapReduce" [1].


  1. Guo, Z., G. Fox, M. Zhou, and Y. Ruan, "Improving Resource Utilization in MapReduce", 2012 IEEE International Conference on Cluster Computing, Beijing, China, IEEE Computer Society, 2012.
Zhenhua Guo
Indiana University

