See Spot Run: Using Spot Instances for MapReduce Workflows
Navraj Chohan, Claris Castillo, Mike Spreitzer, Malgorzata Steinder, Asser Tantawi, and Chandra Krintz
Abstract
MapReduce is a scalable and fault-tolerant framework, patented by Google,
for computing embarrassingly parallel reductions. Hadoop is an open-source
implementation of Google MapReduce that is made available as a web service to
cloud users by the Amazon Web Services (AWS) cloud computing infrastructure.
Amazon Spot Instances (SIs) provide an inexpensive yet transient, market-based option
for purchasing virtualized instances for execution in AWS. In addition to being
terminated manually by the user, an SI can be terminated automatically
as a function of the market price and
the user's maximum bid price. We find that we can
significantly improve the runtime of MapReduce jobs in our benchmarks
by using SIs as accelerators.
However, we also find that SI
termination due to budget constraints during the job can have adverse
effects on the runtime and may cause the user to overpay
for their job. We describe new techniques that help reduce such
effects.
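
The termination rule described in the abstract can be illustrated with a minimal sketch. It assumes a simplified model that the abstract does not spell out: the market price is sampled once per hour, the instance keeps running while the market price does not exceed the user's maximum bid, and each completed hour is charged at that hour's market price. The function name `simulate_spot_instance` and the price values are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def simulate_spot_instance(
    market_prices: Sequence[float], max_bid: float
) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for one spot instance.

    Assumed model (illustrative only): one market price per hour; the
    instance is terminated as soon as the market price exceeds max_bid;
    each hour that runs is charged at that hour's market price.
    """
    hours_run = 0
    total_cost = 0.0
    for price in market_prices:
        if price > max_bid:
            # Market price rose above the bid: automatic termination.
            break
        hours_run += 1
        total_cost += price
    return hours_run, total_cost


if __name__ == "__main__":
    # Hypothetical hourly market prices in dollars.
    prices = [0.25, 0.5, 0.75, 0.5]
    hours, cost = simulate_spot_instance(prices, max_bid=0.5)
    print(hours, cost)
```

Under this model a higher bid keeps the instance alive through price spikes, while a lower bid risks losing it partway through a job, which is the trade-off between runtime and cost that the abstract refers to.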