|Neptune: Programming and Runtime Support for Cluster-based Network
University of California at Santa Barbara
This project studies infrastructural software that provides programming and runtime support for service clustering, replication, and aggregation on large server clusters with incremental scalability, 24x7 availability, and high manageability. The research issues addressed in Neptune include replication consistency, fault tolerance, load balancing, service differentiation, data aggregation, and cluster resource management.
The Neptune Clustering Software: The software provides a programming API and runtime support, which allows a network service to be programmed quickly for execution on a large-scale cluster in handling high-volume user traffic. The system shields application programmers from the complexities of replication, service discovery, failure detection and recovery, load balancing, resource monitoring and management.
Neptune software has been deployed in www.ask.com and www.teoma.com since 2001 for programming and management of their giant service clusters.
Service Replication: Neptune studies the clustering of replicated network services when the persistent service data is frequently updated. It provides a flexible interface to glue and replicate existing service modules and accommodate a variety of underlying storage mechanisms. Neptune maintains dynamic and location-transparent mapping to isolate faulty service modules and enforce replica consistency. The paper in USITS'01 presents Neptune's initial overall architecture and data replication support, and illustrates its performance using three network services.
Fine-grain Service Load-balancing: Load balancing on a cluster of machines has been studied extensively in the literature, mainly focusing on coarse-grain distributed computation. Fine-grain services introduce additional challenges because system states fluctuate rapidly for those services and system performance is highly sensitive to various overhead. Through simulations and a prototype system implementation in a Linux cluster, our study concludes that: 1) Random polling based load-balancing policies are well-suited for fine-grain network services; 2) A small poll size provides sufficient information for load balancing, while an excessively large poll size may in fact degrade the performance due to polling overhead; 3) Discarding slow-responding polls can further improve system performance.
Service Differentiation: Quality of service (QoS) support that provides customized service qualities to multiple classes of client requests can effectively utilize available system resources. Our resource management framework strives to achieve four goals: 1) The framework provides a flexible mechanism for service providers to specify a variety of desired service quality metrics through QoS yield functions. 2) The framework provisions service differentiation and admission control for multiple classes of service accesses by employing both the yield functions and resource allocation guarantees. 3) The framework achieves efficient resource utilization by producing high aggregate QoS yield through greedy scheduling and speculative admission control that drops zero or low yield requests. 4) The design of this resource management framework takes into consideration service scalability and availability by deploying a decentralized two-level request distribution and scheduling scheme.
Scalable Data Aggregation: Large-scale cluster-based Internet services often host partitioned datasets to provide incremental scalability. The aggregation of results produced from multiple partitions is a fundamental building block for the delivery of these services. We design and implement a programming primitive -- Data Aggregation Call (DAC) -- to exploit partition parallelism for cluster-based Internet services in Neptune. Our architecture design aims at improving interactive responses with sustained throughput for typical cluster environments where platform heterogeneity and software/hardware failures are common. At the cluster level, our load-adaptive reduction tree construction algorithm balances processing and aggregation load across servers while exploiting partition parallelism. Inside each node, we employ an event-driven thread pool design that prevents slow nodes from adversely affecting system throughput under highly concurrent workload.
Neptune source code with a sample service (version 1.0)
Slides on the use of Neptune for query processing in search engines
This material is based upon work supported by the National Science Foundation under Grant No. EIA-0080134, CCR-9702640, ACIR-0082666, and 0086061. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Our research on web caching and service differentiation.