A Scalable, Robust Network for Parallel Computing


Peter Cappello and Dimitrios Mourloukos

Abstract

CX, a network-based computational exchange, is presented. The system's design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks, eager scheduling, and space-based coordination. The object-oriented API is simple, compact, and cleanly separates application logic from the logic that supports interprocess communication and fault tolerance. Computations, of course, run to completion in the presence of computational hosts that join and leave the ongoing computation. Such hosts, or producers, use task caching and prefetching to overlap computation with interprocessor communication. To break a potential task server bottleneck, a network of task servers is presented. Even though task servers are envisioned as reliable, the self-organizing, scalable network of n servers, described as a sibling-connected fat tree, tolerates a sequence of n-1 server failures. Tasks are distributed throughout the server network via a simple "diffusion" process. CX is intended as a test bed for research on automated silent auctions, reputation services, authentication services, and bonding services. CX also provides a test bed for algorithm research into network-based parallel computation.
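The abstract names eager scheduling as one of the ideas that lets a computation finish while hosts join and leave. The following is a minimal sketch of that general idea, not CX's implementation: the class EagerSchedulerSketch and its methods take, complete, isDone, and result are hypothetical names chosen for illustration. It assumes a single in-memory task server and integer task ids and results. When no unassigned task remains, the server re-issues a task that was handed out but has not yet reported a result, so a task held by a departed host is eventually completed by another one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of eager scheduling; none of these names come from CX.
final class EagerSchedulerSketch {
    // Tasks that have never been handed to a producer.
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    // Tasks handed out at least once whose result has not arrived yet.
    private final Deque<Integer> outstanding = new ArrayDeque<>();
    // First result received for each task id.
    private final Map<Integer, Integer> results = new HashMap<>();

    EagerSchedulerSketch(int taskCount) {
        for (int id = 0; id < taskCount; id++) {
            unassigned.add(id);
        }
    }

    // Returns a task id for an idle producer, or null when every task has a result.
    synchronized Integer take() {
        Integer id = unassigned.poll();
        if (id == null) {
            // Eager step: re-issue the longest-outstanding task instead of idling.
            id = outstanding.poll();
        }
        if (id != null) {
            outstanding.add(id); // move to the back of the outstanding queue
        }
        return id;
    }

    // Records the first result for a task; later duplicates are ignored.
    synchronized void complete(int id, int result) {
        results.putIfAbsent(id, result);
        outstanding.remove(Integer.valueOf(id));
    }

    synchronized boolean isDone() {
        return unassigned.isEmpty() && outstanding.isEmpty();
    }

    synchronized Integer result(int id) {
        return results.get(id);
    }
}
```

In this sketch a producer loops on take() until it returns null and calls complete() after each task. Because duplicates are discarded, a slow host that reports late does no harm; this assumes tasks are safe to execute more than once.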

BibTeX

@inproceedings{JavaGrande2001CM,
    author    = {Peter Cappello and Dimitrios Mourloukos},
    title     = {{A Scalable, Robust Network for Parallel Computing}},
    booktitle = {Proc. Joint ACM Java Grande/ISCOPE Conference},
    year      = {2001},
    pages     = {78--86},
    month     = {June},
}

Full version in PDF

Presentation in PPT