Global Optimization for Mapping Parallel Image Processing
Tasks on Distributed Memory Machines
C. H. Lee, Y. F. Wang, and Tao Yang
Department of Computer Science
University of California
Santa Barbara, CA 93106
Many parallel algorithms and library routines are available for
performing computer vision and image processing (CVIP) tasks on
distributed-memory multiprocessors. The typical image distribution may
use column, row, and block based mapping. Integrating a set of library
routines for a CVIP application requires a global optimization for
determining the data mapping of individual tasks by considering
inter-task communication. The main difficulty in deriving the optimal
image data distribution for each task is that CVIP task computation may
involve loops, and the number of processors available and the size of
the input image may vary at the run time. In this paper, a CVIP
application is modeled using a task chain with imperfectly nested
loops, specified by conventional visual languages such as Khoros
and Explorer. A mapping algorithm is proposed that optimizes the
average run-time performance for CVIP applications with nested loops by
considering the data redistribution overheads and possible run-time
parameter variations. A taxonomy of CVIP operations is provided and
used for further reducing the complexity of the algorithm.
Experimental results on both low-level image processing and high-level
computer vision applications are presented to validate this approach.