Modern computer systems are called on to deal with billions
of events every second, whether they are instructions
executed, memory locations accessed, or packets forwarded.
This presents a serious challenge to those who seek to quantify,
analyze, or optimize such systems, because important
trends and behaviors may easily be lost in a sea of data. We
present Range Adaptive Profiling (RAP) as a new and general
purpose profiling method capable of hierarchically classifying
streams of data efficiently in hardware. Through the use
of RAP, events in an input stream are dynamically classified
into increasingly precise categories based on the frequency
with which they occur. The more important a class, or range
of events, the more precisely it is quantified.
|
|
Despite the dynamic nature of our technique, we build upon
tight theoretic bounds covering both worst-case error as well
as the required memory. In the limit, it is known that error and
the memory bounds can be independent of the stream size,
and grow only linearly with the level of precision desired.
Significantly, we expose the critical constants in these algorithms
and through careful engineering, algorithm re-design,
and use of heuristics, we show how a high performance profile
system can be implemented for Range Adaptive Profiling.
RAP can be used on various profiles such as PCs, load values,
and memory addresses, and has a broad range of uses, from
hot-region profiling to quantifying cache miss value locality.
We propose two methods of implementation, one in software
and the other with specialized hardware, and we show that
with just 8k bytes of memory range profiles can be gathered
with an average accuracy of 98%.
|
|
|
|
Software Range Profiling
|
|
The C++ implementation has an API with three methods
rap_init(), rap_add_points(), and rap_finalize()
which can
either be called from online analysis or to post process trace files.
rap_init initializes the RAP tree with an initial set of counters and appropriate range values. rap_init also initializes data structures to enable
profiling multiple events simultaneously.
The RAP tree is a dynamically allocated tree and rap_add_points looks up
the appropriate counter, updates the counter, and when needed calls the internal
functions rap_split() and rap_merge(). The post processing phase of
deriving statistical inferences about the stream can be done through
rap_finalize, it also dumps the resulting RAP tree in ascii format for
further processing such as identifying hot-spots, range coverage, phase identification,
and so on.
|
|
|
|
Software
|
An initial version of the software can be downloaded
here.
|
|
|
|
Publication
|