Range Adaptive Profiling

Modern computer systems are called on to deal with billions of events every second, whether they are instructions executed, memory locations accessed, or packets forwarded. This presents a serious challenge to those who seek to quantify, analyze, or optimize such systems, because important trends and behaviors may easily be lost in a sea of data. We present Range Adaptive Profiling (RAP) as a new and general purpose profiling method capable of hierarchically classifying streams of data efficiently in hardware. Through the use of RAP, events in an input stream are dynamically classified into increasingly precise categories based on the frequency with which they occur. The more important a class, or range of events, the more precisely it is quantified.
Despite the dynamic nature of our technique, we build upon tight theoretic bounds covering both worst-case error as well as the required memory. In the limit, it is known that error and the memory bounds can be independent of the stream size, and grow only linearly with the level of precision desired. Significantly, we expose the critical constants in these algorithms and through careful engineering, algorithm re-design, and use of heuristics, we show how a high performance profile system can be implemented for Range Adaptive Profiling. RAP can be used on various profiles such as PCs, load values, and memory addresses, and has a broad range of uses, from hot-region profiling to quantifying cache miss value locality. We propose two methods of implementation, one in software and the other with specialized hardware, and we show that with just 8k bytes of memory range profiles can be gathered with an average accuracy of 98%.
Software Range Profiling
The C++ implementation has an API with three methods rap_init(), rap_add_points(), and rap_finalize() which can either be called from online analysis or to post process trace files. rap_init initializes the RAP tree with an initial set of counters and appropriate range values. rap_init also initializes data structures to enable profiling multiple events simultaneously. The RAP tree is a dynamically allocated tree and rap_add_points looks up the appropriate counter, updates the counter, and when needed calls the internal functions rap_split() and rap_merge(). The post processing phase of deriving statistical inferences about the stream can be done through rap_finalize, it also dumps the resulting RAP tree in ascii format for further processing such as identifying hot-spots, range coverage, phase identification, and so on.
Software
An initial version of the software can be downloaded here.
Publication