Run-time Function Adaptation for Statically-Compiled Programs based on function multiversioning
To make statically-compiled programs adaptable at run-time to changing inputs, program behavior and environments (architecture, OS, virtual layers, libraries, etc.), we developed a new concept that relies on static function versioning and dynamic routines that monitor performance counters. It is presented in the paper G. Fursin et al., "A Practical Method For Quickly Evaluating Program Optimizations", Proceedings of HiPEAC'05. In this technique we produce multiple versions of hot functions and apply combinations of aggressive optimizations to them for different optimization cases (using GCC-ICI). A low-overhead run-time program phase detection scheme based on monitoring hardware counters then learns program behavior, associates it with different versions of functions (different optimizations), and reacts to changes in program run-time behavior based on this association table. Preserving this table across runs enables continuous adaptation of programs, thus making static binaries adaptable to different environments. We also used this technique to speed up the iterative search for different optimization cases.
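The dispatch idea described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from the paper or the GCC patch: the clones `sum_v0` and `sum_v1`, the table `best_version`, and the helpers `classify_phase`, `record_best` and `adaptive_sum` are all hypothetical names, and the "phase" is derived from a single made-up counter metric rather than from real hardware counters.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_VERSIONS = 2, NUM_PHASES = 4 };

/* Two hypothetical clones of one hot function. In practice each clone
   would be compiled with a different combination of optimizations. */
long sum_v0(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

long sum_v1(const int *a, size_t n) {   /* same result, unrolled by 2 */
    long s = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) { s += a[i]; s += a[i + 1]; }
    if (i < n) s += a[i];
    return s;
}

typedef long (*sum_fn)(const int *, size_t);
const sum_fn versions[NUM_VERSIONS] = { sum_v0, sum_v1 };

/* Association table: phase id -> best known version (-1 = not learned yet).
   Saving and reloading this array would preserve it across runs. */
int best_version[NUM_PHASES] = { -1, -1, -1, -1 };

/* Hypothetical phase detector: buckets a counter-derived metric
   (for example IPC scaled by 100) into one of NUM_PHASES phases. */
int classify_phase(unsigned metric) {
    unsigned p = metric / 50;
    return p >= NUM_PHASES ? NUM_PHASES - 1 : (int)p;
}

/* Record which version performed best for a given phase. */
void record_best(int phase, int version) {
    assert(phase >= 0 && phase < NUM_PHASES);
    assert(version >= 0 && version < NUM_VERSIONS);
    best_version[phase] = version;
}

/* Call the version associated with the current phase,
   falling back to the baseline clone if nothing is known yet. */
long adaptive_sum(const int *a, size_t n, unsigned metric) {
    int v = best_version[classify_phase(metric)];
    if (v < 0) v = 0;
    return versions[v](a, n);
}
```

A caller would read its monitoring metric, call `adaptive_sum(data, n, metric)`, and update the table with `record_best` whenever measurements show that another clone is faster in that phase.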
The first technique, presented at the HiPEAC'05 conference, is easily applicable to scientific programs that have stable program phases but may not work well on programs with irregular run-time behavior. Therefore, we decided to extend it based on the technique of Lau et al. for dynamic JIT compilers (J. Lau, M. Arnold, M. Hind, and B. Calder. Online performance auditing: Using hot optimizations without getting burned. In Proceedings of PLDI'06). We randomly select code versions (with different optimizations) at run-time, obtain the execution time distribution across all versions and statistically determine the influence of compiler optimizations on the code in a single run. The simplicity of the implementation makes this technique reliable, secure and easy to debug. Yet it enables practical, transparent, low-overhead continuous optimization for programs statically compiled with GCC while avoiding complex dynamic recompilation frameworks.
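A minimal sketch of the random-selection scheme is shown below, assuming three hypothetical clones of one kernel and a simple comparison of mean execution times. It does not reproduce the statistical test of Lau et al. or the monitoring routines of the GCC patch; all names (`clones`, `version_stats`, `pick_random_clone`, `record_sample`, `mean_time`, `fastest_clone`, `sampled_call`) are assumptions made for illustration, and `clock()` stands in for a real timer or hardware counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

enum { NUM_CLONES = 3 };

typedef double (*kernel_fn)(const double *, int);

/* Hypothetical clones of one hot function; each would normally be
   compiled with different optimizations. */
double sum_fwd(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}
double sum_rev(const double *a, int n) {
    double s = 0.0;
    for (int i = n - 1; i >= 0; i--) s += a[i];
    return s;
}
double sum_unrolled(const double *a, int n) {
    double s = 0.0;
    int i = 0;
    for (; i + 1 < n; i += 2) { s += a[i]; s += a[i + 1]; }
    if (i < n) s += a[i];
    return s;
}
const kernel_fn clones[NUM_CLONES] = { sum_fwd, sum_rev, sum_unrolled };

/* Per-clone accumulated timing samples. */
typedef struct {
    unsigned long calls;
    double total_time;   /* seconds */
} version_stats;

/* Choose a clone at random for the next call. */
int pick_random_clone(void) {
    return rand() % NUM_CLONES;
}

void record_sample(version_stats *stats, int clone, double seconds) {
    assert(clone >= 0 && clone < NUM_CLONES);
    stats[clone].calls++;
    stats[clone].total_time += seconds;
}

/* Mean time of a clone, or -1.0 if it has not been sampled yet. */
double mean_time(const version_stats *stats, int clone) {
    if (stats[clone].calls == 0) return -1.0;
    return stats[clone].total_time / (double)stats[clone].calls;
}

/* Index of the sampled clone with the lowest mean time, or -1 if none. */
int fastest_clone(const version_stats *stats) {
    int best = -1;
    for (int v = 0; v < NUM_CLONES; v++) {
        double m = mean_time(stats, v);
        if (m < 0.0) continue;
        if (best < 0 || m < mean_time(stats, best)) best = v;
    }
    return best;
}

/* Run one randomly chosen clone and record how long it took. */
double sampled_call(const kernel_fn *versions, version_stats *stats,
                    const double *data, int n) {
    int v = pick_random_clone();
    clock_t start = clock();
    double result = versions[v](data, n);
    clock_t end = clock();
    record_sample(stats, v, (double)(end - start) / CLOCKS_PER_SEC);
    return result;
}
```

After enough calls to `sampled_call(clones, stats, data, n)` within a single run, `fastest_clone(stats)` gives a first estimate of which optimization variant suits the code; a production version would compare the full time distributions rather than only the means.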
Here is our patch from 2007/07/18 against GCC 4.3 that implements function cloning and adds dynamic monitoring routines (briefly described in the 2007 GCC Summit paper "Practical Run-time Adaptation with Procedure Cloning to Enable Continuous Collective Compilation").
We are working to extend our technique in the following directions and would welcome any help with development:
- provide automatic function multiversioning and keep info about hot functions of statically-compiled programs across runs in the collective database
- be able to apply different optimizations on function clones (using ICI or other tools)
- provide external libraries with different monitoring routines (based on timers, hardware counters, etc)
- add support for saving multiple hardware counters and connect to the machine learning library to predict best optimization cases (as presented in the CGO'07 paper "Rapidly Selecting Good Compiler Optimizations using Performance Counters")
- compile different function versions for architectures with different ISA and provide explicit data transfer mechanisms (on reconfigurable architectures or GPUs for example)
- learn and cluster different run-time program behavior (for different program inputs, program phases, environments, reconfigurable architectures)
- reconfigure architectures at run-time based on program phases and optimization cases (performance vs power consumption)
- select only several most appropriate versions for the final production self-tuning adaptive binary
- provide feedback from Virtual Machines about context switches to be able to select the appropriate function for new hardware (we briefly brainstormed this with people from AMD during the HiPEAC ACACES summer school'08)
We hope to combine this work with another initiative to provide Function Specific Optimization, with the Interactive Compilation Interface and with MILEPOST GCC. We would also like to extend this work to enable practical run-time adaptability for statically compiled programs on heterogeneous systems (such as CPU-GPU architectures, IBM CELL, SUN T1 and Intel TeraScale).