This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



[GSoC] [plugins] [ici] function cloning + fine-grain optimizations/program instrumentation


Hi Yuanjie, Liang, et al,

This email is about further GSoC'09 developments this summer for plugins, generic function cloning, fine-grain optimizations
and program instrumentation. Considering that the basic infrastructure is now available, I would like
to agree on further developments based on the feedback I got during the last 3 weeks so that we can extend
the projects quickly. Though this email primarily concerns Yuanjie and Liang, I am sending it to all the colleagues who are involved in the project or have been interested in it at some point, as well as to the GCC and cTuning mailing lists, just to make everyone aware of the developments. This is a long email, so if you are not interested in these projects, please skip it ...

1) Originally we planned to use the stable GCC 4.4.0 with plugin/ICI support for GSoC (already prepared);
however, considering that GCC 4.5 will have plugin support and extended function cloning capabilities,
we should eventually move all the developments to the trunk. Zbigniew mentioned that he will fully synchronize
ICI with the current trunk within 2 weeks, so we can work with GCC 4.4.0 (with plugins and ICI)
until then (plugins shouldn't change much, but some gluing with the new GCC will be required) and then sync
with GCC 4.5 + the synced ICI.

2) We need to prepare a plugin that uses an XML library (libxml2 for example - http://xmlsoft.org)
and records basic information about the compilation flow. I suggest that we record
the following info per function for now (we can use the filename gcc_compilation_flow.<function name>.xml for example):
* GCC version
* Plugin version used to record the info
* File name
* Function name
** For the inter-procedural stage, we can use the function name #IP#
and, besides the IP passes, also record some global info such as which optimization flags/parameters have been used
* Function start line and end line (in the source) and other currently available ICI features from http://ctuning.org/wiki/index.php/CTools:ICI:List_of_features
* Function-specific optimization or code generation flags (if applicable - I think
Mike Meissner's patch that enables function-specific flags has been included in GCC 4.5)
* Passes
** Available fine-grain optimizations within passes

Except for fine-grain optimizations, all this information should already be available,
and there are 2 ICI plugins (test1, test2) that show how to get it ...

We should record this info per function to avoid large files for large projects, since we often
want to control only a few functions. This can also reduce memory and CPU usage
when using libxml ...
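A minimal sketch of the per-function recording in C. It writes a subset of the fields listed above to a file named after the function, using plain stdio instead of libxml2 so that it stays self-contained. The struct, the helper names and the `<function_record>`/`<passes>` element names are hypothetical; a real plugin would fill the fields from GCC/ICI and let libxml2 handle escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the per-function record listed above. */
struct func_record {
  const char *gcc_version;
  const char *plugin_version;
  const char *file_name;
  const char *function_name;  /* "#IP#" for the inter-procedural stage */
  int start_line;
  int end_line;
  const char *const *passes;  /* NULL-terminated list of pass names */
};

/* Build "gcc_compilation_flow.<function name>.xml" into BUF.
   Returns 0 on success, -1 if BUF is too small. */
int record_file_name (char *buf, size_t size, const char *function_name)
{
  int n = snprintf (buf, size, "gcc_compilation_flow.%s.xml", function_name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Write one record as XML. Special characters are not escaped here;
   a libxml2-based plugin would take care of that. */
void write_record (FILE *out, const struct func_record *r)
{
  assert (out != NULL && r != NULL);
  fprintf (out, "<function_record>\n");
  fprintf (out, "  <gcc_version>%s</gcc_version>\n", r->gcc_version);
  fprintf (out, "  <plugin_version>%s</plugin_version>\n", r->plugin_version);
  fprintf (out, "  <file_name>%s</file_name>\n", r->file_name);
  fprintf (out, "  <function_name>%s</function_name>\n", r->function_name);
  fprintf (out, "  <start_line>%d</start_line>\n", r->start_line);
  fprintf (out, "  <end_line>%d</end_line>\n", r->end_line);
  fprintf (out, "  <passes>\n");
  for (const char *const *p = r->passes; p != NULL && *p != NULL; p++)
    fprintf (out, "    <pass>%s</pass>\n", *p);
  fprintf (out, "  </passes>\n</function_record>\n");
}
```

One file per function keeps each libxml2 document small, which is the memory/CPU argument made above.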

We should be able to control which functions to process using either a command-line argument
with a list of functions or an environment variable (which we can later convert into a command-line
argument). If the list is empty, all the functions are processed.
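A sketch of the function filter, assuming the list is a comma-separated string of function names. The environment variable name `ICI_FUNCTIONS` and both helpers are hypothetical; a plugin argument could be passed to `function_selected` in the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if FN should be processed. LIST is a comma-separated
   list of function names; NULL or "" means "process all functions". */
int function_selected (const char *list, const char *fn)
{
  size_t len = strlen (fn);

  if (list == NULL || *list == '\0')
    return 1;
  for (const char *p = list; *p != '\0'; )
    {
      const char *end = strchr (p, ',');
      size_t n = end ? (size_t) (end - p) : strlen (p);

      /* Require an exact match, so "foo" does not match "foobar". */
      if (n == len && strncmp (p, fn, n) == 0)
        return 1;
      if (end == NULL)
        break;
      p = end + 1;
    }
  return 0;
}

/* Same check driven by the (hypothetical) ICI_FUNCTIONS variable. */
int function_selected_from_env (const char *fn)
{
  return function_selected (getenv ("ICI_FUNCTIONS"), fn);
}
```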

3) When we want to perform function cloning or use fine-grain optimization/instrumentation,
we can use the same XML files created during the record stage (or prepare them manually/automatically
using external tools) and add additional fields.
We will need to perform function cloning using a new IP pass (as described in the GCC Summit
presentation by Honza Hubicka). 

We can provide info about which functions to clone in an XML file for the IP cloning pass,
e.g. something like:
<pass>generic_cloning
  <external_libraries_for_adaptation>libadapt</external_libraries_for_adaptation>
  <!-- plus other libraries if needed, e.g. for hardware counter monitoring -->
  <function>foo
    <clones>2</clones>
    <clone_name_extension>_clone</clone_name_extension>
    <!-- gcc_adapt will be called before the clones and will select which clone
         to use based on the machine description, monitoring of hardware counters
         or dataset features, to enable online dynamic optimization of statically
         compiled programs, etc. -->
    <adaptation_function>gcc_adapt</adaptation_function>
  </function>
  <function>boo
    <clones>3</clones>
    <clone_name_extension>_clone</clone_name_extension>
    <adaptation_function></adaptation_function>
  </function>
</pass>

Basically, when we create clones, we need to make the following substitution in the code:

/* before cloning */
foo (...)
{
  ...
}

/* after cloning */
foo (...)
{
  switch (gcc_adapt (function_number))
    {
    case 1: foo_clone1 (...); break;
    case 2: foo_clone2 (...); break;
    default:
      /* original code */
      ...
    }
}
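To make the substitution concrete, here is a small runnable C sketch of the dispatch. `gcc_adapt` is backed by a hand-filled table here, whereas the proposal leaves its selection policy (machine description, hardware counters, dataset features) open. `selected_clone`, `FOO_NUMBER` and `last_version` are illustrative names only, and the clone bodies are trivially equivalent stand-ins for copies that would be optimized differently.

```c
#include <assert.h>

#define MAX_NUMBERED_FUNCS 16

/* Hypothetical selection table filled in by an adaptation library;
   0 means "run the original code". */
static int selected_clone[MAX_NUMBERED_FUNCS];

/* Records which version of foo ran, only to make the demo observable. */
static int last_version = -1;

/* Hypothetical adaptation hook: returns the clone to use for a function. */
int gcc_adapt (int function_number)
{
  assert (function_number >= 0 && function_number < MAX_NUMBERED_FUNCS);
  return selected_clone[function_number];
}

/* Stand-ins for the clones; in practice each would be a copy of foo's
   body compiled with different optimizations. */
int foo_clone1 (int x) { last_version = 1; return x * 2; }
int foo_clone2 (int x) { last_version = 2; return x * 2; }

enum { FOO_NUMBER = 0 };  /* hypothetical number assigned to foo */

/* foo after cloning: ask the adaptation hook, fall back to the original. */
int foo (int x)
{
  switch (gcc_adapt (FOO_NUMBER))
    {
    case 1:
      return foo_clone1 (x);
    case 2:
      return foo_clone2 (x);
    default:
      /* original code */
      last_version = 0;
      return x * 2;
    }
}
```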

When the generic_cloning pass is invoked, it will communicate with
a plugin, asking it for all the information necessary to clone one function at a time. The plugin
will send an "End" instruction once all the functions have been processed so that compilation
can continue ...
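One way the pass/plugin exchange could look, sketched under the assumption that the plugin hands over one request per call and signals "End" by returning `NULL`. `struct clone_request` mirrors the XML fields above; all names are hypothetical, and the pass only prints the clone names it would create instead of cloning anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cloning request mirroring the XML fields. */
struct clone_request {
  const char *function;              /* e.g. "foo" */
  int clones;                        /* number of clones to create */
  const char *clone_name_extension;  /* e.g. "_clone" */
  const char *adaptation_function;   /* e.g. "gcc_adapt"; may be empty */
};

/* Plugin side: requests as they might be read from the XML file. */
static const struct clone_request requests[] = {
  { "foo", 2, "_clone", "gcc_adapt" },
  { "boo", 3, "_clone", "" },
};
static size_t next_request;

/* Return the next request, or NULL to play the role of "End". */
const struct clone_request *plugin_next_request (void)
{
  if (next_request == sizeof requests / sizeof requests[0])
    return NULL;
  return &requests[next_request++];
}

/* Compose the name of clone number I, e.g. "foo" + "_clone" + "1". */
void make_clone_name (char *buf, size_t size,
                      const struct clone_request *r, int i)
{
  assert (i >= 1 && i <= r->clones);
  snprintf (buf, size, "%s%s%d", r->function, r->clone_name_extension, i);
}

/* Pass side: keep asking the plugin until it says "End". */
int generic_cloning_pass (void)
{
  char name[128];
  int created = 0;
  const struct clone_request *r;

  while ((r = plugin_next_request ()) != NULL)
    for (int i = 1; i <= r->clones; i++)
      {
        make_clone_name (name, sizeof name, r, i);
        printf ("would create %s\n", name);
        created++;
      }
  return created;
}
```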

We need to decide how to number functions (so that the selection is fast)
and how to aggregate this info if we compile projects with multiple functions.
Also, we can have a mode in which we skip the function number when we adapt
for different architectures and compile clones with different flags such as -msse2 or -msse3 ...
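A possible numbering scheme, sketched as an assumption rather than a decision: functions get dense integer numbers in the order they are registered, so the runtime selection is a single array access. This only covers one compilation; aggregating numbers across several files remains the open question. `function_number`, `select_clone` and `MAX_FUNCS` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_FUNCS 64

/* Hypothetical compile-time registry. Names are stored by pointer,
   so the caller's strings must outlive the registry. */
static const char *func_names[MAX_FUNCS];
static int num_funcs;

/* Runtime selection table indexed by function number (0 = original). */
static int selection[MAX_FUNCS];

/* Return FN's number, assigning the next free one on first use;
   -1 if the registry is full. The name lookup happens only here. */
int function_number (const char *fn)
{
  assert (fn != NULL);
  for (int i = 0; i < num_funcs; i++)
    if (strcmp (func_names[i], fn) == 0)
      return i;
  if (num_funcs == MAX_FUNCS)
    return -1;
  func_names[num_funcs] = fn;
  return num_funcs++;
}

/* Constant-time selection at run time. */
int select_clone (int number)
{
  return (number >= 0 && number < MAX_FUNCS) ? selection[number] : 0;
}
```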

After cloning is done, all the cloned functions should appear in the recorded XML file.
We can then optimize those clones using different flags, different passes
or different fine-grain optimization settings.
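For experimenting with different flags per clone, GCC's existing `optimize` function attribute is one option; whether the cloning pass would attach flags this way is an assumption. The two clones below are identical except for their attributes (`sum_clone1`/`sum_clone2` are illustrative names). The GCC manual describes this attribute as intended mainly for debugging rather than production code, and compilers other than GCC may ignore it.

```c
#include <assert.h>

/* Clone 1: compiled as if with -O1. */
__attribute__ ((optimize ("O1")))
long sum_clone1 (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Clone 2: same body, compiled as if with -O3 -funroll-loops. */
__attribute__ ((optimize ("O3", "unroll-loops")))
long sum_clone2 (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```

Since both clones must compute the same result, a quick equality check like the one below is a cheap sanity test before timing them.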

We can use OProfile or gprof to monitor performance; however, we may also need instrumentation
capabilities to add calls to external routines that monitor time or hardware counters
before and after a call to a function. We can provide this info within the XML file as well,
e.g.:
<function>foo
  <add_function_call_before_func>_timer1</add_function_call_before_func>
  <add_function_call_after_func>_timer2</add_function_call_after_func>
</function>
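A minimal sketch of the monitoring side, assuming the inserted calls take no arguments. `_timer1`/`_timer2` reuse the names from the XML above and accumulate processor time with the standard `clock()`; `call_foo_instrumented` shows by hand what an instrumented call site would look like. Identifiers beginning with an underscore are reserved at file scope in C, so a real monitoring library would probably choose other names.

```c
#include <assert.h>
#include <time.h>

static clock_t t_start;       /* set by _timer1 */
static double total_seconds;  /* accumulated processor time */
static long calls;            /* number of timed calls */

/* Inserted before the call. */
void _timer1 (void)
{
  t_start = clock ();
}

/* Inserted after the call. */
void _timer2 (void)
{
  total_seconds += (double) (clock () - t_start) / CLOCKS_PER_SEC;
  calls++;
}

/* The function being monitored (a stand-in). */
int foo (int x)
{
  return x * x + 1;
}

/* What the call site looks like after instrumentation. */
int call_foo_instrumented (int x)
{
  _timer1 ();
  int r = foo (x);
  _timer2 ();
  return r;
}
```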

As for fine-grain optimizations, I suggest that we start with unrolling, vectorization
and blocking, since those optimizations do not have good heuristics yet, so we can try
to help tune them automatically. Again, after an associated pass we should record
info about the loop (nest) where the optimization happened and which parameter was
used; later we will start adding more info about the features preceding the optimization decision:
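<function>foo
  <pass>unroll
    <loop>1
      <unroll_factor>4</unroll_factor>
    </loop>
    <loop>2
      <unroll_factor>1</unroll_factor>
    </loop>
  </pass>
</function>
<function>boo
  <pass>graphite (?)
    <loop>1
      <blocking_factor>64</blocking_factor>
    </loop>
  </pass>
  ...
</function>
etc.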
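A sketch of emitting such per-loop records in C, assuming each factor is nested inside its `<loop>` element and every element gets a closing tag so the file stays well-formed. `struct loop_decision` and `write_pass_decisions` are hypothetical; a real plugin would take the loop numbers and factors from the pass itself (and would likely use libxml2).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-loop decision recorded after a pass. */
struct loop_decision {
  int loop;               /* loop (nest) number within the function */
  const char *parameter;  /* e.g. "unroll_factor" or "blocking_factor" */
  int value;              /* the factor chosen by the pass */
};

/* Write the decisions of one pass for one function. */
void write_pass_decisions (FILE *out, const char *function, const char *pass,
                           const struct loop_decision *d, size_t n)
{
  assert (out != NULL);
  fprintf (out, "<function>%s\n  <pass>%s\n", function, pass);
  for (size_t i = 0; i < n; i++)
    fprintf (out, "    <loop>%d\n      <%s>%d</%s>\n    </loop>\n",
             d[i].loop, d[i].parameter, d[i].value, d[i].parameter);
  fprintf (out, "  </pass>\n</function>\n");
}
```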

I don't know whether all of this is clear, and I will be happy to elaborate further, so comments are welcome!
Yuanjie and Liang, we can discuss further developments tomorrow during a conf-call ...

We need to prepare the first prototypes reasonably quickly so that we can see
whether there are potential problems and how to solve them ... I will be helping with
testing and evaluation...

Cheers,
Grigori


