This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RE: SMS in gcc4.0





Hi Mark,

First of all I would like this discussion to be on the GCC mailing list; so
I am CCing the GCC mailing list (I hope this is OK with all the others).

"Davis, Mark" <mark.davis@intel.com> wrote on 31/03/2005 00:23:02:
> Mostafa & Gerald,
>
...
> It was mentioned that you folks had recently
> added SMS to gcc4.0, and I found the SMS paper from last year's gcc
> summit, and the description of SMS capabilities in gcc on the 4.0
> Features web site.  So the obvious approach is to use SMS for Itanium as
> well as Power5 and ....
>
> 1) Is SMS in gcc currently turned on for anything other than Power5?  I
> built a gcc4.0 for Itanium, and tried compiling the summation example
> from your paper (and some unrolled summation examples) using
>    -O3 -fmodulo-sched

We have not yet put any effort into tuning SMS for a specific architecture
(including Power5); SMS is implemented as generally as the paper (mentioned
in http://gcc.gnu.org/news/sms.html) describes.

>
> but didn't see any difference in the .s file from not using
> -fmodulo-sched.  Are there other switches to turn on or dumps to look
> at?

I would suggest starting with the SMS dumps to see what SMS is doing; you
can get them by adding the -dm flag to your compilation.  If you want, you
can send me those dumps and I will look into them.
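
For reference, a summation loop of the kind discussed here could look like
the sketch below.  The function name sum_array and the file name sum.c are
made up for illustration; the flags in the comment are the ones mentioned
in this thread, and whether SMS actually pipelines the loop depends on the
target and on what the dump reports.

```c
#include <stddef.h>

/* Hypothetical test case: a simple reduction loop, the kind of
   candidate one would hand to the modulo scheduler.
   Compile with the flags from this thread, for example:
       gcc -O3 -fmodulo-sched -dm -S sum.c
   (sum.c is an assumed file name), then inspect the dump files
   written next to the source.  */
float
sum_array (const float *a, size_t n)
{
  float s = 0.0f;
  size_t i;

  for (i = 0; i < n; i++)
    s += a[i];
  return s;
}
```

Comparing the .s output with and without -fmodulo-sched, together with the
dump, should show whether the loop was considered at all.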

>
>
> I'm afraid I also was the origin of some of the "not very useful"
> comments about SMS.  From my way of thinking, if SMS doesn't have alias
> information or array dependence analysis, then SMS can't pipeline loops
> storing into array elements; therefore it is not very useful as a
> pipeliner, even if the swing modulo scheduling part is excellent.
> 2) Did I miss something here?

This is true; that's why we need accurate alias info at the RTL level, and
this is one of the efforts to concentrate on in improving SMS.
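
To make the point concrete, here is a minimal sketch of a loop that stores
into array elements.  The function name scale_by_two is hypothetical, and
the example only illustrates the dependence problem; it says nothing about
how GCC represents alias information internally.

```c
#include <stddef.h>

/* Hypothetical example: each iteration loads src[i] and stores dst[i].
   If dst and src may overlap, the store of iteration i can feed the
   load of a later iteration, so loads from later iterations must not
   be moved above earlier stores.  Without alias information the
   scheduler has to assume this memory-to-memory dependence exists.  */
void
scale_by_two (int *dst, const int *src, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    dst[i] = src[i] * 2;
}
```

Calling it with dst = buf + 1 and src = buf makes every iteration read the
value stored by the previous one, which is exactly the case a pipeliner
must rule out before it can overlap iterations.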

>
> I do not know about gcc internals (which is why I'm "project-managing",
> not "implementing"), so it was interesting and disturbing to hear what
> you and Vlad had to say about the different internal representations
> relative to when the SMS phase runs:
>    a) it seems to be too early to see the machine code
>    b) it's too late to have alias info
>
> 3) Do you agree with this assessment?

It's not black or white.  We need accurate alias info at the RTL level to be
able to software pipeline (SMS) the majority of loops in real-world
programs; currently the RTL alias info is not accurate enough for those
loops.  Having the alias info lets us eliminate memory-to-memory
dependencies and thus know that we can interleave different iterations of
the loop.  The alias info is usually based on a high-level representation
of the code; the lower you go, the more information you lose.  One thing we
could do is maintain this information as we go down through the trees and
RTL representations.  However, each pass that transforms the code would
require additional effort to maintain the alias information, which
complicates that pass; that's why we want SMS to be as early as possible.

The other side of the coin is the modeling of the machine resources (SMS is
trying to solve a scheduling problem).  In SMS we use the DFA for resource
modeling: we track the resource usage of each instruction and try to get
the optimal schedule by moving instructions among the different iterations
while avoiding resource conflicts.  The problem with doing this early is
that later passes can change the resource usage of instructions when they
transform the code (splitting instructions, for example) and thus make the
schedule suboptimal.  A good example of a way to handle this is disabling
the second scheduling pass for SMSed loops to prevent it from spoiling the
schedules generated by SMS.  We can do the same for other passes and have a
cost model decide whether it is beneficial to perform the optimization
inside the SMSed loops.

Another problem that results from doing SMS before register allocation is
increased register pressure when SMS is aggressive.  IMO, this problem
should be addressed later by using register pressure estimation inside SMS.
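
The resource-conflict check described above can be sketched in a few lines.
Note that this is not the DFA-based model used in GCC; it is the textbook
modulo reservation table, swapped in because it is small enough to show
here.  All names (struct mrt, mrt_init, mrt_try_place) and the two-unit
machine are assumptions made for illustration.

```c
#include <string.h>

#define MAX_II  16   /* assumed upper bound on the initiation interval */
#define N_UNITS 2    /* assumed machine: unit 0 = memory, unit 1 = ALU */

/* busy[slot][unit] is nonzero when UNIT is taken in cycle
   (slot mod ii) of every overlapped iteration.  */
struct mrt
{
  int ii;
  unsigned char busy[MAX_II][N_UNITS];
};

/* Prepare an empty table for initiation interval II.
   Returns 1 on success, 0 if II is out of range.  */
int
mrt_init (struct mrt *t, int ii)
{
  if (ii < 1 || ii > MAX_II)
    return 0;
  t->ii = ii;
  memset (t->busy, 0, sizeof t->busy);
  return 1;
}

/* Try to issue an instruction that needs UNIT at absolute CYCLE.
   Returns 1 and reserves the slot on success, 0 on a conflict
   or on invalid arguments.  */
int
mrt_try_place (struct mrt *t, int unit, int cycle)
{
  int slot;

  if (unit < 0 || unit >= N_UNITS || cycle < 0)
    return 0;
  slot = cycle % t->ii;
  if (t->busy[slot][unit])
    return 0;
  t->busy[slot][unit] = 1;
  return 1;
}
```

With an initiation interval of 2, a memory operation placed at cycle 0
blocks another memory operation at cycle 2, because the two would collide
once iterations overlap.  If a later pass splits an instruction so that it
uses a unit twice, a table built before the split no longer describes the
real usage, which is the concern raised above.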

> 4) Do you have any suggestions about using SMS for an in-order
> microarchitecture like Itanium which is more sensitive to the exact
> schedule than OOO microarchitectures like Power5?

Actually I would say that an in-order machine would benefit more from SMS
than an OOO machine, because in theory OOO machines do the job of SMS in
hardware in many cases.  The issue is not that IA64 is in-order as such, but
that IA64 and other in-order machines depend heavily on the compiler's
scheduling, SMS included.  My suggestion is that we invest in lowering
alias info to RTL and feeding this information to the DDG used by SMS,
which is implemented in ddg.c.

> 5) In the Intel compiler for Itanium, we carry the alias information
> from the high-level IL down to the machine-code level IL, and pipeline
> on the machine-code IL, before register allocation.

This is where SMS is currently positioned; it means that our problem is not
where SMS is performed but that the alias information does not get there.
This is exactly what we have been thinking all along; the Intel compiler
example reinforces it.

>
> thanks,
> Mark Davis
> Intel Compiler Lab
> (formerly with DEC compiler team)
> Nashua, NH
>

Mostafa.

