[PATCH] Implementing Swing Modulo Scheduling in GCC

Thu Apr 22 18:39:00 GMT 2004

Mark Mitchell wrote:

> Mostafa Hagog wrote:
>
>> We addressed the comments below.  The troubled code (unrolling and
>> renaming) was removed and replaced by a direct dependence computation
>> using df.c.  We also added support for loops with unknown bounds.
>>
>> Here is the revised patch relative to mainline.   Passed regression
>> and bootstrap on powerpc-apple-darwin7.2.0 target.
>>
> I'm not going to approve the patch as it stands.
> However, I think it looks very good; it's certainly tidy and has 
> better documentation than many patches.  Furthermore, the algorithm 
> looks like a good choice.
>
> Before check-in the patch should be tested on three architectures.  
> I'd suggest IA32 GNU/Linux and IA64 GNU/Linux in addition to OS X.  
> Also, are you able to post SPEC 2000 numbers with and without the 
> patch on these platforms?  That would help to demonstrate that the 
> patch is doing useful stuff on code that a lot of people believe 
> should benefit from these kinds of improvements.  Finally, you should 
> post compile-time performance with and without the patch.  It's 
> reasonable for the compile-time performance to get a little worse if 
> the SPEC numbers are getting better, but the impact should hopefully 
> be minimal.

Sorry, Mark.  I had just finished reviewing the new version of the patch 
and sent my comments before reading your email.

  Software pipelining is quite a specific optimization.  I remember that 
a professor from NCSU who specialized in insn scheduling told us to stay 
away from implementing SP (it is too complicated and expensive an 
optimization).  I believe it will not give an improvement for SPEC2000, 
although it could improve code for small benchmarks like sorting and 
matrix multiplication.  So I'd expect a small benchmark demonstrating 
the improvement.  Mostafa and Ayal gave such an example.  Software 
pipelining is also a very expensive optimization (from the compilation 
time point of view).  I would not recommend enabling it by default, even 
at -O3.

IMHO the current implementation is mainly oriented toward RISC 
architectures.  I would not expect benefits from using it on x86.  But 
you are right that it should be checked for regressions on ia32 too.

This implementation is a good start.  There are many opportunities to 
improve it, and I hope people will start working on the improvements 
once it is on mainline.

Vlad