This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Fix PR18754: add early loop pass, 2nd try


On Fri, 21 Jan 2005, Zdenek Dvorak wrote:

> Hello,
>
> > after cunroll - i.e. the BBs are not merged and we still have
> > the loop exit test there.  So, until we get a cfg_cleanup that
> > preserves loop information, scheduling SRA after cunroll and before
> > ivopts doesn't help very much.
> >
> > Zdenek - I remember you posted a patch for loop cfg_cleanup
> > some time ago; is this suitable for 4.0?
>
> I don't know.  It is definitely pretty important for cunroll, but
> probably does not fit in stage 3 criteria.  There is a version of the
> patch committed to tcb branch, so at least for your experiments you may
> use it; cfgcleanup + ccp after cunroll should basically get you as far
> as it goes, dom should not help that much (it would be necessary
> to play a bit with the jump threading inside it to make it preserve
> loop structures).

I played with patch #1 from
http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01381.html
and it definitely helps sanitize the CFG after cunroll.  On its own
it doesn't help PR18754, but scheduling CCP and SRA after cunroll
improves the code somewhat.  For the inner loop I then get

.L19:
        movl    -16(%ebp), %eax # j,
        leal    1(%ebx), %ecx   #, res$data$0
        movl    -28(%ebp), %edx #, tmp73
        imull   %eax, %edx      #, tmp73
        movl    -28(%ebp), %eax #, tmp76
        imull   %edi, %eax      # tmp86, tmp76
        addl    %ecx, %edx      # res$data$0, tmp74
        leal    (%ebx,%eax), %eax       #, tmp77
        movl    -32(%ebp), %ebx #,
        fldl    (%ebx,%eax,8)   #
        faddl   (%ebx,%edx,8)   #
        movl    %ecx, %ebx      # res$data$0, i
        fmul    %st(1), %st     #,
        fstpl   (%esi)  #* ivtmp.46
        addl    $8, %esi        #, ivtmp.46
        cmpl    %ecx, -20(%ebp) # i, ei
        jg      .L19    #,

which is still far from the ideal result.  An early cunroll pass,
followed by dom and placed just before the existing sra pass, yields:

.L19:
        fldl    (%edx)
        incl    %ecx
        addl    $8, %edx
        faddl   8(%ebx)
        addl    $8, %ebx
        fmul    %st(1), %st
        fstpl   (%eax)
        addl    $8, %eax
        cmpl    %ecx, %esi
        jg      .L19

which is, by the way, exactly the same code as for the C-ish style version.
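
To make the two styles concrete, here is a minimal sketch of what an
abstracted version and a C-ish version of such a loop might look like.
It is hypothetical and is not the PR18754 testcase: the struct, the
function names and the stride layout are all assumptions made for
illustration.  The point is only that the tiny fixed-trip-count loop over
the index aggregate has to be completely unrolled before the aggregate can
be split into scalars, after which both functions should reduce to the
same address arithmetic.

```c
#include <stddef.h>

/* Hypothetical 3-component index aggregate (not taken from the PR). */
enum { DIM = 3 };
struct loc { int data[DIM]; };

/* Abstracted style: the linear offset is computed by a small loop with a
   constant trip count, i.e. a candidate for complete unrolling. */
static size_t offset_abstract(const struct loc *l, const int stride[DIM])
{
    size_t off = 0;
    for (int d = 0; d < DIM; ++d)
        off += (size_t)l->data[d] * (size_t)stride[d];
    return off;
}

/* Each iteration builds two small aggregates and indexes through them. */
void smooth_abstract(double *res, const double *a, int si, int ei,
                     int j, int k, const int stride[DIM], double f)
{
    for (int i = si; i < ei; ++i) {
        struct loc l0 = { { i, j, k } };
        struct loc l1 = { { i + 1, j, k } };
        res[i - si] = (a[offset_abstract(&l0, stride)]
                       + a[offset_abstract(&l1, stride)]) * f;
    }
}

/* C-ish style: the loop-invariant part of the offset is hoisted by hand
   and only plain scalars remain inside the loop. */
void smooth_plain(double *res, const double *a, int si, int ei,
                  int j, int k, const int stride[DIM], double f)
{
    const double *p = a + (size_t)j * (size_t)stride[1]
                        + (size_t)k * (size_t)stride[2];
    for (int i = si; i < ei; ++i)
        res[i - si] = (p[(size_t)i * (size_t)stride[0]]
                       + p[(size_t)(i + 1) * (size_t)stride[0]]) * f;
}
```

Both functions compute the same values for non-negative indices; whether
a given compiler actually produces identical code for them depends on the
pass ordering discussed in this thread.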

Comparing the tree dumps before ivopts of both versions (SRA inside
loop + loop_cfg_cleanup vs. early cunroll before SRA), a _lot_ of
optimization is missing in the first, so it's no wonder ivopts cannot
do better in that case.

So I really wonder whether SRA inside loop is ever going to work as
well as an extra early loop pass that completely unrolls loops.
In conclusion, I would stick with the extra early loop pass for
4.0; everything else is way too invasive now.

Any other ideas?

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

