This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Fix PR18754: add early loop pass, 2nd try
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- Cc: Daniel Berlin <dberlin at dberlin dot org>,Giovanni Bajo <giovannibajo at libero dot it>, <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 21 Jan 2005 16:49:19 +0100 (CET)
- Subject: Re: [PATCH] Fix PR18754: add early loop pass, 2nd try
On Fri, 21 Jan 2005, Zdenek Dvorak wrote:
> > after cunroll - i.e. the BBs are not merged and we still have
> > the loop exit test there. So, until we get a cfg_cleanup that
> > preserves loop information, scheduling SRA after cunroll and before
> > ivopts doesn't help very much.
> > Zdenek - I remember you posted a patch for loop cfg_cleanup
> > sometime ago, is this suitable for 4.0?
> I don't know. It is definitely pretty important for cunroll, but
> probably does not fit in stage 3 criteria. There is a version of the
> patch committed to tcb branch, so at least for your experiments you may
> use it; cfgcleanup + ccp after cunroll should basically get you as far
> as it goes, dom should not help that much (it would be necessary
> to play a bit with the jump threading inside it to make it preserve
> loop structures).
I played with patch #1 from
and this definitely helps sanitize the CFG after cunroll. Alone
it doesn't help PR18754, but scheduling CCP and SRA
after cunroll improves the code somewhat. For the inner loop I get:
movl -16(%ebp), %eax # j,
leal 1(%ebx), %ecx #, res$data$0
movl -28(%ebp), %edx #, tmp73
imull %eax, %edx #, tmp73
movl -28(%ebp), %eax #, tmp76
imull %edi, %eax # tmp86, tmp76
addl %ecx, %edx # res$data$0, tmp74
leal (%ebx,%eax), %eax #, tmp77
movl -32(%ebp), %ebx #,
fldl (%ebx,%eax,8) #
faddl (%ebx,%edx,8) #
movl %ecx, %ebx # res$data$0, i
fmul %st(1), %st #,
fstpl (%esi) #* ivtmp.46
addl $8, %esi #, ivtmp.46
cmpl %ecx, -20(%ebp) # i, ei
jg .L19 #,
which is still far from the perfect solution. An early cunroll
pass with dom after it, just before the existing sra pass, yields:
addl $8, %edx
addl $8, %ebx
fmul %st(1), %st
addl $8, %eax
cmpl %ecx, %esi
which is btw. exactly the same code as for the C-ish style version.
Comparing the tree dumps before ivopts of both versions (SRA inside
loop + loop_cfg_cleanup and early cunroll before SRA), there is a _lot_ of
optimization missing, so it's no wonder ivopts cannot do better in the first
place.
So I really wonder whether SRA inside loop is ever going to work as
well as an extra early loop pass that completely unrolls loops.
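To illustrate why complete unrolling has to happen before SRA can do
anything useful, here is a minimal sketch. It is not the PR18754
testcase; the struct vec3 and the functions sum_loop, sum_unrolled and
self_check are hypothetical names made up for this illustration. In the
loop form the aggregate is indexed by a variable, so it cannot be split
into independent scalars. Once the loop is completely unrolled, every
index is a constant and each element can be treated as its own scalar,
which is roughly the form the later passes want to see.

```c
#include <assert.h>

/* Hypothetical small aggregate, assumed for illustration only. */
struct vec3 { double data[3]; };

/* Loop form: v.data[i] uses a variable index, so the aggregate has to
   stay an array in memory as long as the loop exists. */
double sum_loop(double a, double b, double c)
{
    struct vec3 v;
    double s = 0.0;
    int i;

    v.data[0] = a;
    v.data[1] = b;
    v.data[2] = c;
    for (i = 0; i < 3; ++i)
        s += v.data[i];
    return s;
}

/* Hand-written equivalent of the loop after complete unrolling and
   scalar replacement: only constant indices remain, so each element
   becomes a plain scalar (d0, d1, d2). */
double sum_unrolled(double a, double b, double c)
{
    double d0 = a, d1 = b, d2 = c;
    double s = 0.0;

    s += d0;
    s += d1;
    s += d2;
    return s;
}

/* Both forms add the same values in the same order. */
void self_check(void)
{
    assert(sum_loop(1.0, 2.0, 3.0) == sum_unrolled(1.0, 2.0, 3.0));
}
```

The second function is written by hand here; whether the compiler gets
from the first form to the second depends on the pass order discussed
above.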
In conclusion, I would stick with the extra early loop pass for
4.0 - everything else is way too invasive now.
Any other ideas?
Richard Guenther <richard dot guenther at uni-tuebingen dot de>