This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix PR18754: add early loop pass, 2nd try
Hello,
> > > after cunroll - i.e. the BBs are not merged and we still have
> > > the loop exit test there. So, until we get a cfg_cleanup that
> > > preserves loop information, scheduling SRA after cunroll and before
> > > ivopts doesn't help very much.
> > >
> > > Zdenek - I remember you posted a patch for loop cfg_cleanup
> > > sometime ago, is this suitable for 4.0?
> >
> > I don't know. It is definitely pretty important for cunroll, but
> > probably does not fit in stage 3 criteria. There is a version of the
> > patch committed to the tcb branch, so at least for your experiments you may
> > use it; cfgcleanup + ccp after cunroll should basically get you as far
> > as it goes, dom should not help that much (it would be necessary
> > to play a bit with the jump threading inside it to make it preserve
> > loop structures).
>
> I played with patch #1 from
> http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01381.html
> and this definitely helps sanitize the CFG after cunroll. Alone
> it doesn't help PR18754, but scheduling CCP and SRA
> after cunroll improves the code somewhat. For the inner loop I
> then get
>
> .L19:
> movl -16(%ebp), %eax # j,
> leal 1(%ebx), %ecx #, res$data$0
> movl -28(%ebp), %edx #, tmp73
> imull %eax, %edx #, tmp73
> movl -28(%ebp), %eax #, tmp76
> imull %edi, %eax # tmp86, tmp76
> addl %ecx, %edx # res$data$0, tmp74
> leal (%ebx,%eax), %eax #, tmp77
> movl -32(%ebp), %ebx #,
> fldl (%ebx,%eax,8) #
> faddl (%ebx,%edx,8) #
> movl %ecx, %ebx # res$data$0, i
> fmul %st(1), %st #,
> fstpl (%esi) #* ivtmp.46
> addl $8, %esi #, ivtmp.46
> cmpl %ecx, -20(%ebp) # i, ei
> jg .L19 #,
>
> which is still far from the perfect solution that an early cunroll
> pass, with dom after it, just before the existing sra pass yields:
>
> .L19:
> fldl (%edx)
> incl %ecx
> addl $8, %edx
> faddl 8(%ebx)
> addl $8, %ebx
> fmul %st(1), %st
> fstpl (%eax)
> addl $8, %eax
> cmpl %ecx, %esi
> jg .L19
>
> which is btw. exactly the same code as for the C-ish style version.
>
> Comparing the tree dumps before ivopts of both versions (SRA inside
> loop + loop_cfg_cleanup and early cunroll before SRA), there is a _lot_ of
> optimization missing in the first case, so it's no wonder ivopts cannot
> do better there.
>
> So I really wonder whether SRA inside the loop is ever going to work as
> well as an extra early loop pass that completely unrolls loops.
> In conclusion, I would stick with the extra early loop pass for
> 4.0 - everything else is way too invasive now.
>
> Any other ideas?
I think we should just leave things as they are for 4.0 and concentrate on
getting it right in 4.1. An early cunroll pass enabled by a separate flag
is definitely a bad idea. An early cunroll pass enabled unconditionally
is a hack -- if the code is too messy after cunroll, the right fix is to
put cleanup pass(es) there anyway, for the benefit of other loop
optimizations.
Zdenek
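
For readers following the thread, here is a minimal sketch in C of the kind of
pattern under discussion. It is not the PR18754 test case; the names index2,
offset_aggregate, offset_scalar, sum_row_aggregate and sum_row_scalar are
hypothetical. The assumption it illustrates is that a small aggregate indexed
by a loop variable can only be scalarized once the constant-trip-count inner
loop has been completely unrolled, which is why the ordering of cunroll, the
cleanup passes and SRA matters for the code ivopts later sees.

```c
#include <stddef.h>

enum { DIM = 2 };

/* Hypothetical small aggregate, standing in for an abstraction-heavy
   index type. */
struct index2 { size_t data[DIM]; };

/* Aggregate form: idx.data[d] is accessed with a variable index, so the
   struct cannot be split into scalars until the loop over d is
   completely unrolled. */
static size_t offset_aggregate(size_t i, size_t j, const size_t stride[DIM])
{
    struct index2 idx;
    size_t off = 0;
    int d;

    idx.data[0] = i;
    idx.data[1] = j;
    for (d = 0; d < DIM; ++d)      /* constant trip count */
        off += idx.data[d] * stride[d];
    return off;
}

/* The "C-ish" form: what the function above reduces to by hand after
   complete unrolling and scalar replacement. */
static size_t offset_scalar(size_t i, size_t j, const size_t stride[DIM])
{
    return i * stride[0] + j * stride[1];
}

/* Outer loop whose address computation goes through the aggregate. */
double sum_row_aggregate(const double *a, size_t j, size_t n,
                         const size_t stride[DIM])
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; ++i)
        s += a[offset_aggregate(i, j, stride)];
    return s;
}

/* Same outer loop written directly with scalars. */
double sum_row_scalar(const double *a, size_t j, size_t n,
                      const size_t stride[DIM])
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; ++i)
        s += a[offset_scalar(i, j, stride)];
    return s;
}
```

Both functions compute the same result; the only difference is how much work
the optimizers have to do before the induction variable optimizations see
plain scalar address arithmetic in the outer loop.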