This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Fix PR18754: add early loop pass, 2nd try


Hello,

> > > after cunroll - i.e. the BBs are not merged and we still have
> > > the loop exit test there.  So, until we get a cfg_cleanup that
> > > preserves loop information, scheduling SRA after cunroll and before
> > > ivopts doesn't help very much.
> > >
> > > Zdenek - I remember you posted a patch for loop cfg_cleanup
> > > sometime ago, is this suitable for 4.0?
> >
> > I don't know.  It is definitely pretty important for cunroll, but
> > probably does not fit in stage 3 criteria.  There is a version of the
> > patch committed to tcb branch, so at least for your experiments you may
> > use it; cfgcleanup + ccp after cunroll should basically get you as far
> > as it goes, dom should not help that much (it would be necessary
> > to play a bit with the jump threading inside it to make it preserve
> > loop structures).
> 
> I played with patch #1 from
> http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01381.html
> and this definitely helps sanitize the CFG after cunroll.  Alone
> it doesn't help PR18754, but scheduling CCP and SRA
> after cunroll improves the code somewhat.  For the inner loop I
> then get
> 
> .L19:
>         movl    -16(%ebp), %eax # j,
>         leal    1(%ebx), %ecx   #, res$data$0
>         movl    -28(%ebp), %edx #, tmp73
>         imull   %eax, %edx      #, tmp73
>         movl    -28(%ebp), %eax #, tmp76
>         imull   %edi, %eax      # tmp86, tmp76
>         addl    %ecx, %edx      # res$data$0, tmp74
>         leal    (%ebx,%eax), %eax       #, tmp77
>         movl    -32(%ebp), %ebx #,
>         fldl    (%ebx,%eax,8)   #
>         faddl   (%ebx,%edx,8)   #
>         movl    %ecx, %ebx      # res$data$0, i
>         fmul    %st(1), %st     #,
>         fstpl   (%esi)  #* ivtmp.46
>         addl    $8, %esi        #, ivtmp.46
>         cmpl    %ecx, -20(%ebp) # i, ei
>         jg      .L19    #,
> 
> which is still far from the perfect solution that an early cunroll
> pass, with dom after it, just before the existing sra pass yields:
> 
> .L19:
>         fldl    (%edx)
>         incl    %ecx
>         addl    $8, %edx
>         faddl   8(%ebx)
>         addl    $8, %ebx
>         fmul    %st(1), %st
>         fstpl   (%eax)
>         addl    $8, %eax
>         cmpl    %ecx, %esi
>         jg      .L19
> 
> which is btw. exactly the same code as for the C-ish style version.
> 
> Comparing the tree dumps before ivopts of both versions (SRA inside
> loop + loop_cfg_cleanup and early cunroll before SRA), there is a _lot_ of
> optimization missing, so it's no wonder ivopts cannot do better in the first
> case.
> 
> So I really wonder whether SRA inside the loop is ever going to work as
> well as an extra early loop pass that completely unrolls loops.
> In conclusion, I would stick with the extra early loop pass for
> 4.0 - everything else is way too invasive now.
> 
> Any other ideas?

I think we should just leave things as they are for 4.0 and concentrate on
getting it right in 4.1.  An early cunroll pass enabled by a separate flag
is definitely a bad idea.  An early cunroll pass enabled unconditionally
is a hack -- if the code is too messy after cunroll, the right fix is to
put cleanup pass(es) there anyway, for the benefit of other loop
optimizations.

Zdenek
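
For readers following the thread, here is a minimal C sketch of the kind of
code shape under discussion.  It is not the PR18754 testcase: the names
`struct index`, `DIM`, `sum_with_inner_loop` and `sum_hand_scalarized` are
hypothetical, chosen only because the `res$data$0` temporaries in the quoted
assembly suggest a small aggregate with a `data` member.  The first function
computes a flat offset with a tiny fixed-trip-count inner loop over an
aggregate, roughly what complete unrolling followed by SRA and cleanup would
have to simplify.  The second is the hand-written "C-ish" form that those
passes would ideally arrive at.

```c
enum { DIM = 2 };                 /* hypothetical fixed inner trip count */

struct index { int data[DIM]; };  /* hypothetical small aggregate */

/* Form 1: the offset is recomputed by a DIM-iteration inner loop over an
   aggregate.  A compiler has to unroll the inner loop completely and then
   scalarize idx.data[] before later loop passes see plain scalars. */
static double sum_with_inner_loop(const double *a, const int *stride,
                                  int j, int si, int ei)
{
    double total = 0.0;
    struct index idx;

    idx.data[1] = j;
    for (idx.data[0] = si; idx.data[0] < ei; ++idx.data[0]) {
        int off = 0;
        for (int d = 0; d < DIM; ++d)
            off += idx.data[d] * stride[d];
        total += a[off];
    }
    return total;
}

/* Form 2: the same computation with the inner loop unrolled by hand, the
   aggregate replaced by scalars, and the invariant term hoisted. */
static double sum_hand_scalarized(const double *a, const int *stride,
                                  int j, int si, int ei)
{
    double total = 0.0;
    const int row = j * stride[1];

    for (int i = si; i < ei; ++i)
        total += a[i * stride[0] + row];
    return total;
}
```

Both functions return the same value for the same arguments.  Comparing the
dumps a given GCC version produces for each is one way to see how much
cleanup is still missing after cunroll.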

