Re: [PATCH] Fix PR18754: add early loop pass, 2nd try

On Fri, 21 Jan 2005, Daniel Berlin wrote:

> On Fri, 21 Jan 2005, Richard Guenther wrote:
> > On Thu, 20 Jan 2005, Zdenek Dvorak wrote:
> >
> >>>> The right fix seems to be to add the second SRA pass in the middle of loop
> >>>> optimizations (just immediately after cunroll).  You would also need to
> >>>> schedule constant propagation pass there (which should just work)
> >>>> and preferably also cfg_cleanup (the variation from tcb branch that
> >>>> preserves loop structures).
> >>>
> >>> Yes, I tried this - actually just adding SRA and redphi after cunroll,
> >>> but this caused verify failures about the ssa form not being right.  So
> >>> I guessed SRA may not be ready to preserve the invariants the loop
> >>> optimizers need.
> >>
> >> you probably need to rerun the loop closed ssa form creation afterwards
> >> (rewrite_into_loop_closed_ssa).
> >
> > Ok, tried this again (see proof of concept patch below).  With
> > -O2 -funroll-loops this solves the original testcase of PR18754,
> > but fails on the C++ testcase verifying the ssa form:
> >
> > scalar_loops.cpp: In function 'void foo(const Array<2>&, const
> > Array<2>&)':
> > scalar_loops.cpp:32: internal compiler error: tree check: expected
> > ssa_name, have var_decl in verify_ssa, at tree-ssa.c:690
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See <URL:> for instructions.
> >
> > Any ideas what is going wrong?  This doesn't change if I remove
> > the rename_ssa_copies() call.
> >
> > Thanks,
> > Richard.
> >
> rename_ssa_copies coalesces ssa variables; it does not rename them.
> Since you don't have a valid ssa form at that point, it can't possibly
> work right :)
> Call rewrite_into_ssa (false);

Ah ok, yes, that fixes the ICE.  However, ivopts still does not
optimize the sra'ed stuff, and sra doesn't catch everything it
could.  It seems complete unrolling leaves us with lots of
optimization opportunities -- this is also why, with early loop
unrolling, those opportunities are only exposed once a dominator pass
is added after it.  Scheduling a ccp pass before sra helps somewhat;
putting dom there segfaults the compiler (probably because it alters
the cfg, and the loop optimizer is not happy about this).

One problem may be that we have stuff like

  D.1999_171 =[i_204];
  D.2000_174 =[i_204];
  D.2001_176 = D.1999_171 + D.2000_174;
  [i_204] = D.2001_176;
  i_178 = i_204 + 1;
  ivtmp.27_163 = ivtmp.27_196 - 1;
  if (0) goto <L60>; else goto <L16>;

  goto <bb 3> (<L13>);

Invalid sum of incoming frequencies 10000, should be 5000

after cunroll - i.e. the BBs are not merged and we still have
the loop exit test there.  So, until we get a cfg_cleanup that
preserves loop information, scheduling SRA after cunroll and before
ivopts doesn't help very much.
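
To illustrate what such a cleanup would have to do on a dump like the
one above, here is a toy sketch in C.  It is not GCC code: struct block,
fold_constant_branches, merge_blocks and cleanup_example are made-up
names for this example only, and the sketch ignores loop structures, PHI
nodes, unreachable-block removal and profile data entirely.  It only
shows the two steps: fold "if (0)" into a plain goto, then merge blocks
joined by a single edge.

```c
/* Toy basic block; NOT GCC's basic_block, just enough to show the idea. */
struct block {
    int live;        /* 0 once the block has been merged away */
    int nstmts;      /* number of straight-line statements */
    int is_cond;     /* 1 if it ends in "if (c) goto t; else goto f;" */
    int cond_known;  /* 1 if c is a compile-time constant */
    int cond_value;  /* value of c when cond_known is set */
    int succ_true;   /* target if c is true, or the only successor; -1 = none */
    int succ_false;  /* target if c is false; only used when is_cond */
};

/* Step 1: turn "if (0) goto A; else goto B;" into "goto B;". */
static void fold_constant_branches(struct block *b, int n)
{
    for (int i = 0; i < n; i++) {
        if (b[i].live && b[i].is_cond && b[i].cond_known) {
            if (!b[i].cond_value)
                b[i].succ_true = b[i].succ_false;
            b[i].succ_false = -1;
            b[i].is_cond = 0;
        }
    }
}

/* Count edges from live blocks into TARGET. */
static int count_preds(const struct block *b, int n, int target)
{
    int preds = 0;
    for (int i = 0; i < n; i++) {
        if (!b[i].live)
            continue;
        if (b[i].succ_true == target)
            preds++;
        if (b[i].is_cond && b[i].succ_false == target)
            preds++;
    }
    return preds;
}

/* Step 2: merge a block into its predecessor when the edge between them
   is the predecessor's only exit and the successor's only entry.
   Block 0 is treated as the entry block and is never merged away. */
static void merge_blocks(struct block *b, int n)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            int s = b[i].succ_true;
            if (!b[i].live || b[i].is_cond || s <= 0 || s == i)
                continue;
            if (count_preds(b, n, s) != 1)
                continue;
            /* Append the successor's statements and take over its exits. */
            b[i].nstmts += b[s].nstmts;
            b[i].is_cond = b[s].is_cond;
            b[i].cond_known = b[s].cond_known;
            b[i].cond_value = b[s].cond_value;
            b[i].succ_true = b[s].succ_true;
            b[i].succ_false = b[s].succ_false;
            b[s].live = 0;
            changed = 1;
        }
    }
}

/* Hypothetical three-block CFG: block 0 ends in
   "if (0) goto 2; else goto 1;", block 1 falls through to block 2.
   Returns the number of live blocks after cleanup and stores the
   statement count of block 0 in *stmts_in_entry. */
static int cleanup_example(int *stmts_in_entry)
{
    struct block cfg[3] = {
        {1, 5, 1, 1, 0, 2, 1},
        {1, 5, 0, 0, 0, 2, -1},
        {1, 1, 0, 0, 0, -1, -1},
    };
    int live = 0;

    fold_constant_branches(cfg, 3);
    merge_blocks(cfg, 3);
    for (int i = 0; i < 3; i++)
        live += cfg[i].live;
    *stmts_in_entry = cfg[0].nstmts;
    return live;
}
```

In this toy example the three blocks collapse into one straight-line
block once the constant test is folded, which is the shape SRA and
ivopts would presumably prefer to see.  A real pass would additionally
have to keep the loop tree and the edge frequencies consistent.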

Zdenek - I remember you posted a patch for loop cfg_cleanup
some time ago; is this suitable for 4.0?  I also remember some
other ivopts patches that may be suitable now, as we're back
to regular stage 3.


Richard Guenther <richard dot guenther at uni-tuebingen dot de>

