This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [wwwdocs] more updates to gcc-3.4/changes.html

From: Jan Hubicka <jh at suse dot cz>
To: Vladimir Makarov <vmakarov at redhat dot com>
Cc: Jan Hubicka <jh at suse dot cz>, gcc-patches at gcc dot gnu dot org
Date: Wed, 7 Jan 2004 01:00:10 +0100
Subject: Re: [wwwdocs] more updates to gcc-3.4/changes.html
References: <20040106175229.GG30323@kam.mff.cuni.cz> <3FFB431A.3F21F6DB@redhat.com>

> Jan Hubicka wrote:
> > 
> > !     <li>Web construction pass enabled via <code>-fweb</code> (and implied by
> > !       <code>-O3</code>) improves the quality of register allocation, CSE
> > !       and some other optimization passes by avoiding re-use of pseudo
> > !       registers with non-overlapping live ranges.  The pass almost always
> > !       improves code quality but does make debugging difficult and thus it
> > !       is not enabled by default by <code>-O2</code>.
> > !       <p>The pass is especially effective as cleanup after code duplication
> > !       passes, such as the loop unroller or the tracer.</p></li>
> 
>   Unfortunately for me, I wrote the same optimization several months
> ago and was just waiting for mainline to reach stage 1 to commit the
> patch.  The major difference is that I placed it right before
> register allocation.  You were quicker to commit it.  Was it written in
> 2001/2002?  I found that in the title of web.c, although I see it was
> committed in November 2003.

Yes, I had it around for a while.
My original motivation was to clean up after the new loop unroller.
The old loop unroller used to identify pseudos used locally in the loop
body and generated different pseudos in each copy.  This improved later
CSE and register allocation.
This is why the pass runs earlier.  You can find more explanation at:
http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00501.html
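
As a rough source-level illustration of the idea (a sketch only: the
real pass works on RTL pseudo registers rather than C variables, and the
function names below are made up for this example), a single temporary
that carries two unrelated values with non-overlapping live ranges can
be given one name per value:

```c
/* Hypothetical example, not taken from web.c.  The variable 't' stands
   in for a pseudo register that is re-used for two unrelated values. */
int web_reused(int a, int b, int c, int d)
{
    int t;
    int r;

    t = a + b;      /* first value: defined here ...            */
    r = t * 2;      /* ... and last used here                   */
    t = c - d;      /* second, unrelated value in the same name */
    r += t * 3;
    return r;
}

/* The same computation after each value gets its own name, which is
   roughly what splitting a pseudo into separate webs achieves. */
int web_split(int a, int b, int c, int d)
{
    int t1 = a + b;
    int t2 = c - d;

    return t1 * 2 + t2 * 3;
}
```

Both functions compute the same result; the second form simply makes it
visible to later passes that the two values are independent.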
> 
>   I had mixed experience with the optimization.  E.g. perlbmk for P4
> has a much worse result with this optimization.  But this is mainly
> due to global register allocation drawbacks (50% of the benchmark is
> spent in one function, regexp matching; in my opinion the benchmark
> should have been named regexp.  Without the optimization, a global
> pseudo register living through the whole function gets a hard
> register, and that is better than having some of the pseudo registers
> created from it end up in memory).

Hmm, I see, for some reason I forgot to include this switch in my GCC
summit paper.  I also noticed some regressions (for me it was the crafty
benchmark) that were caused by the fact that regmove was no longer able
to eliminate reg-reg moves on two-address i386.
I also wrote a register coalescing pass that undid this, but the overall
benefit was not high enough to motivate me to try to push it into
mainline.  I still hope the new RA will eventually take care of this.

Some numbers are in the original patch email at:
http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01537.html
> 
>   But overall this optimization is good for x86 because more values
> live in registers (as a result, code size for x86 with this
> optimization is also usually smaller).  This option is a good
> candidate for Scott Ladd's benchmarks (huffbench was sped up 20% for
> P4).

It is currently enabled by default at -O3, so I think he is considering
it.
> 
>   The optimization creates much more parallelism for the 1st insn
> scheduler because it removes many anti-dependencies.  So it should
> definitely be used for Itanium processors, especially for scientific
> programs (e.g. SPECFP2000 sixtrack became 17% faster).  On the other
> hand, it might give worse results for processors with small or medium
> register files, because more freedom in rearranging insns makes
> unwanted register spills, which could otherwise be avoided, more
> likely.  Even some SPECFP2000 tests have worse results because of
> excessive register pressure after the 1st insn scheduling.

Yes, perhaps I can mention this on the page.
I was doing some tests with trace/superblock scheduling in the sched1
pass with kind of interesting results (noticeable speedups on SPARC for
some benchmarks but slowdowns on others, with an overall mixed outcome).
I haven't had time to experiment with this since then.
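
To make the anti-dependency point quoted above a bit more concrete, here
is a minimal source-level sketch (again only an analogy for what happens
to pseudo registers; the names are hypothetical).  In the first function
the second write to 't' must stay after the earlier read of 't'.  In the
second function the two products are independent, so a scheduler is free
to start both early, at the price of keeping two values live at once:

```c
/* Hypothetical example.  Re-using 't' creates a write-after-read
   (anti-) dependency between the two halves of the function. */
void sched_reused(const int *x, int *out)
{
    int t;

    t = x[0] * x[1];
    out[0] = t + 1;     /* reads t                                  */
    t = x[2] * x[3];    /* cannot be moved above the read of t      */
    out[1] = t + 2;
}

/* With one name per value the second multiply no longer depends on the
   first use, but t1 and t2 are now live at the same time, which is the
   extra register pressure mentioned above. */
void sched_split(const int *x, int *out)
{
    int t1 = x[0] * x[1];
    int t2 = x[2] * x[3];

    out[0] = t1 + 1;
    out[1] = t2 + 2;
}
```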

Honza
> 
> Vlad

References:
- [wwwdocs] more updates to gcc-3.4/changes.html
  - From: Jan Hubicka
- Re: [wwwdocs] more updates to gcc-3.4/changes.html
  - From: Vladimir Makarov
