This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



[RFC] third liveness pass


Hi,
Several optimization passes badly need liveness information to be
available.  From what is currently in the mainline tree, at least
if-conversion and jump threading become immediately stronger when
liveness is available; similarly, GCSE can then do code hoisting of
instructions clobbering hard registers, and so on.  Currently we do
these only after the flow1 pass, which is a bit too late, as CSE can't
clean up after the transformations.

Early dead code removal is also an important step, as observed on the
Stepanov benchmark.

On the cfg-branch I have had, for a while, a third liveness pass just
before GCSE.  Because of recent discussions about compiler performance
I see this decision as somewhat controversial, so I would like to
discuss it.  I've implemented a simple patch to add the third liveness
pass to mainline (attached) and asked Andreas to do the benchmarking.
The results are interesting.  As expected, the bootstrap is about 1%
slower and there is just a small increase in performance (and decrease
in size) of the C benchmarks.

But interestingly, the only C++ benchmark, eon, shows different figures.
The savings are about 2.2% in code size and 1.6% in performance (*).  I
am not sure how representative it is for C++, but it seems to suggest
that the abstraction penalties can be significantly lowered, since the
Stepanov results are similar as well.  The overall savings for SPEC are
0.2% in size and a similar amount of performance, I guess, but the SPEC
benchmarks generally have very low abstraction penalties, as many loops
are hand-optimized.

Would this be considered a strong enough reason to have the pass?

As mentioned, I believe this will pay back more with extra effort, once
GCSE is made stronger (for i386 this has a neutral performance effect,
but I guess it is a register allocation problem) and other passes use
it.  I believe that, for instance, CSE can easily be hacked to use the
notes to reduce register pressure instead of the current local
heuristics.

I also have the double-test conversion pass, which should be somewhat
stronger when run before CSE than before combine, as I do currently.

Similarly, I think the liveness costs can be made much lower, since
currently we don't compute bitmaps of local properties and instead
re-scan every time, which can be expensive, especially now that dead
store removal has been added.  Perhaps dead code removal is better done
using DU/UD chains, with the current ssa-dce code converted to these,
but I am not sure how popular a step this would be.
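
To make the "bitmaps of local properties" idea concrete, here is a
minimal sketch.  It is not the code from the attached patch and not
GCC's actual data structures: struct insn, struct block, regset,
local_props and compute_liveness are all hypothetical names, and a
single 64-bit word stands in for a real register bitmap.  Each block is
scanned once to get its use/def sets, and the global iteration then
works only on those sets instead of re-scanning instructions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SUCCS 2

typedef uint64_t regset;               /* one bit per register, up to 64 */
#define R(n) ((regset) 1 << (n))       /* bit for hypothetical register n */

struct insn {
  regset uses;                         /* registers read by the insn */
  regset sets;                         /* registers written by the insn */
};

struct block {
  regset use;                          /* read before any write in block */
  regset def;                          /* written somewhere in block */
  regset live_in, live_out;
  int succ[MAX_SUCCS];                 /* indices of successor blocks */
  int n_succ;
};

/* Scan the block's instructions once, in order, and record the local
   properties.  This is the only place instructions are looked at.  */
void local_props(const struct insn *insns, int n, struct block *bb)
{
  bb->use = bb->def = 0;
  for (int i = 0; i < n; i++) {
    bb->use |= insns[i].uses & ~bb->def;   /* upward-exposed uses only */
    bb->def |= insns[i].sets;
  }
}

/* Backward dataflow to a fixed point:
     live_out(B) = union of live_in(S) over successors S
     live_in(B)  = use(B) | (live_out(B) & ~def(B))
   Returns the number of passes over the blocks.  */
int compute_liveness(struct block *bb, int n)
{
  int passes = 0, changed = 1;

  for (int i = 0; i < n; i++)
    bb[i].live_in = bb[i].live_out = 0;

  while (changed) {
    changed = 0;
    passes++;
    for (int i = n - 1; i >= 0; i--) {     /* reverse order converges faster */
      regset out = 0;
      assert(bb[i].n_succ >= 0 && bb[i].n_succ <= MAX_SUCCS);
      for (int j = 0; j < bb[i].n_succ; j++)
        out |= bb[bb[i].succ[j]].live_in;
      regset in = bb[i].use | (out & ~bb[i].def);
      if (in != bb[i].live_in || out != bb[i].live_out) {
        bb[i].live_in = in;
        bb[i].live_out = out;
        changed = 1;
      }
    }
  }
  return passes;
}
```

With sets like these, a store to a register that is in def(B) but is
neither read later in the block nor in live_out(B) is a candidate for
dead code removal, which is the kind of early cleanup the extra pass is
meant to enable.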

I am attaching the patch and results for reference.
Honza

(*) For some reason the mainline eon run failed for Andreas, but the
machine is the same as the one used by the periodic tester and the other
results are consistent, so I've just filled in the gap from the official
results.

Attachment: le
Description: Text document

Attachment: live
Description: Text document

