This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
- From: Brad Lucier <lucier at math dot purdue dot edu>
- To: jh at suse dot cz (Jan Hubicka)
- Cc: lucier at math dot purdue dot edu (Brad Lucier), jh at suse dot cz (Jan Hubicka), dje at watson dot ibm dot com (David Edelsohn), gcc at gcc dot gnu dot org, mark at codesourcery dot com, feeley at iro dot umontreal dot ca
- Date: Fri, 29 Mar 2002 11:44:29 -0500 (EST)
- Subject: Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
>
> > [RE: crossjumping]
> >
> > > I will try to check whether I can squeeze out some more cycles or
> > > find a way to limit this.
> >
> > My code uses a lot of computed gotos, with many labels. Is crossjumping
> > possibly a win in this case? Can it ignore these jumps?
> >
> > Brad
>
> Hi, here is a patch I made as a test. It simply disables crossjumping
> if there are more than 100 outgoing edges. Unfortunately I can't benchmark
> your testcase as my machine runs out of space before getting there. Can you
> check if this solves your problem? If so, I will prepare a more polished
> version of this patch.

First of all, there were no regressions in the test suite on
sparcv9-sun-solaris2.8 with your patch.

I made a slightly smaller test case that requires only 1200 MB to compile on
sparcv9. (Perhaps it would require less on regular sparc with 32-bit pointers.)
I have not built a profiled version of cc1 yet.

The results are not so good with your patch: cfg cleanup still accounts for
87% of the user time.

banach-725% ~/programs/gcc/gcc-3.1/objdir-sparcv9/gcc/stage2/cc1 -fpreprocessed denoise3.i -mptr64 -mstack-bias -mno-v8plus -dumpbase denoise3.c -m64 -mcpu=ultrasparc -mtune=ultrasparc -O1 -Wall -W -Wno-unused -version -fPIC -fschedule-insns2 -fno-math-errno -fno-strict-aliasing -o denoise3.s
GNU CPP version 3.1 20020328 (prerelease) (cpplib) (sparc ELF)
GNU C version 3.1 20020328 (prerelease) (sparcv9-sun-solaris2.8)
compiled by GNU C version 3.1 20020328 (prerelease).
options passed: -fpreprocessed -mptr64 -mstack-bias -mno-v8plus -m64
-mcpu=ultrasparc -mtune=ultrasparc -O1 -Wall -W -Wno-unused -fPIC
-fschedule-insns2 -fno-math-errno -fno-strict-aliasing
options enabled: -fdefer-pop -fomit-frame-pointer -fthread-jumps
-fpeephole -ffunction-cse -fkeep-static-consts -freg-struct-return
-fdelayed-branch -fgcse-lm -fgcse-sm -fschedule-insns2 -fsched-interblock
-fsched-spec -fbranch-count-reg -fPIC -fcprop-registers -fcommon
-fgnu-linker -fargument-alias -fmerge-constants -fident
-fguess-branch-probability -ftrapping-math -mepilogue -mptr64 -m64
-mstack-bias -mcpu=ultrasparc -mtune=ultrasparc
___H__20_denoise3 {GC 72981k -> 25818k} {GC 33692k -> 25034k} {GC 49841k -> 28604k} {GC 42905k -> 28468k} {GC 42063k -> 33617k} {GC 55871k -> 36690k} ___init_proc ____20_denoise3
Execution times (seconds)
garbage collection  :    4.31 ( 0%) usr    0.02 ( 0%) sys    6.00 ( 0%) wall
cfg construction    :   66.65 ( 2%) usr   22.01 (51%) sys   89.00 ( 2%) wall
cfg cleanup         : 3156.77 (87%) usr    0.04 ( 0%) sys 3193.00 (86%) wall
life analysis       :   79.31 ( 2%) usr    0.00 ( 0%) sys   79.00 ( 2%) wall
life info update    :    0.80 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
preprocessing       :    0.41 ( 0%) usr    1.81 ( 4%) sys    3.00 ( 0%) wall
lexical analysis    :    0.45 ( 0%) usr    3.63 ( 8%) sys    5.00 ( 0%) wall
parser              :    4.22 ( 0%) usr    2.46 ( 6%) sys    5.00 ( 0%) wall
expand              :    2.00 ( 0%) usr    0.26 ( 1%) sys    3.00 ( 0%) wall
varconst            :    0.65 ( 0%) usr    0.03 ( 0%) sys    0.00 ( 0%) wall
integration         :    0.93 ( 0%) usr    0.04 ( 0%) sys    1.00 ( 0%) wall
jump                :    0.69 ( 0%) usr    0.01 ( 0%) sys    0.00 ( 0%) wall
CSE                 :    4.66 ( 0%) usr    0.00 ( 0%) sys    5.00 ( 0%) wall
loop analysis       :    0.06 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
flow analysis       :  138.86 ( 4%) usr   13.01 (30%) sys  152.00 ( 4%) wall
combiner            :    8.42 ( 0%) usr    0.00 ( 0%) sys    9.00 ( 0%) wall
if-conversion       :   11.72 ( 0%) usr    0.01 ( 0%) sys   11.00 ( 0%) wall
local alloc         :    2.63 ( 0%) usr    0.00 ( 0%) sys    2.00 ( 0%) wall
global alloc        :   25.46 ( 1%) usr    0.00 ( 0%) sys   26.00 ( 1%) wall
reload CSE regs     :  106.46 ( 3%) usr    0.00 ( 0%) sys  106.00 ( 3%) wall
flow 2              :    4.41 ( 0%) usr    0.00 ( 0%) sys    5.00 ( 0%) wall
if-conversion 2     :    0.18 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
rename registers    :    8.73 ( 0%) usr    0.00 ( 0%) sys    9.00 ( 0%) wall
scheduling 2        :    4.88 ( 0%) usr    0.01 ( 0%) sys    5.00 ( 0%) wall
delay branch sched  :    4.05 ( 0%) usr    0.00 ( 0%) sys    3.00 ( 0%) wall
shorten branches    :    0.50 ( 0%) usr    0.00 ( 0%) sys    2.00 ( 0%) wall
final               :    1.17 ( 0%) usr    0.03 ( 0%) sys    0.00 ( 0%) wall
rest of compilation :    2.50 ( 0%) usr    0.02 ( 0%) sys    3.00 ( 0%) wall
TOTAL               : 3641.89             43.42             3722.00
The file denoise3.i is at
http://www.math.purdue.edu/~lucier/GNATS/GNATS-4/denoise3.i.gz
Do you want me to build a profiled version of cc1 with your patch?
Brad