This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Some statement counts for gcc
- From: Brad Lucier <lucier at math dot purdue dot edu>
- To: zack at codesourcery dot com (Zack Weinberg)
- Cc: lucier at math dot purdue dot edu (Brad Lucier), gcc at gcc dot gnu dot org, jh at suse dot cz (Jan Hubicka)
- Date: Mon, 26 Aug 2002 08:18:15 -0500 (EST)
- Subject: Re: Some statement counts for gcc
>
> On Sun, Aug 25, 2002 at 06:04:11PM -0500, Brad Lucier wrote:
>
> > branch prediction : 165.11 (33%) usr 0.07 ( 1%) sys 165.50 (33%) wall
> ...
> > A surprising (to me) number of lines in real.c are executed; I might look
> > there to see what's going on.
>
> The branch predictor uses emulated floating point numbers internally.
> Jan Hubicka has explained why this is currently necessary -- can't
> find the message at the moment though. However, real.c is indeed
> quite slow; I suspect that accounts entirely for the amount of time
> spent in this pass.
The problem is that gcc's fp code generator on x87 is broken enough that
you can get different results for the same expression, hence the use
of the simulator which does not use extended-precision arithmetic
by default. I'm not really sure that the simulator is unreasonably slow.
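To illustrate the kind of discrepancy meant here, the following C sketch evaluates the same sum twice: once with the intermediate result rounded to double, and once with it kept in a wider format. It assumes that the inconsistency comes from x87 excess precision. The function names are made up for the example, and nothing here is taken from gcc or real.c.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical illustration, not gcc code: (a + b) + c with the
   intermediate sum rounded to double before it is used again. */
double sum_rounded_to_double(double a, double b, double c)
{
    assert(DBL_MANT_DIG == 53);   /* assumes IEEE binary64 doubles */
    volatile double t = a + b;    /* volatile forces a store, so t is a true double */
    return t + c;
}

/* The same sum with the intermediate kept in long double, roughly as
   it would be if the value stayed in an x87 register. */
long double sum_kept_wide(double a, double b, double c)
{
    long double t = (long double)a + b;
    return t + c;
}
```

With a = 1e16, b = 1 and c = -1e16 the first function returns 0, while the second returns 1 wherever long double is wider than double. Which of the two an x87 build resembles can depend on when a value happens to be spilled to memory.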
> If I remember correctly this code has a very complicated flow graph,
> and branch prediction may not help much; perhaps the right thing is
> to detect code like this and disable that optimization.
This has been the response to several of my recent observations about
gcc's algorithms, etc. I'd prefer that if there are problems they be
fixed rather than papered over by a -fbrad's_code_don't_optimize flag.
At any rate, here's some data for compiling the larger file:
banach-139% gcc/cc1 -fnew-ra -m64 -O1 -fschedule-insns2 -fno-strict-aliasing -fno-math-errno -mcpu=ultrasparc -mtune=ultrasparc _io.i
___H__20___io {GC 75305k -> 25172k} {GC 42902k -> 24245k} {GC 32413k -> 24597k} {GC 35635k -> 27127k} {GC 45565k -> 27034k} {GC 50659k -> 26850k} {GC 37567k -> 30050k} {GC 52691k -> 31806k} ___init_proc ____20___io
Execution times (seconds)
garbage collection : 10.79 ( 0%) usr 0.24 ( 0%) sys 21.50 ( 0%) wall
cfg construction : 29.83 ( 0%) usr 4.03 ( 1%) sys 34.50 ( 0%) wall
cfg cleanup : 99.46 ( 0%) usr 0.03 ( 0%) sys 99.00 ( 0%) wall
trivially dead code : 5.48 ( 0%) usr 0.00 ( 0%) sys 5.50 ( 0%) wall
life analysis : 435.03 ( 2%) usr 0.08 ( 0%) sys 442.50 ( 2%) wall
life info update : 58.31 ( 0%) usr 0.00 ( 0%) sys 58.50 ( 0%) wall
preprocessing : 2.93 ( 0%) usr 2.57 ( 1%) sys 5.00 ( 0%) wall
lexical analysis : 1.64 ( 0%) usr 5.18 ( 1%) sys 10.00 ( 0%) wall
parser : 12.39 ( 0%) usr 2.87 ( 1%) sys 13.50 ( 0%) wall
expand : 4.24 ( 0%) usr 0.17 ( 0%) sys 5.00 ( 0%) wall
varconst : 1.49 ( 0%) usr 0.01 ( 0%) sys 1.50 ( 0%) wall
integration : 1.04 ( 0%) usr 0.02 ( 0%) sys 1.00 ( 0%) wall
jump : 64.02 ( 0%) usr 0.00 ( 0%) sys 63.50 ( 0%) wall
CSE : 12.37 ( 0%) usr 0.00 ( 0%) sys 12.50 ( 0%) wall
loop analysis : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
branch prediction : 1211.11 ( 5%) usr 1.25 ( 0%) sys 1214.50 ( 5%) wall
flow analysis : 3.76 ( 0%) usr 0.00 ( 0%) sys 3.50 ( 0%) wall
combiner : 18.08 ( 0%) usr 0.00 ( 0%) sys 18.50 ( 0%) wall
if-conversion : 5.82 ( 0%) usr 0.00 ( 0%) sys 6.00 ( 0%) wall
local alloc : 20603.26 (91%) usr 399.06 (96%) sys 21695.50 (91%) wall
global alloc : 16.60 ( 0%) usr 0.05 ( 0%) sys 22.00 ( 0%) wall
reload CSE regs : 62.53 ( 0%) usr 0.06 ( 0%) sys 69.00 ( 0%) wall
flow 2 : 1.11 ( 0%) usr 0.00 ( 0%) sys 0.50 ( 0%) wall
if-conversion 2 : 5.83 ( 0%) usr 0.01 ( 0%) sys 6.50 ( 0%) wall
rename registers : 10.91 ( 0%) usr 0.00 ( 0%) sys 11.00 ( 0%) wall
scheduling 2 : 9.92 ( 0%) usr 0.02 ( 0%) sys 10.00 ( 0%) wall
delay branch sched : 14.78 ( 0%) usr 0.00 ( 0%) sys 14.50 ( 0%) wall
shorten branches : 0.85 ( 0%) usr 0.00 ( 0%) sys 1.00 ( 0%) wall
final : 4.50 ( 0%) usr 0.05 ( 0%) sys 5.00 ( 0%) wall
rest of compilation : 8.71 ( 0%) usr 0.02 ( 0%) sys 9.50 ( 0%) wall
TOTAL : 22716.96 415.73 23860.50
Although the GC statistics indicate only modest memory use, cc1 took up
to 3.8 GB of swap while compiling this file.
This is for the file at
http://www.math.purdue.edu/~lucier/_io.i.gz
Here we see an excessive amount of time spent in the new register allocator;
the most-executed lines in the ra-* files are at the end of this message.
The complete .c.gcov files for this run are at
http://www.math.purdue.edu/~lucier/gcovs_io.tgz
The ra-*.c.gcov files are at
http://www.math.purdue.edu/~lucier/ra-gcovs_io.tgz
and the executed lines sorted in decreasing order of execution are at
http://www.math.purdue.edu/~lucier/ra-sorted-lines.gz
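A list like this can be produced by pulling the execution count out of each annotated line and sorting on it. The C sketch below assumes lines of the form "file: count: lineno: source", as in the excerpt at the end of this message. parse_count and sort_by_count are hypothetical helper names, not part of gcov.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One annotated line plus the execution count parsed from it. */
struct counted_line {
    unsigned long long count;
    const char *text;
};

/* Parse the count from "file: count: lineno: source".
   Returns 0 if there is no "file:" prefix or no number after it. */
unsigned long long parse_count(const char *line)
{
    assert(line != NULL);
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;
    return strtoull(colon + 1, NULL, 10);   /* skips the leading blanks */
}

/* qsort comparator: larger counts first. */
static int by_count_desc(const void *a, const void *b)
{
    const struct counted_line *x = a, *y = b;
    if (x->count < y->count) return 1;
    if (x->count > y->count) return -1;
    return 0;
}

/* Fill in each count from its text, then sort in decreasing order. */
void sort_by_count(struct counted_line *lines, size_t n)
{
    for (size_t i = 0; i < n; i++)
        lines[i].count = parse_count(lines[i].text);
    qsort(lines, n, sizeof lines[0], by_count_desc);
}
```

Lines that gcov marks as never executed or not executable carry no number in the count field, so in this sketch they parse as 0 and sort to the end.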
Brad
ra-build.c.gcov: 79915683: 1625: if (web1 == web2 || TEST_BIT (igraph, index))
ra-build.c.gcov: 79915683: 1623: unsigned int index = igraph_index (id1, id2);
ra-build.c.gcov: 79915683: 1622: unsigned int id1 = web1->id, id2 = web2->id;
ra-build.c.gcov: 79915683: 1621:{
ra-colorize.c.gcov: 79134101: 741: for (wl = v->conflict_list; wl; wl = wl->next)
ra-colorize.c.gcov: 79111468: 764: for (i = 0; i < nregs; i++)
ra-colorize.c.gcov: 79111383: 802: if (pweb->type != SELECT && pweb->type != COALESCED)
ra-colorize.c.gcov: 79111383: 799: if (u->type != PRECOLORED)
ra-colorize.c.gcov: 79111383: 769: record_conflict (web, pweb);
ra-colorize.c.gcov: 79111383: 768: if (wl->sub == NULL)
ra-colorize.c.gcov: 79111383: 766: if (u->type == PRECOLORED)
ra-colorize.c.gcov: 79111383: 755: if (u->type == PRECOLORED)
ra-colorize.c.gcov: 79111383: 754: int nregs = 1 + v->add_hardregs;
ra-colorize.c.gcov: 79111383: 753: struct web *web = u;
ra-colorize.c.gcov: 79111383: 751: if (1)
ra-colorize.c.gcov: 79111383: 743: struct web *pweb = wl->t;
ra-colorize.c.gcov: 79111298: 800: break;
ra-build.c.gcov: 69833156: 1636: if ((web1->type == PRECOLORED
ra-build.c.gcov: 69833156: 1633: return;
ra-build.c.gcov: 69833156: 1631: if ((web1->regno < FIRST_PSEUDO_REGISTER && fixed_regs[web1->regno])
ra-build.c.gcov: 69833156: 1627: if (id1 == id2)
ra-build.c.gcov: 69833156: 1626: return;
ra-build.c.gcov: 69827720: 1652: if (web1->type != PRECOLORED && web2->type != PRECOLORED
ra-build.c.gcov: 69827615: 1665: add_conflict_edge (web2, web1);
ra-build.c.gcov: 69827615: 1664: add_conflict_edge (web1, web2);
ra-build.c.gcov: 69827615: 1663: SET_BIT (igraph, index);
ra-rewrite.c.gcov: 66002982: 1141: for (d = (pass) ? WEBS(SPILLED) : WEBS(COALESCED); d; d = d->next)
ra-rewrite.c.gcov: 65990988: 1145: if (aweb->type != SPILLED)
ra-rewrite.c.gcov: 65990988: 1144: struct web *aweb = alias (web);
ra-rewrite.c.gcov: 65990988: 1143: struct web *web = DLIST_WEB (d);
ra-rewrite.c.gcov: 65972997: 1146: continue;
ra-colorize.c.gcov: 58624158: 1759: for (nn = web2->conflict_list; nn && !wide_p; nn = nn->next)
ra-colorize.c.gcov: 58583640: 1760: if (alias (nn->t)->add_hardregs)
ra-colorize.c.gcov: 55463179: 431: if (web->num_conflicts < NUM_REGS (web) && before >= NUM_REGS (web))
ra-colorize.c.gcov: 55463179: 430: web->num_conflicts -= dec;
ra-colorize.c.gcov: 55463179: 429: int before = web->num_conflicts;
ra-colorize.c.gcov: 55463179: 428:{
ra-colorize.c.gcov: 54992227: 803: decrement_degree (pweb, 1 + v->add_hardregs);
ra-colorize.c.gcov: 46572957: 1306: for (wl = web->conflict_list; wl; wl = wl->next)
ra-colorize.c.gcov: 46540347: 1313: if (ptarget->type != COLORED && ptarget->type != PRECOLORED
ra-colorize.c.gcov: 46540347: 1312: w = sl ? sl->t : wl->t;
ra-colorize.c.gcov: 46540347: 1311: IOR_HARD_REG_SET (bias, ptarget->bias_colors);
ra-colorize.c.gcov: 46540347: 1310: struct sub_conflict *sl = wl->sub;
ra-colorize.c.gcov: 46540347: 1309: struct web *ptarget = alias (wl->t);
ra-colorize.c.gcov: 46540347: 1308: struct web *w;
ra-colorize.c.gcov: 46512519: 480: for (wl = web->conflict_list; wl; wl = wl->next)
ra-colorize.c.gcov: 46479912: 483: if (pweb->type != SELECT && pweb->type != COALESCED)
ra-colorize.c.gcov: 46479912: 482: struct web *pweb = wl->t;
ra-rewrite.c.gcov: 19823503: 814: for (; size--;)
ra-build.c.gcov: 16465236: 546: return r1;
ra-build.c.gcov: 16465236: 488: if (r1 != r2)
ra-build.c.gcov: 16465236: 487:{
ra-colorize.c.gcov: 11909293: 1959: if (web1->spill_cost > web2->spill_cost)
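
For readers without the ra-build.c source at hand: the hottest lines above (1621 to 1665) have the shape of a routine that records a conflict between two webs. It computes an index into a bit matrix, returns early if the pair is already known, and otherwise sets the bit and adds an edge in each direction. The C sketch below shows only that general pattern. The triangular index formula, the fixed MAX_WEBS limit and every name ending in _sketch are assumptions for illustration, not the gcc implementation.

```c
#include <assert.h>
#include <limits.h>

#define MAX_WEBS 64u                                  /* assumed small limit */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define TRI_BITS (MAX_WEBS * (MAX_WEBS - 1) / 2)      /* one bit per unordered pair */

static unsigned long igraph_sketch[(TRI_BITS + WORD_BITS - 1) / WORD_BITS];
static unsigned int degree_sketch[MAX_WEBS];

/* Index of the unordered pair {i, j}, i != j, in a lower-triangular
   bit matrix. */
static unsigned int pair_index_sketch(unsigned int i, unsigned int j)
{
    return i < j ? j * (j - 1) / 2 + i : i * (i - 1) / 2 + j;
}

/* Record that webs id1 and id2 conflict.  Returns 1 if a new edge was
   added, 0 if the ids are equal or the pair was already recorded. */
int record_conflict_sketch(unsigned int id1, unsigned int id2)
{
    assert(id1 < MAX_WEBS && id2 < MAX_WEBS);
    if (id1 == id2)
        return 0;
    unsigned int index = pair_index_sketch(id1, id2);
    unsigned long mask = 1UL << (index % WORD_BITS);
    if (igraph_sketch[index / WORD_BITS] & mask)
        return 0;                       /* already known: early exit */
    igraph_sketch[index / WORD_BITS] |= mask;
    degree_sketch[id1]++;               /* stands in for adding the edge */
    degree_sketch[id2]++;               /* to each web's conflict list   */
    return 1;
}

/* Number of distinct conflicts recorded for one web. */
unsigned int degree_of_sketch(unsigned int id)
{
    assert(id < MAX_WEBS);
    return degree_sketch[id];
}
```

Storing one bit per unordered pair keeps the already-recorded test down to a single word lookup, whichever order the two ids are passed in.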