fix opt/8634
Zdenek Dvorak
rakdver@atrey.karlin.mff.cuni.cz
Wed Apr 9 21:59:00 GMT 2003
Hello,
> > sequential pass over memory should not cost us that much, IMHO.
>
> I agree it shouldn't. Those memory manufacturers have been quite lazy!
> They should be able to keep up with cpu manufacturers.
>
> But unless you're planning on solving this inequality (and becoming
> very rich in the process), we have to live with what we've been given.
you misunderstood what I meant. What I wanted to say is that if rtl were
not so wasteful of memory, and if we had it organized in a sane way,
the costs would not be that great.
I have made several experiments that did not confirm the idea as well
as I had hoped :-( Here are the results, in case someone is interested:
I wrote a pass that measures how far we jump around in memory when we
pass through each insn (using for_each_rtx, ignoring rtxes that are
always shared, run immediately before the first cse pass). The results
seemed quite scary:
try_combine
Sequentiality report at POS1:
69 skips in range -2097152 .. -4194303
27 skips in range -1048576 .. -2097151
436 skips in range -524288 .. -1048575
599 skips in range -262144 .. -524287
307 skips in range -131072 .. -262143
133 skips in range -65536 .. -131071
75 skips in range -32768 .. -65535
296 skips in range -16384 .. -32767
763 skips in range -8192 .. -16383
1455 skips in range -4096 .. -8191
1017 skips in range -2048 .. -4095
193 skips in range -1024 .. -2047
131 skips in range -512 .. -1023
42 skips in range -256 .. -511
2 skips in range -128 .. -255
4 skips in range -64 .. -127
108 skips in range -32 .. -63
235 skips in range -16 .. -31
3165 skips in range -8 .. -15
1 skips in range 0 .. 0
96 skips in range 8 .. 15
32 skips in range 16 .. 31
12 skips in range 32 .. 63
1 skips in range 128 .. 255
5 skips in range 256 .. 511
79 skips in range 512 .. 1023
172 skips in range 1024 .. 2047
974 skips in range 2048 .. 4095
1239 skips in range 4096 .. 8191
618 skips in range 8192 .. 16383
263 skips in range 16384 .. 32767
250 skips in range 32768 .. 65535
118 skips in range 65536 .. 131071
334 skips in range 131072 .. 262143
563 skips in range 262144 .. 524287
421 skips in range 524288 .. 1048575
36 skips in range 1048576 .. 2097151
69 skips in range 2097152 .. 4194303
Totally 14340 references to 14253 rtxes from 4568 insns
I decided to test whether performance would improve if I ordered the
rtxes in exactly the order of the pass. To do so, I wrote a trivial
'garbage collector' that never frees any memory and just assigns it
sequentially, and copied the whole insn chain immediately before cse.
The results are attached; here are some of the performance numbers
(sum of three compilations of combine.c):
                   not reordered   reordered
CSE             :       6.72          6.69
global CSE      :       5.96          5.86
combiner        :       4.21          4.17
local alloc     :       2.58          2.51
global alloc    :       5.78          5.98
reload CSE regs :       2.37          2.31
flow 2          :       0.69          0.59
scheduling 2    :       3.13          3.01
Several passes show about a 2-3% speedup. Most of the others (almost
all of the ones I haven't listed here :-) slowed down instead (I am not
sure why; perhaps moving the rtl further from the rest of the
structures had a negative impact on the caches). Total compilation
speed went down a bit (but that could also be simply because copying
the whole insn chain is not cheap).
Conclusion: it would probably make sense to try to keep rtxes
belonging to a single insn close together. It would also make sense
to have a less memory-bloated code representation (but that is of
course something that cannot be changed now, and it also does not seem
entirely desirable).
Zdenek