fix opt/8634

Zdenek Dvorak rakdver@atrey.karlin.mff.cuni.cz
Wed Apr 9 21:59:00 GMT 2003


Hello,

> > sequential pass over memory should not cost us that much, IMHO.
> 
> I agree it shouldn't.  Those memory manufacturers have been quite lazy!
> They should be able to keep up with cpu manufacturers.
> 
> But unless you're planning on solving this inequality (and becoming
> very rich in the process), we have to live with what we've been given.

you misunderstood what I meant. What I wanted to say is that if rtl were
not so wasteful of memory, and if we had it organized in a sane way,
the costs would not be that great.

I have made several experiments that did not confirm the idea as well
as I hoped :-( Here are the results, in case someone is interested:

I wrote a pass that measures how far we jump in memory when we pass
through each insn (using for_each_rtx, ignoring rtxes that are always
shared, run immediately before the first cse pass). The result seemed
quite scary:

try_combine
 Sequentiality report at POS1:
  69 skips in range -2097152 .. -4194303
  27 skips in range -1048576 .. -2097151
  436 skips in range -524288 .. -1048575
  599 skips in range -262144 .. -524287
  307 skips in range -131072 .. -262143
  133 skips in range -65536 .. -131071
  75 skips in range -32768 .. -65535
  296 skips in range -16384 .. -32767
  763 skips in range -8192 .. -16383
  1455 skips in range -4096 .. -8191
  1017 skips in range -2048 .. -4095
  193 skips in range -1024 .. -2047
  131 skips in range -512 .. -1023
  42 skips in range -256 .. -511
  2 skips in range -128 .. -255
  4 skips in range -64 .. -127
  108 skips in range -32 .. -63
  235 skips in range -16 .. -31
  3165 skips in range -8 .. -15
  1 skips in range 0 .. 0
  96 skips in range 8 .. 15
  32 skips in range 16 .. 31
  12 skips in range 32 .. 63
  1 skips in range 128 .. 255
  5 skips in range 256 .. 511
  79 skips in range 512 .. 1023
  172 skips in range 1024 .. 2047
  974 skips in range 2048 .. 4095
  1239 skips in range 4096 .. 8191
  618 skips in range 8192 .. 16383
  263 skips in range 16384 .. 32767
  250 skips in range 32768 .. 65535
  118 skips in range 65536 .. 131071
  334 skips in range 131072 .. 262143
  563 skips in range 262144 .. 524287
  421 skips in range 524288 .. 1048575
  36 skips in range 1048576 .. 2097151
  69 skips in range 2097152 .. 4194303
 In total, 14340 references to 14253 rtxes from 4568 insns

I then tested whether performance would improve if I ordered the rtxes
exactly in the order of the pass (to do this, I wrote a trivial
'garbage collector' that never frees any memory, just assigns it
sequentially, and copied the whole insn chain immediately before cse).
The results are attached; here are some of the performance numbers
(sum of three compilations of combine.c):

                           not reordered          reordered
 CSE                   :   6.72                   6.69
 global CSE            :   5.96                   5.86
 combiner              :   4.21                   4.17
 local alloc           :   2.58                   2.51
 global alloc          :   5.78                   5.98
 reload CSE regs       :   2.37                   2.31
 flow 2                :   0.69                   0.59
 scheduling 2          :   3.13                   3.01

Several passes show about a 2-3% speedup. Most of the others (almost
all the ones I haven't listed here :-) slowed down instead (I am not
sure why; perhaps moving the rtl further away from the rest of the
structures had a negative impact on the caches).  Total compilation
speed went down a bit (but that could also simply be because copying
the whole insn chain is not cheap).

Conclusion: it would probably make sense to try to keep rtxes
belonging to a single insn close together.  It would also make sense
to have a less memory-bloated code representation (but that is of
course something that cannot be changed now, and it also does not seem
quite desirable).

Zdenek
