This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] reload1.c: Very minor speedup.


On Friday, February 6, 2004, at 02:34 PM, Paolo Carlini wrote:
>> Attached is a patch to micro-optimize the reset of can_eliminate in
>> reload1.c.
>>
>> -      if (ep->from_rtx == x && ep->can_eliminate)
>> +      if (ep->from_rtx == x)
>>          ep->can_eliminate = 0;

> If you have got two spare minutes, could you possibly explain a bit?
>
> I mean, is it because the cost of a test (&& ep->can_eliminate) is
> comparable to that of an assignment (ep->can_eliminate = 0), never
> much smaller? Is that true on every architecture?

Smaller? He did say speedup, not smaller!


Modern machines are fascinating. You really want to grab a high-level tool like Shark (a Mac developer tool), run it on your favorite code, and then take a look at the results. Once you train up for a few days, you'll discover just what you've been missing. Neat results like: out of the 133,000 instructions in this one file, 4 of them, no more, no less, account for 90% of the time.

The change assumes a load/store unit that isn't bandwidth limited and a conditional branch that is slow when mispredicted. It seems like it might be the right choice, though of course I'd almost want to fire it up and watch it; but I think this is so far down in the noise that I'll pass up the opportunity.

[ pause ]

Ok, so I tried this, and found:

0.873 for V1 and 0.876 for V2:

struct S {
  int pad[3];
  int can_eliminate;
  int pad2[4];
  int rtx;
  int tortx;
} er[1000];

int main(void) {
  int i, j;
  struct S *s = er;

  for (i= 0; i<100000; ++i) {
    for (s = er; s<&er[1000]; ++s) {
#ifdef V1
      if (s->rtx == 12 && s->can_eliminate)
#else
      if (s->rtx == 12)
#endif
        s->can_eliminate = 0;
    }
  }
  return 0;
}

and using a much larger er that might get us out of the cache more:

V1: 0.448 V2: 0.459

So on my machine, the proposed patch looks like it is slower. Kazu, did you measure a speedup? On what machine? On the whole, changing code for a near-zero win is probably not very interesting; OTOH, V2 is 12 bytes smaller, which tilts towards V2, plus it compiles faster! :-)

