This is the mail archive of the mailing list for the GCC project.

ia64 performance regression

What is the purpose of the following code in loop.c (loop_regs_scan)?

  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    {
      regs->array[i].may_not_optimize = 1;
      regs->array[i].set_in_loop = 1;
    }

It appears to treat every hard register as being set within the loop.

I'm asking the question because an ia64 global variable reference
looks like this:

(insn 52 48 54 (parallel[
            (set (reg/f:DI 360)
                (symbol_ref:DI ("p")))
            (clobber (reg:DI 361))
            (use (reg:DI 1 r1))
        ] ) -1 (nil)
    (expr_list:REG_EQUAL (symbol_ref:DI ("p"))
        (nil)))

r1 is the global data pointer.  The USE of r1 creates a dependency on r1,
and because loop_regs_scan sets "set_in_loop" for r1, that dependency makes
the whole insn non-invariant.  By not hoisting global data references, we're
leaving a lot of ia64 performance on the table.
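
To make the effect concrete, here is a minimal, self-contained model of the behavior described above. It is not the loop.c implementation: the struct, the MODEL_* constants and the model_* functions are hypothetical stand-ins, and the invariance test is reduced to "no register the insn reads is marked set_in_loop".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in GCC's loop code. */
#define MODEL_FIRST_PSEUDO_REGISTER 8   /* assumed small value for the model */
#define MODEL_NUM_REGS 16

struct model_reg {
    bool set_in_loop;
    bool may_not_optimize;
};

/* Mirrors the loop quoted above: every hard register is treated as set. */
static void model_mark_hard_regs(struct model_reg *regs)
{
    for (int i = 0; i < MODEL_FIRST_PSEUDO_REGISTER; i++) {
        regs[i].may_not_optimize = true;
        regs[i].set_in_loop = true;
    }
}

/* Simplified rule: an insn is invariant only if none of the registers
   it reads is marked as set in the loop. */
static bool model_insn_invariant(const struct model_reg *regs,
                                 const int *used, size_t n_used)
{
    for (size_t i = 0; i < n_used; i++) {
        assert(used[i] >= 0 && used[i] < MODEL_NUM_REGS);
        if (regs[used[i]].set_in_loop)
            return false;
    }
    return true;
}

/* Asks whether a symbol load that USEs r1 (register 1) stays invariant,
   with and without the blanket marking of hard registers. */
static bool model_symbol_load_invariant(bool mark_hard_regs)
{
    struct model_reg regs[MODEL_NUM_REGS] = {0};
    const int used[] = { 1 };

    if (mark_hard_regs)
        model_mark_hard_regs(regs);
    return model_insn_invariant(regs, used, 1);
}
```

In this model, model_symbol_load_invariant(false) reports the load as invariant, while model_symbol_load_invariant(true) does not, which is the same shape as the behavior seen with r1.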

The problem was exposed by

Before that patch, dependencies introduced by USEs inside of PARALLELs were
simply ignored.

Here's a little C code that causes the problem to happen.
Compile with -O2 -S on ia64 and take a look at the .s file.

extern long *buf1, *buf2;
extern int n;

long *p, *q;

void doit ()
{
    int i;
    int m = n;
    long c = 12;

    p = buf1;
    q = buf2;

    for (i = 0; i < m; i++) {
        *(q++) = *(p++) + c;
    }
}
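
For comparison, this is roughly what hoisting the global address computations would amount to at the source level. It is only a hand-written sketch, not compiler output: doit_hoisted, addr_p and addr_q are names made up for illustration, and the globals are defined here (rather than declared extern) just so the fragment links on its own.

```c
/* Defined here so the sketch is self-contained; the test case above
   declares buf1, buf2 and n as extern. */
long *buf1, *buf2;
int n;
long *p, *q;

/* Hand-hoisted analogue of doit(): the addresses of p and q are formed
   once, before the loop, instead of on every iteration. */
void doit_hoisted(void)
{
    long **addr_p = &p;   /* loop-invariant, like (symbol_ref "p") */
    long **addr_q = &q;
    int m = n;
    long c = 12;

    *addr_p = buf1;
    *addr_q = buf2;

    for (int i = 0; i < m; i++) {
        long *src = (*addr_p)++;   /* still loads and updates p itself */
        long *dst = (*addr_q)++;   /* still loads and updates q itself */
        *dst = *src + c;
    }
}
```

The loads and stores of p and q remain inside the loop; only the formation of their addresses, which on ia64 goes through r1, moves out.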

Any insights would be appreciated.



Steve Christiansen
IBM Linux Technology Center
503-578-4177  IBM T/L: 775-4177
