This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/67441] New: Scheduler unable to disambiguate memory references in unrolled loop


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67441

            Bug ID: 67441
           Summary: Scheduler unable to disambiguate memory references in
                    unrolled loop
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: bergner at gcc dot gnu.org, dje at gcc dot gnu.org,
                    wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu

The following shows an example where the scheduler is unable to disambiguate
memory references inside the unrolled loop, which prevents any motion of the
loads above the (non-overlapping) preceding stores.

pthaugen@genoa:~/temp/unroll-alias$ cat junk.c
#define SIZE 1024

double x[SIZE] __attribute__ ((aligned (16)));

void do_one(void)
{
  unsigned long i;

  for (i = 0; i < SIZE; i++)
    x[i] = x[i] + 1.0;
}
pthaugen@genoa:~/temp/unroll-alias$ ~/install/gcc/trunk/bin/gcc -O3 -funroll-loops -S junk.c -mcpu=power8

The following code is generated. The loop is unrolled, but the loads and adds
are not moved, so the result is essentially back-to-back copies of the loop body.

.L2:
        lxvd2x 12,0,9
        addi 4,9,16
        addi 11,9,32
        addi 5,9,48
        addi 6,9,64
        addi 7,9,80
        addi 8,9,96
        addi 12,9,112
        xvadddp 1,12,0
        stxvd2x 1,0,9
        addi 9,9,128
        lxvd2x 2,0,4
        xvadddp 3,2,0
        stxvd2x 3,0,4
        lxvd2x 4,0,11
        xvadddp 5,4,0
        stxvd2x 5,0,11
        lxvd2x 6,0,5
        xvadddp 7,6,0
        stxvd2x 7,0,5
        lxvd2x 8,0,6
        xvadddp 9,8,0
        stxvd2x 9,0,6
        lxvd2x 10,0,7
        xvadddp 11,10,0
        stxvd2x 11,0,7
        lxvd2x 13,0,8
        xvadddp 12,13,0
        stxvd2x 12,0,8
        lxvd2x 1,0,12
        xvadddp 2,1,0
        stxvd2x 2,0,12
        bdnz .L2


An example store/load sequence at sched1 time looks like the following, where
r193 was previously set to r170+64.

(insn 81 80 82 3 (set (mem:V2DF (reg:DI 193 [ ivtmp.14 ]) [1 MEM[base: _7, offset: 0B]+0 S16 A128])
        (reg:V2DF 196 [ vect__5.6 ])) junk.c:12 886 {*vsx_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 196 [ vect__5.6 ])
        (expr_list:REG_DEAD (reg:DI 193 [ ivtmp.14 ])
            (nil))))
(insn 82 81 90 3 (set (reg:DI 197 [ ivtmp.14 ])
        (plus:DI (reg:DI 170 [ ivtmp.14 ])
            (const_int 80 [0x50]))) 81 {*adddi3}
     (nil))
(insn 90 82 91 3 (set (reg:V2DF 199 [ MEM[base: _7, offset: 0B] ])
        (mem:V2DF (reg:DI 197 [ ivtmp.14 ]) [1 MEM[base: _7, offset: 0B]+0 S16 A128])) junk.c:12 886 {*vsx_movv2df}
     (nil))

The store and load use different base regs, and the fact that both are based
off r170 plus a displacement is lost when the sched-deps code looks at just the
two mem refs. It therefore falls back to the tree aliasing oracle, where both
refs have the same MEM expr with offset 0, so they are not disambiguated.
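
To illustrate what information is missing, here is a toy model in C. It is
not GCC code: the struct and function names are made up for this example, and
the real sched-deps and alias logic is considerably more involved. It only
shows that two 16-byte refs with different base regs cannot be proven
disjoint, while the same refs expressed as r170+64 and r170+80 can.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a memory reference [base_reg + disp] covering `size` bytes.
   Hypothetical type for illustration, not a GCC data structure. */
struct toy_mem_ref {
  int base_reg;
  long disp;
  long size;
};

/* Conservative check: returns true when the two refs may overlap. */
bool toy_may_overlap(struct toy_mem_ref a, struct toy_mem_ref b)
{
  assert(a.size > 0 && b.size > 0);

  /* Different base regs: nothing is known about their relation,
     so the only safe answer is "may overlap". */
  if (a.base_reg != b.base_reg)
    return true;

  /* Same base: the byte ranges [disp, disp + size) are disjoint
     when one ends at or before the start of the other. */
  return !(a.disp + a.size <= b.disp || b.disp + b.size <= a.disp);
}

/* The refs as they appear in insns 81 and 90: r193+0 and r197+0. */
bool toy_as_seen_by_scheduler(void)
{
  struct toy_mem_ref store = { 193, 0, 16 };
  struct toy_mem_ref load = { 197, 0, 16 };
  return toy_may_overlap(store, load);
}

/* The same refs rewritten in terms of r170: r170+64 and r170+80. */
bool toy_rebased_on_r170(void)
{
  struct toy_mem_ref store = { 170, 64, 16 };
  struct toy_mem_ref load = { 170, 80, 16 };
  return toy_may_overlap(store, load);
}
```

In this model toy_as_seen_by_scheduler() reports a possible overlap, while
toy_rebased_on_r170() does not, since the store covers bytes 64..79 and the
load starts at byte 80.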

I'm not sure whether the unroller should be creating new tree MEM exprs with
appropriate offsets so that the mems can be seen as non-overlapping, or whether
the sched-deps code needs to be enhanced to incorporate the base reg increment
so that the rtl base/displacement is clearly seen and the refs can be
disambiguated that way.
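
As a rough sketch of the second option, the following C fragment folds
"reg = src + const" definitions back into a common base plus displacement
before comparing two addresses. This is only a toy model under simplifying
assumptions: all names are hypothetical, the definitions are assumed to be in
the same block with the root base reg unchanged between them, and none of
this corresponds to existing sched-deps interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* One recorded definition: reg = src_reg + addend (like insn 82). */
struct toy_def {
  int reg;
  int src_reg;
  long addend;
};

/* An address expressed as base_reg + disp. */
struct toy_addr {
  int base_reg;
  long disp;
};

/* Definitions taken from the report: r193 = r170 + 64, r197 = r170 + 80. */
const struct toy_def toy_report_defs[] = {
  { 193, 170, 64 },
  { 197, 170, 80 },
};

/* Follow recorded definitions until reaching a reg with no definition.
   The step limit guards against a cyclic definition table. */
struct toy_addr toy_canonicalize(const struct toy_def *defs, size_t ndefs,
                                 int reg)
{
  struct toy_addr addr = { reg, 0 };
  size_t steps;

  for (steps = 0; steps < ndefs; steps++) {
    size_t i;
    for (i = 0; i < ndefs; i++)
      if (defs[i].reg == addr.base_reg)
        break;
    if (i == ndefs)
      break;                      /* no definition: this is the root base */
    addr.disp += defs[i].addend;  /* fold the increment into the displacement */
    addr.base_reg = defs[i].src_reg;
  }
  return addr;
}

/* Returns 1 only when both addresses share a base and the two
   `size`-byte ranges provably do not overlap, 0 otherwise. */
int toy_disjoint(struct toy_addr a, struct toy_addr b, long size)
{
  assert(size > 0);
  if (a.base_reg != b.base_reg)
    return 0;
  return a.disp + size <= b.disp || b.disp + size <= a.disp;
}

/* The store via r193 and the load via r197, both 16 bytes wide. */
int toy_report_example(void)
{
  struct toy_addr store = toy_canonicalize(toy_report_defs, 2, 193);
  struct toy_addr load = toy_canonicalize(toy_report_defs, 2, 197);
  return toy_disjoint(store, load, 16);
}
```

With the two definitions above, both addresses canonicalize to base r170
(displacements 64 and 80), and toy_report_example() returns 1, meaning the
load could be moved above the store in this simplified model.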

