This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/67441] New: Scheduler unable to disambiguate memory references in unrolled loop
- From: "pthaugen at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 03 Sep 2015 02:39:04 +0000
- Subject: [Bug rtl-optimization/67441] New: Scheduler unable to disambiguate memory references in unrolled loop
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67441
Bug ID: 67441
Summary: Scheduler unable to disambiguate memory references in
unrolled loop
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, dje at gcc dot gnu.org,
wschmidt at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64-unknown-linux-gnu
Target: powerpc64-unknown-linux-gnu
Build: powerpc64-unknown-linux-gnu
The following shows an example where the scheduler is unable to disambiguate
memory references inside the unrolled loop, which prevents any motion of the
loads above the (non-overlapping) preceding stores.
pthaugen@genoa:~/temp/unroll-alias$ cat junk.c
#define SIZE 1024
double x[SIZE] __attribute__ ((aligned (16)));
void do_one(void)
{
  unsigned long i;
  for (i = 0; i < SIZE; i++)
    x[i] = x[i] + 1.0;
}
pthaugen@genoa:~/temp/unroll-alias$ ~/install/gcc/trunk/bin/gcc -O3
-funroll-loops -S junk.c -mcpu=power8
The following code is generated. It shows the loop unrolled, but with no
movement of the loads/adds, so we basically have back-to-back copies of the
loop body.
.L2:
        lxvd2x 12,0,9
        addi 4,9,16
        addi 11,9,32
        addi 5,9,48
        addi 6,9,64
        addi 7,9,80
        addi 8,9,96
        addi 12,9,112
        xvadddp 1,12,0
        stxvd2x 1,0,9
        addi 9,9,128
        lxvd2x 2,0,4
        xvadddp 3,2,0
        stxvd2x 3,0,4
        lxvd2x 4,0,11
        xvadddp 5,4,0
        stxvd2x 5,0,11
        lxvd2x 6,0,5
        xvadddp 7,6,0
        stxvd2x 7,0,5
        lxvd2x 8,0,6
        xvadddp 9,8,0
        stxvd2x 9,0,6
        lxvd2x 10,0,7
        xvadddp 11,10,0
        stxvd2x 11,0,7
        lxvd2x 13,0,8
        xvadddp 12,13,0
        stxvd2x 12,0,8
        lxvd2x 1,0,12
        xvadddp 2,1,0
        stxvd2x 2,0,12
        bdnz .L2
An example store/load sequence at sched1 time looks like the following, where
the incoming r193 was set to r170+64.
(insn 81 80 82 3 (set (mem:V2DF (reg:DI 193 [ ivtmp.14 ]) [1 MEM[base: _7, offset: 0B]+0 S16 A128])
        (reg:V2DF 196 [ vect__5.6 ])) junk.c:12 886 {*vsx_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 196 [ vect__5.6 ])
        (expr_list:REG_DEAD (reg:DI 193 [ ivtmp.14 ])
            (nil))))
(insn 82 81 90 3 (set (reg:DI 197 [ ivtmp.14 ])
        (plus:DI (reg:DI 170 [ ivtmp.14 ])
            (const_int 80 [0x50]))) 81 {*adddi3}
     (nil))
(insn 90 82 91 3 (set (reg:V2DF 199 [ MEM[base: _7, offset: 0B] ])
        (mem:V2DF (reg:DI 197 [ ivtmp.14 ]) [1 MEM[base: _7, offset: 0B]+0 S16 A128])) junk.c:12 886 {*vsx_movv2df}
     (nil))
The store and load use different base registers, and the fact that both are
r170 plus a displacement is lost when the sched-deps code looks at just the two
mem refs. It therefore falls back to the tree aliasing oracle, where both refs
have the same MEM expr with offset 0, so they are not disambiguated.
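
To make the missing information concrete, here is a minimal sketch of the
base/displacement overlap test that would succeed if both refs were still
expressed relative to r170. This is not GCC's code; struct mem_ref and
may_overlap are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a memory reference: a base register number, a
   constant displacement from that register, and an access size in bytes.  */
struct mem_ref
{
  int base_reg;
  long disp;
  long size;
};

/* Return true if A and B may overlap.  Without a common base register
   nothing can be proven here, so the answer is conservatively "yes".  */
bool
may_overlap (struct mem_ref a, struct mem_ref b)
{
  assert (a.size > 0 && b.size > 0);

  if (a.base_reg != b.base_reg)
    return true;

  /* Same base: the half-open byte ranges [disp, disp + size) must
     intersect for the accesses to overlap.  */
  return a.disp < b.disp + b.size && b.disp < a.disp + a.size;
}
```

Seen as r170+64 and r170+80 with 16-byte accesses, the store and load are
disjoint. Seen as r193+0 and r197+0, which is all the two mem refs show, the
same test has to give up.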
I'm not sure whether the unroller should create new tree MEM exprs with
appropriate offsets, so that the mems can be seen as non-overlapping, or whether
the sched-deps code needs to be enhanced to incorporate the base register
increment, so that the rtl base/displacement is clearly visible and the refs can
be disambiguated that way.
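
As a rough illustration of the second option, the sketch below rewrites an
address in terms of the register its base was derived from by following
"dest = src + constant" definitions. It is only a model, not a proposed patch:
struct reg_def, struct address and canonicalize are hypothetical names, and it
assumes each register has at most one such definition and that the definition
reaches the use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of an insn of the form "dest = src + addend".  */
struct reg_def
{
  int dest;
  int src;
  long addend;
};

/* Hypothetical address: a base register plus a constant displacement.  */
struct address
{
  int base_reg;
  long disp;
};

/* Rewrite ADDR in terms of the register its base was computed from,
   folding each constant increment into the displacement.  The outer
   loop is bounded by N_DEFS so a self-increment cannot loop forever.  */
struct address
canonicalize (struct address addr, const struct reg_def *defs, size_t n_defs)
{
  assert (defs != NULL || n_defs == 0);

  for (size_t step = 0; step < n_defs; step++)
    {
      size_t i;

      /* Look for the definition of the current base register.  */
      for (i = 0; i < n_defs; i++)
        if (defs[i].dest == addr.base_reg)
          break;
      if (i == n_defs)
        break;  /* No known definition: this is as far as we can go.  */

      addr.base_reg = defs[i].src;
      addr.disp += defs[i].addend;
    }
  return addr;
}
```

Given r193 = r170 + 64 and r197 = r170 + 80, the store address r193+0 becomes
r170+64 and the load address r197+0 becomes r170+80. With 16-byte accesses
those ranges do not overlap.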