[Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2 "basic block vectorized using SLP" 1
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Feb 26 10:19:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
           Status  |NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Looking at the original description: note that the copying cannot be optimized
away, since the accesses are to global variables (well, unless you build with
-flto or -fwhole-program, which will privatize the symbols).
But of course the "correctness" test is optimized away very early. So the
testcase should get an __asm__ volatile ("" : : : "memory"); in between the
copying and the correctness verification.
Currently vectorization is entered with the IL
<bb 2>:
_8 = MEM[(unsigned int *)&in + 4B];
MEM[(unsigned int *)&out] = _8;
_14 = MEM[(unsigned int *)&in + 8B];
MEM[(unsigned int *)&out + 4B] = _14;
_20 = MEM[(unsigned int *)&in + 12B];
MEM[(unsigned int *)&out + 8B] = _20;
_26 = MEM[(unsigned int *)&in + 16B];
MEM[(unsigned int *)&out + 12B] = _26;
return 0;
(see - no check anymore)
We generate (with -mcpu=e6500 -m64 -maltivec -mabi=altivec, just to pick one
example)
<bb 2>:
vect__2.12_11 = __builtin_altivec_mask_for_load (&MEM[(unsigned int *)&in + 4B]);
vectp.14_13 = &MEM[(unsigned int *)&in + 4B] & -16B;
vect__2.15_14 = MEM[(unsigned int *)vectp.14_13];
vectp.14_16 = &MEM[(void *)&in + 16B] & -16B;
vect__2.16_17 = MEM[(unsigned int *)vectp.14_16];
vect__2.17_18 = REALIGN_LOAD <vect__2.15_14, vect__2.16_17, vect__2.12_11>;
MEM[(unsigned int *)&out] = vect__2.17_18;
return 0;
and
(insn 16 15 17 (set (subreg:DI (reg:V4SI 171 [ vect__2.15 ]) 8)
(mem:DI (plus:DI (reg:DI 170)
(const_int 8 [0x8])) [1 MEM[(unsigned int *)&MEM[(unsigned int
*)&in + 4B] & -16B]+8 S8 A32])) t.c:14 -1
(nil))
(insn 17 16 18 (set (subreg:DI (reg:V4SI 171 [ vect__2.15 ]) 0)
(mem:DI (reg:DI 170) [1 MEM[(unsigned int *)&MEM[(unsigned int *)&in +
4B] & -16B]+0 S8 A32])) t.c:14 -1
(nil))
(insn 21 20 22 (set (reg:V4SI 176)
(mem:V4SI (reg:DI 174) [1 MEM[(unsigned int *)&MEM[(void *)&in + 16B] &
-16B]+0 S16 A128])) t.c:14 -1
(nil))
so for some reason we expand the first aligned load using two DI loads.
Investigating.
I have a fix which ends up producing
.L.main1:
addis 9,2,.LANCHOR0@toc@ha
li 3,0
addi 9,9,.LANCHOR0@toc@l
addi 10,9,4
addi 9,9,16
neg 8,10
lvx 0,0,9
lvsr 13,0,8
addis 9,2,.LANCHOR1@toc@ha
lvx 1,0,10
addi 9,9,.LANCHOR1@toc@l
vperm 0,1,0,13
stvx 0,0,9
blr
not sure whether that matches what 4.8 produced, though (I don't have a cross
compiler ready to verify - but the RTL looks good).