This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/71805] New: incorrect code for test pr45752.c with -mcpu=power9
- From: "acsawdey at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 07 Jul 2016 21:49:17 +0000
- Subject: [Bug c/71805] New: incorrect code for test pr45752.c with -mcpu=power9
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71805
Bug ID: 71805
Summary: incorrect code for test pr45752.c with -mcpu=power9
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: acsawdey at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, meissner at gcc dot gnu.org,
wschmidt at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-linux
Created attachment 38859
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38859&action=edit
objdump of generated binary plus my annotations which are abstracted in the
note above
testsuite/gcc.dg/vect/pr45752.c is miscompiled: in the generated code, a
register value that is still needed appears to be overwritten before it is used.
Compile flags:
/home/sawdey/src/gcc/gcc-6-branch/build/gcc/xgcc
-B/home/sawdey/src/gcc/gcc-6-branch/build/gcc/
/home/sawdey/src/gcc/gcc-6-branch/gcc/gcc/testsuite/gcc.dg/vect/pr45752.c
-mcpu=power9 -Wl,-rpath=/tmp/lib64 -fno-diagnostics-show-caret
-fdiagnostics-color=never -flto -ffat-lto-objects -maltivec -mpower9-vector
-ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details
--param tree-reassoc-width=1 -lm -o ./pr45752.exe
The compiler is gcc-6-branch 238072 plus bergner's p9 VMX ICE patch and
kelvin's vpermr fix.
The 4th group of 4 results is incorrect:
(gdb) p check_results
$24 = {3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399,
22848, 8174, 307964, 146829, 22009, 32668, 11594, 447564, 202404, 31619}
(gdb) p output
$25 = {3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399,
22848, 8174, 310424, 178137, 26529, 31036, 11594, 447564, 202404, 31619}
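A minimal sketch for locating the mismatch. The two lists are copied from the gdb output above; the group width of 4 follows the "group of 4" wording, and the helper name mismatched_groups is hypothetical (it is not part of the testcase):

```python
# Values copied from the gdb session above.
check_results = [3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399,
                 22848, 8174, 307964, 146829, 22009, 32668, 11594, 447564, 202404, 31619]
output = [3208, 1334, 28764, 35679, 2789, 13028, 4754, 168364, 91254, 12399,
          22848, 8174, 310424, 178137, 26529, 31036, 11594, 447564, 202404, 31619]

def mismatched_groups(expected, actual, width=4):
    """Return {1-based group number: [element indices that differ]}."""
    bad = {}
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            bad.setdefault(i // width + 1, []).append(i)
    return bad

print(mismatched_groups(check_results, output))  # {4: [12, 13, 14, 15]}
```

Only elements 12 through 15 differ, i.e. exactly one 4-element vector's worth of results.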
This is my extraction of the dataflow for the incorrect vector:
10000788: 09 00 e9 f5 lxv vs47,0(r9)
<< set vs47/v15 from load
100007b8: 09 00 87 f6 lxv vs52,0(r7)
<< set vs52/v20 from load
10000898: 09 01 81 f4 lxv vs36,256(r1)
<< set vs36/v4 from load
100008f8: 99 01 61 f7 lxv vs59,400(r1)
<< set vs59 from load
10000900: 89 01 01 f4 lxv vs32,384(r1)
<< set vs32 from load
10000918: 01 00 e7 f7 lxv vs31,0(r7)
<< set vs31 from load
1000094c: 01 00 49 f4 lxv vs2,0(r9)
<< set vs2 from load
10000950: 01 00 a7 f5 lxv vs13,0(r7)
<< set vs13 from load
10000958: eb 03 fb 11 vperm v15,v27,v0,v15
<< set v15/vs47 from v27, v0, v15
10000988: 8c 22 81 11 vspltw v12,v4,1
<< set v12/vs44 from v4/vs36
10000994: 01 00 29 f4 lxv vs1,0(r9)
<< set vs1 from load
100009a0: 96 64 ac f2 xxlor vs21,vs44,vs44
<< set vs21 from vs44/v12
100009a4: 8c 22 83 11 vspltw v12,v4,3
<< set v12/vs44 from v4/vs36
100009c0: 96 64 8c f0 xxlor vs4,vs44,vs44
<< set vs4 from vs44/v12
100009cc: 91 ac b5 f1 xxlor vs45,vs21,vs21
<< set vs45/v13 from vs21
100009d0: 91 fc df f1 xxlor vs46,vs31,vs31
<< set vs46/v14 from vs31
100009f0: 96 7c af f0 xxlor vs5,vs47,vs47
<< set vs5 from vs47/v15
100009fc: 89 70 ed 10 vmuluwm v7,v13,v14
<< set v7/vs39 from v13, v14
10000a08: 91 14 a2 f1 xxlor vs45,vs2,vs2
<< set vs45/v13 from vs2
10000a28: 91 24 84 f1 xxlor vs44,vs4,vs4
<< set v12/vs44 from vs4
10000a2c: 89 68 8c 11 vmuluwm v12,v12,v13
<< set v12/vs44 from v12, v13
10000a3c: 96 64 8c f0 xxlor vs4,vs44,vs44
<< set vs4 from vs44/v12
10000a40: f9 00 81 f5 lxv vs44,240(r1)
<< set vs44/v12 from load
10000a44: 8c 62 c0 11 vspltw v14,v12,0
<< set v14/vs46 from v12/vs44
10000aa0: d4 68 5a f1 xxperm vs10,vs58,vs13
<< set vs10 from vs58, vs13
10000aa4: 8c 22 40 13 vspltw v26,v4,0
<< set v26/vs58 from v4/vs36
10000acc: 01 00 c7 f7 lxv vs30,0(r7)
<< set vs30 from load
10000b08: 91 0c 81 f1 xxlor vs44,vs1,vs1
<< set vs44/v12 from vs1
10000b0c: 89 60 ce 11 vmuluwm v14,v14,v12
<< set v14/vs46 from v14, v12
10000b20: 8c 22 a2 11 vspltw v13,v4,2
<< set v13/vs45 from v4/vs36
10000b24: 96 6c 8d f3 xxlor vs28,vs45,vs45
<< set vs28 from vs45/v13
10000b40: 91 2c 85 f1 xxlor vs44,vs5,vs5
<< set vs44/v12 from vs5
10000b44: 89 a0 8c 12 vmuluwm v20,v12,v20
<< set v20/vs52 from v12 and v20
10000b5c: 01 00 a9 f5 lxv vs13,0(r9)
<< set vs13 from load
10000b94: 89 68 ac 11 vmuluwm v13,v12,v13
<< v13/vs45 set here to be written over?
10000b98: 91 e4 9c f1 xxlor vs44,vs28,vs28
<< set vs44/v12 from vs28
10000ba0: 91 fc bf f1 xxlor vs45,vs31,vs31
<< set vs45/v13 from vs31
10000ba4: 89 68 8c 12 vmuluwm v20,v12,v13
<< set v20 from v12 and v13
10000bcc: 91 24 24 f3 xxlor vs57,vs4,vs4
<< set vs57/v25 from vs4
10000be0: 80 c8 e7 10 vadduwm v7,v7,v25
<< set v7 from v7 and v25
10000c00: 80 70 e7 10 vadduwm v7,v7,v14
<< set v7 from v7 and v14
10000c10: 91 6c ed f3 xxlor vs63,vs13,vs13
<< set vs63 from vs13
10000c28: 89 f8 5a 13 vmuluwm v26,v26,v31
<< set v26 from v26 and v31
10000c44: 80 a0 07 11 vadduwm v8,v7,v20
<< set v8/vs40 from v7 and v20
10000c68: 80 d0 08 11 vadduwm v8,v8,v26
<< set v8/vs40 from v8 and v26
The punchline is at 10000b94 and 10000ba0, which both set v13/vs45: the value
written at 10000b94 appears to be overwritten before it is used, and I don't
think that is what was intended.
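To make the suspected overwrite concrete, here is a minimal sketch that replays only the four instructions listed above from 10000b94 to 10000ba4 and flags a register that is written twice with no read in between. It relies on the VSX register aliasing v<N> == vs<N+32>. The tuple encoding and the helper names canon and dead_writes are hypothetical, and instructions that are not part of the extraction (for example the one at 10000b9c) are not modeled, so this illustrates the pattern rather than proving the miscompile:

```python
def canon(reg):
    """Map a register name to its VSX number: vN aliases vs(N+32)."""
    if reg.startswith("vs"):
        return int(reg[2:])
    if reg.startswith("v"):
        return int(reg[1:]) + 32
    raise ValueError(reg)

# (address, mnemonic, destination, sources), transcribed from the listing above.
insns = [
    (0x10000b94, "vmuluwm", "v13", ["v12", "v13"]),
    (0x10000b98, "xxlor", "vs44", ["vs28", "vs28"]),
    (0x10000ba0, "xxlor", "vs45", ["vs31", "vs31"]),
    (0x10000ba4, "vmuluwm", "v20", ["v12", "v13"]),
]

def dead_writes(insns):
    """Return (first_write_addr, overwriting_addr, vsx_number) per unread write."""
    pending = {}  # VSX number -> address of a write that has not been read yet
    found = []
    for addr, _mnemonic, dest, srcs in insns:
        for s in srcs:
            pending.pop(canon(s), None)  # a read consumes the pending write
        d = canon(dest)
        if d in pending:
            found.append((pending[d], addr, d))
        pending[d] = addr
    return found

for first, second, reg in dead_writes(insns):
    print(f"vs{reg}: written at {first:x}, overwritten at {second:x} without a read")
```

On these four instructions the sketch reports vs45 (v13) as written at 10000b94 and overwritten at 10000ba0 without an intervening read.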