Summary: | [miscompilation] vunpcklpd accessing xmm16-22 targeting KNL | ||
---|---|---|---|
Product: | gcc | Reporter: | Matthias Kretz (Vir) <mkretz> |
Component: | target | Assignee: | Jakub Jelinek <jakub> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | jakub |
Priority: | P3 | Keywords: | wrong-code |
Version: | 7.2.0 | ||
Target Milestone: | --- | ||
Host: | Target: | x86_64-*-*, i?86-*-* | |
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2018-03-12 00:00:00 | |
Attachments: |
unreduced testcase
gcc8-pr84786.patch test case that produces incorrect vpsrlw gcc7-pr84786.patch |
Description
Matthias Kretz (Vir)
2018-03-09 16:15:54 UTC
Without -mavx512vl the 128/256-bit pseudos should never be allocated in xmm16-31 registers, so I really wonder how this happens. If you don't mind attaching unreduced code (compressed), I can do the reduction myself. Created attachment 43618 [details]
unreduced testcase
Compile with `g++ -std=c++17 -O2 -march=knl -o knl-fail knl-fail.cpp`.
The function `Tests::operators_<Vc::v2::simd<double, Vc::v2::detail::avx_abi<32> > >::run()` countains the invalid instructions.
% g++-7 --version
g++-7 (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
Ah, I see the bug: (define_insn "sse2_loadhpd" [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,x,v,o,o ,o") (vec_concat:V2DF (vec_select:DF (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,0,v,0,0 ,0") (parallel [(const_int 0)])) (match_operand:DF 2 "nonimmediate_operand" " m,m,x,v,x,*f,r")))] While V2DFmode won't be put into %xmm16-%xmm31 without TARGET_AVX512VL, DFmode can be, so we need an avx512vl alternative for the v for the last operand and another one with x for avx/avx2/avx512f etc. without avx512vl. Simplified testcase for the testsuite: /* PR target/84786 */ /* { dg-do run } */ /* { dg-options "-mavx512f -mno-avx512vl -O2" } */ /* { dg-require-effective-target avx512f } */ #include "avx512f-check.h" typedef double V __attribute__((vector_size (16))); __attribute__((noipa)) V foo (V x, double y) { register double z __asm ("xmm18"); asm volatile ("" : "=v" (z) : "0" (y)); x[1] = z; return x; } static void avx512f_test (void) { V a = foo ((V) { 1.0, 2.0 }, 3.0); if (a[0] != 1.0 || a[1] != 3.0) abort (); } Now, the hard part is look around for similar issues (I've done it already at some point, but this case apparently went unnoticed). Created attachment 43628 [details] gcc8-pr84786.patch Untested fix. Author: jakub Date: Tue Mar 13 08:03:28 2018 New Revision: 258475 URL: https://gcc.gnu.org/viewcvs?rev=258475&root=gcc&view=rev Log: PR target/84786 * config/i386/sse.md (sse2_loadhpd): Use Yv constraint rather than v on the last operand. * gcc.target/i386/avx512f-pr84786-1.c: New test. * gcc.target/i386/avx512f-pr84786-2.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-pr84786-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-pr84786-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog Fixed for 8.1+ so far. There seems to be a similar bug for vpsrlw and vpsllw. Do you need a testcase? (It's hard to hit the bug... just had one occur on a Travis CI build) (In reply to Matthias Kretz from comment #8) > There seems to be a similar bug for vpsrlw and vpsllw. Do you need a > testcase? (It's hard to hit the bug... just had one occur on a Travis CI > build) I don't see how that is possible. Do you have at least objdump with the instructions (e.g. which exact operand it is), and what gcc options were used? This is all I have right now: TID 0 SDE-ERROR: Executed instruction not valid for specified chip (KNL): 0x70d281: vpsrlw xmm0, xmm0, xmm16 Image: /home/travis/build/VcDevel/Vc/build-Experimental/c2dd920concentrateGCC7.2.0Relivy-bridgeknl/tests/mask_knl_vectorbuiltin+0x30d281 (in multi-region image, region# 0) Function: _ZN5Tests11load_store_IN2Vc2v29simd_maskIfNS2_6detail7avx_abiILi32EEEEEE3runEv Instruction bytes are: 62 b1 7d 08 d1 c0 See bottom of: http://lxwww53.gsi.de/testDetails.php?test=2016375&build=14519 Created attachment 43762 [details]
test case that produces incorrect vpsrlw
Compiled with `g++-7 -std=c++17 -O0 -fabi-version=0 -fabi-compat-version=0 -march=knl -o fail fail.cpp`
g++-7 (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
from objdump -d | grep vpsrlw I get:
56e364: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
56eaf6: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
56f174: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
56f68c: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
58ef6f: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
58f6f5: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
58fd67: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
590273: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
59585d: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
595fef: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
596664: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
596b6a: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
59cb7a: 62 b1 7d 28 d1 c0 vpsrlw %xmm16,%ymm0,%ymm0
59d39f: 62 b1 7d 28 d1 c1 vpsrlw %xmm17,%ymm0,%ymm0
59d9fe: 62 b1 7d 28 d1 c2 vpsrlw %xmm18,%ymm0,%ymm0
59dfc6: 62 b1 7d 28 d1 c3 vpsrlw %xmm19,%ymm0,%ymm0
5a407c: 62 b1 7d 28 d1 c0 vpsrlw %xmm16,%ymm0,%ymm0
5a4895: 62 b1 7d 28 d1 c1 vpsrlw %xmm17,%ymm0,%ymm0
5a4eeb: 62 b1 7d 28 d1 c2 vpsrlw %xmm18,%ymm0,%ymm0
5a54ad: 62 b1 7d 28 d1 c3 vpsrlw %xmm19,%ymm0,%ymm0
5be392: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
5bea0b: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
5bef85: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
5bf3be: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
5d8ae0: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
5d9149: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
5d96b3: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
5d9adc: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
5de3e7: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
5dea62: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
5defd5: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
5df3fe: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
5e3cd2: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
5e431b: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
5e4865: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
5e4c6e: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
5e94bd: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
5e9b18: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
5ea06b: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
5ea474: 62 b1 7d 08 d1 c3 vpsrlw %xmm19,%xmm0,%xmm0
799710: c5 f9 d1 c5 vpsrlw %xmm5,%xmm0,%xmm0
7999a9: c5 f9 d1 c6 vpsrlw %xmm6,%xmm0,%xmm0
799e3b: c5 f9 d1 c7 vpsrlw %xmm7,%xmm0,%xmm0
79a101: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
79a3c2: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
79a68c: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
7a1e51: c5 f9 d1 c5 vpsrlw %xmm5,%xmm0,%xmm0
7a20ea: c5 f9 d1 c6 vpsrlw %xmm6,%xmm0,%xmm0
7a2579: c5 f9 d1 c7 vpsrlw %xmm7,%xmm0,%xmm0
7a283f: 62 b1 7d 08 d1 c0 vpsrlw %xmm16,%xmm0,%xmm0
7a2b00: 62 b1 7d 08 d1 c1 vpsrlw %xmm17,%xmm0,%xmm0
7a2dca: 62 b1 7d 08 d1 c2 vpsrlw %xmm18,%xmm0,%xmm0
For 7.x I think we need: --- gcc/config/i386/sse.md.jj 2018-03-05 17:04:45.820743323 +0100 +++ gcc/config/i386/sse.md 2018-03-26 17:29:00.967880855 +0200 @@ -10687,7 +10687,7 @@ [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v") (any_lshift:VI2_AVX2_AVX512BW (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v") - (match_operand:DI 2 "nonmemory_operand" "xN,vN")))] + (match_operand:DI 2 "nonmemory_operand" "xN,YvN")))] "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>" "@ p<vshift><ssemodesuffix>\t{%2, %0|%0, %2} @@ -10706,7 +10706,7 @@ [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v") (any_lshift:VI48_AVX2 (match_operand:VI48_AVX2 1 "register_operand" "0,x,v") - (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))] + (match_operand:DI 2 "nonmemory_operand" "xN,xN,YvN")))] "TARGET_SSE2 && <mask_mode512bit_condition>" "@ p<vshift><ssemodesuffix>\t{%2, %0|%0, %2} but don't have time to test it right now, nor create short self-contained testcases for it. I'll try to apply it locally and will report my findings. I applied both patches to my GCC 7.2 installation and as a result my complete testsuite passes now. Anything else I can help with? Here's an idea for a test case (https://godbolt.org/g/SjM2HE: it appears fixed on GCC 8): typedef unsigned short V __attribute__((vector_size (16))); V foo (V x, int y) { x <<= y; asm volatile (""::"x"(x):"xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"); return x >> y; } Created attachment 44312 [details] gcc7-pr84786.patch Patch I'm going to bootstrap/regtest on 7.x branch now. On the trunk/in 8.x this got fixed with r253924 aka one of the PR82370 fixes, but that is something that IMHO shouldn't be backported. Author: jakub Date: Fri Jun 22 20:36:31 2018 New Revision: 261919 URL: https://gcc.gnu.org/viewcvs?rev=261919&root=gcc&view=rev Log: Backported from mainline 2018-03-13 Jakub Jelinek <jakub@redhat.com> PR target/84786 * config/i386/sse.md (sse2_loadhpd): Use Yv constraint rather than v on the last operand. * gcc.target/i386/avx512f-pr84786-1.c: New test. * gcc.target/i386/avx512f-pr84786-2.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-1.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-2.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/i386/sse.md branches/gcc-7-branch/gcc/testsuite/ChangeLog Author: jakub Date: Mon Jun 25 12:48:29 2018 New Revision: 262014 URL: https://gcc.gnu.org/viewcvs?rev=262014&root=gcc&view=rev Log: PR target/84786 * config/i386/sse.md (vshift_count): New mode attr. (<shift_insn><mode>3<mask_name>): Use <vshift_count>N instead of vN as last operand's constraint for VI2_AVX2_AVX512BW shifts. Use YvN instead of vN as last operand's constraint for VI48_AVX2 shifts. * gcc.target/i386/avx512f-pr84786-3.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/i386/sse.md branches/gcc-7-branch/gcc/testsuite/ChangeLog Author: jakub Date: Mon Jun 25 12:57:04 2018 New Revision: 262015 URL: https://gcc.gnu.org/viewcvs?rev=262015&root=gcc&view=rev Log: PR target/84786 * config/i386/sse.md (vshift_count): New mode attr. (<shift_insn><mode>3<mask_name>): Use <vshift_count>N instead of vN as last operand's constraint for VI2_AVX2_AVX512BW shifts. Use YvN instead of vN as last operand's constraint for VI48_AVX2 shifts. * gcc.target/i386/avx512f-pr84786-3.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c Modified: trunk/gcc/testsuite/ChangeLog Author: jakub Date: Mon Jun 25 12:58:03 2018 New Revision: 262016 URL: https://gcc.gnu.org/viewcvs?rev=262016&root=gcc&view=rev Log: 2018-06-25 Jakub Jelinek <jakub@redhat.com> PR target/84786 * gcc.target/i386/avx512f-pr84786-3.c: New test. Added: branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c Modified: branches/gcc-8-branch/gcc/testsuite/ChangeLog Author: jakub Date: Mon Jun 25 17:55:15 2018 New Revision: 262103 URL: https://gcc.gnu.org/viewcvs?rev=262103&root=gcc&view=rev Log: PR target/84786 * config/i386/sse.md (vshift_count): New mode attr. (<shift_insn><mode>3<mask_name>): Use <vshift_count>N instead of vN as last operand's constraint for VI2_AVX2_AVX512BW shifts. Use YvN instead of vN as last operand's constraint for VI48_AVX2 shifts. * gcc.target/i386/avx512f-pr84786-3.c: New test. Added: branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/config/i386/sse.md branches/gcc-6-branch/gcc/testsuite/ChangeLog Fixed for 6.5 and 7.4+ too. |