Bug 84786

Summary: [miscompilation] vunpcklpd accessing xmm16-22 targeting KNL
Product: gcc Reporter: Matthias Kretz (Vir) <mkretz>
Component: targetAssignee: Jakub Jelinek <jakub>
Status: RESOLVED FIXED    
Severity: normal CC: jakub
Priority: P3 Keywords: wrong-code
Version: 7.2.0   
Target Milestone: ---   
Host: Target: x86_64-*-*, i?86-*-*
Build: Known to work:
Known to fail: Last reconfirmed: 2018-03-12 00:00:00
Attachments: unreduced testcase
gcc8-pr84786.patch
test case that produces incorrect vpsrlw
gcc7-pr84786.patch

Description Matthias Kretz (Vir) 2018-03-09 16:15:54 UTC
I see generated code, such as:

  424821:·   vpxord %zmm17,%zmm17,%zmm17
  424827:·   vpxord %zmm18,%zmm18,%zmm18
[...]
  424855:·   vunpcklpd %xmm17,%xmm0,%xmm1
[...]
  424891:·   vunpcklpd %xmm18,%xmm1,%xmm1

when compiling with `-O2 -march=knl`. Apparently the `_mm_unpacklo_pd` intrinsic is incorrectly translated to an encoding that allows the upper 16 SIMD registers for the first register.

Reducing a test case will take some time.
Comment 1 Jakub Jelinek 2018-03-09 16:58:03 UTC
Without -mavx512vl the 128/256-bit pseudos should never be allocated in xmm16-31 registers, so I really wonder how this happens.  If you don't mind attaching unreduced code (compressed), I can do the reduction myself.
Comment 2 Matthias Kretz (Vir) 2018-03-10 22:32:37 UTC
Created attachment 43618 [details]
unreduced testcase

Compile with `g++ -std=c++17 -O2 -march=knl -o knl-fail knl-fail.cpp`.
The function `Tests::operators_<Vc::v2::simd<double, Vc::v2::detail::avx_abi<32> > >::run()` countains the invalid instructions.

% g++-7 --version
g++-7 (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
Comment 3 Jakub Jelinek 2018-03-12 09:31:35 UTC
Ah, I see the bug:
(define_insn "sse2_loadhpd"
  [(set (match_operand:V2DF 0 "nonimmediate_operand"
          "=x,v,x,v,o,o ,o")
        (vec_concat:V2DF
          (vec_select:DF
            (match_operand:V2DF 1 "nonimmediate_operand"
          " 0,v,0,v,0,0 ,0")
            (parallel [(const_int 0)]))
          (match_operand:DF 2 "nonimmediate_operand"
          " m,m,x,v,x,*f,r")))]
While V2DFmode won't be put into %xmm16-%xmm31 without TARGET_AVX512VL, DFmode can be, so we need an avx512vl alternative for the v for the last operand and another one with x for avx/avx2/avx512f etc. without avx512vl.
Comment 4 Jakub Jelinek 2018-03-12 09:48:05 UTC
Simplified testcase for the testsuite:
/* PR target/84786 */
/* { dg-do run } */
/* { dg-options "-mavx512f -mno-avx512vl -O2" } */
/* { dg-require-effective-target avx512f } */

#include "avx512f-check.h"

typedef double V __attribute__((vector_size (16)));

__attribute__((noipa)) V
foo (V x, double y)
{
  register double z __asm ("xmm18");
  asm volatile ("" : "=v" (z) : "0" (y));
  x[1] = z;
  return x;
}

static void
avx512f_test (void)
{
  V a = foo ((V) { 1.0, 2.0 }, 3.0);
  if (a[0] != 1.0 || a[1] != 3.0)
    abort ();
}

Now, the hard part is look around for similar issues (I've done it already at some point, but this case apparently went unnoticed).
Comment 5 Jakub Jelinek 2018-03-12 10:38:43 UTC
Created attachment 43628 [details]
gcc8-pr84786.patch

Untested fix.
Comment 6 Jakub Jelinek 2018-03-13 08:04:00 UTC
Author: jakub
Date: Tue Mar 13 08:03:28 2018
New Revision: 258475

URL: https://gcc.gnu.org/viewcvs?rev=258475&root=gcc&view=rev
Log:
	PR target/84786
	* config/i386/sse.md (sse2_loadhpd): Use Yv constraint rather than v
	on the last operand.

	* gcc.target/i386/avx512f-pr84786-1.c: New test.
	* gcc.target/i386/avx512f-pr84786-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx512f-pr84786-1.c
    trunk/gcc/testsuite/gcc.target/i386/avx512f-pr84786-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 7 Jakub Jelinek 2018-03-13 08:07:12 UTC
Fixed for 8.1+ so far.
Comment 8 Matthias Kretz (Vir) 2018-03-26 13:23:37 UTC
There seems to be a similar bug for vpsrlw and vpsllw. Do you need a testcase? (It's hard to hit the bug... just had one occur on a Travis CI build)
Comment 9 Jakub Jelinek 2018-03-26 14:51:43 UTC
(In reply to Matthias Kretz from comment #8)
> There seems to be a similar bug for vpsrlw and vpsllw. Do you need a
> testcase? (It's hard to hit the bug... just had one occur on a Travis CI
> build)

I don't see how that is possible.

Do you have at least objdump with the instructions (e.g. which exact operand it is), and what gcc options were used?
Comment 10 Matthias Kretz (Vir) 2018-03-26 15:09:45 UTC
This is all I have right now:
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (KNL): 0x70d281: vpsrlw xmm0, xmm0, xmm16
Image: /home/travis/build/VcDevel/Vc/build-Experimental/c2dd920concentrateGCC7.2.0Relivy-bridgeknl/tests/mask_knl_vectorbuiltin+0x30d281 (in multi-region image, region# 0)
Function: _ZN5Tests11load_store_IN2Vc2v29simd_maskIfNS2_6detail7avx_abiILi32EEEEEE3runEv
Instruction bytes are: 62 b1 7d 08 d1 c0

See bottom of: http://lxwww53.gsi.de/testDetails.php?test=2016375&build=14519
Comment 11 Matthias Kretz (Vir) 2018-03-26 15:26:25 UTC
Created attachment 43762 [details]
test case that produces incorrect vpsrlw

Compiled with `g++-7 -std=c++17 -O0 -fabi-version=0 -fabi-compat-version=0 -march=knl -o fail fail.cpp`

g++-7 (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0

from objdump -d | grep vpsrlw I get:
  56e364:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  56eaf6:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  56f174:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  56f68c:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  58ef6f:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  58f6f5:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  58fd67:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  590273:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  59585d:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  595fef:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  596664:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  596b6a:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  59cb7a:       62 b1 7d 28 d1 c0       vpsrlw %xmm16,%ymm0,%ymm0
  59d39f:       62 b1 7d 28 d1 c1       vpsrlw %xmm17,%ymm0,%ymm0
  59d9fe:       62 b1 7d 28 d1 c2       vpsrlw %xmm18,%ymm0,%ymm0
  59dfc6:       62 b1 7d 28 d1 c3       vpsrlw %xmm19,%ymm0,%ymm0
  5a407c:       62 b1 7d 28 d1 c0       vpsrlw %xmm16,%ymm0,%ymm0
  5a4895:       62 b1 7d 28 d1 c1       vpsrlw %xmm17,%ymm0,%ymm0
  5a4eeb:       62 b1 7d 28 d1 c2       vpsrlw %xmm18,%ymm0,%ymm0
  5a54ad:       62 b1 7d 28 d1 c3       vpsrlw %xmm19,%ymm0,%ymm0
  5be392:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  5bea0b:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  5bef85:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  5bf3be:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  5d8ae0:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  5d9149:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  5d96b3:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  5d9adc:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  5de3e7:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  5dea62:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  5defd5:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  5df3fe:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  5e3cd2:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  5e431b:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  5e4865:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  5e4c6e:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  5e94bd:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  5e9b18:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  5ea06b:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  5ea474:       62 b1 7d 08 d1 c3       vpsrlw %xmm19,%xmm0,%xmm0
  799710:       c5 f9 d1 c5             vpsrlw %xmm5,%xmm0,%xmm0
  7999a9:       c5 f9 d1 c6             vpsrlw %xmm6,%xmm0,%xmm0
  799e3b:       c5 f9 d1 c7             vpsrlw %xmm7,%xmm0,%xmm0
  79a101:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  79a3c2:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  79a68c:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
  7a1e51:       c5 f9 d1 c5             vpsrlw %xmm5,%xmm0,%xmm0
  7a20ea:       c5 f9 d1 c6             vpsrlw %xmm6,%xmm0,%xmm0
  7a2579:       c5 f9 d1 c7             vpsrlw %xmm7,%xmm0,%xmm0
  7a283f:       62 b1 7d 08 d1 c0       vpsrlw %xmm16,%xmm0,%xmm0
  7a2b00:       62 b1 7d 08 d1 c1       vpsrlw %xmm17,%xmm0,%xmm0
  7a2dca:       62 b1 7d 08 d1 c2       vpsrlw %xmm18,%xmm0,%xmm0
Comment 12 Jakub Jelinek 2018-03-26 15:37:45 UTC
For 7.x I think we need:
--- gcc/config/i386/sse.md.jj	2018-03-05 17:04:45.820743323 +0100
+++ gcc/config/i386/sse.md	2018-03-26 17:29:00.967880855 +0200
@@ -10687,7 +10687,7 @@
   [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v")
 	(any_lshift:VI2_AVX2_AVX512BW
 	  (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v")
-	  (match_operand:DI 2 "nonmemory_operand" "xN,vN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "xN,YvN")))]
   "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
    p<vshift><ssemodesuffix>\t{%2, %0|%0, %2}
@@ -10706,7 +10706,7 @@
   [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
 	(any_lshift:VI48_AVX2
 	  (match_operand:VI48_AVX2 1 "register_operand" "0,x,v")
-	  (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))]
+	  (match_operand:DI 2 "nonmemory_operand" "xN,xN,YvN")))]
   "TARGET_SSE2 && <mask_mode512bit_condition>"
   "@
    p<vshift><ssemodesuffix>\t{%2, %0|%0, %2}

but don't have time to test it right now, nor create short self-contained testcases for it.
Comment 13 Matthias Kretz (Vir) 2018-03-27 08:19:07 UTC
I'll try to apply it locally and will report my findings.
Comment 14 Matthias Kretz (Vir) 2018-03-27 13:15:56 UTC
I applied both patches to my GCC 7.2 installation and as a result my complete testsuite passes now. Anything else I can help with?
Comment 15 Matthias Kretz (Vir) 2018-03-27 14:28:18 UTC
Here's an idea for a test case (https://godbolt.org/g/SjM2HE: it appears fixed on GCC 8):

typedef unsigned short V __attribute__((vector_size (16)));

V foo (V x, int y)
{
  x <<= y;
  asm volatile (""::"x"(x):"xmm1", "xmm2", "xmm3",
     "xmm4", "xmm5", "xmm6", "xmm7", "xmm8", "xmm9",
     "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15");
  return x >> y;
}
Comment 16 Jakub Jelinek 2018-06-22 17:04:46 UTC
Created attachment 44312 [details]
gcc7-pr84786.patch

Patch I'm going to bootstrap/regtest on 7.x branch now.
On the trunk/in 8.x this got fixed with r253924 aka one of the PR82370 fixes, but that is something that IMHO shouldn't be backported.
Comment 17 Jakub Jelinek 2018-06-22 20:37:03 UTC
Author: jakub
Date: Fri Jun 22 20:36:31 2018
New Revision: 261919

URL: https://gcc.gnu.org/viewcvs?rev=261919&root=gcc&view=rev
Log:
	Backported from mainline
	2018-03-13  Jakub Jelinek  <jakub@redhat.com>

	PR target/84786
	* config/i386/sse.md (sse2_loadhpd): Use Yv constraint rather than v
	on the last operand.

	* gcc.target/i386/avx512f-pr84786-1.c: New test.
	* gcc.target/i386/avx512f-pr84786-2.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-2.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/sse.md
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 18 Jakub Jelinek 2018-06-25 12:49:43 UTC
Author: jakub
Date: Mon Jun 25 12:48:29 2018
New Revision: 262014

URL: https://gcc.gnu.org/viewcvs?rev=262014&root=gcc&view=rev
Log:
	PR target/84786
	* config/i386/sse.md (vshift_count): New mode attr.
	(<shift_insn><mode>3<mask_name>): Use <vshift_count>N instead of vN
	as last operand's constraint for VI2_AVX2_AVX512BW shifts.  Use YvN
	instead of vN as last operand's constraint for VI48_AVX2 shifts.

	* gcc.target/i386/avx512f-pr84786-3.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/sse.md
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 19 Jakub Jelinek 2018-06-25 12:57:35 UTC
Author: jakub
Date: Mon Jun 25 12:57:04 2018
New Revision: 262015

URL: https://gcc.gnu.org/viewcvs?rev=262015&root=gcc&view=rev
Log:
	PR target/84786
	* config/i386/sse.md (vshift_count): New mode attr.
	(<shift_insn><mode>3<mask_name>): Use <vshift_count>N instead of vN
	as last operand's constraint for VI2_AVX2_AVX512BW shifts.  Use YvN
	instead of vN as last operand's constraint for VI48_AVX2 shifts.

	* gcc.target/i386/avx512f-pr84786-3.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 20 Jakub Jelinek 2018-06-25 12:58:35 UTC
Author: jakub
Date: Mon Jun 25 12:58:03 2018
New Revision: 262016

URL: https://gcc.gnu.org/viewcvs?rev=262016&root=gcc&view=rev
Log:
2018-06-25  Jakub Jelinek  <jakub@redhat.com>

	PR target/84786
	* gcc.target/i386/avx512f-pr84786-3.c: New test.

Added:
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c
Modified:
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
Comment 21 Jakub Jelinek 2018-06-25 17:55:47 UTC
Author: jakub
Date: Mon Jun 25 17:55:15 2018
New Revision: 262103

URL: https://gcc.gnu.org/viewcvs?rev=262103&root=gcc&view=rev
Log:
	PR target/84786
	* config/i386/sse.md (vshift_count): New mode attr.
	(<shift_insn><mode>3<mask_name>): Use <vshift_count>N instead of vN
	as last operand's constraint for VI2_AVX2_AVX512BW shifts.  Use YvN
	instead of vN as last operand's constraint for VI48_AVX2 shifts.

	* gcc.target/i386/avx512f-pr84786-3.c: New test.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/avx512f-pr84786-3.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/i386/sse.md
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Comment 22 Jakub Jelinek 2018-06-25 18:20:52 UTC
Fixed for 6.5 and 7.4+ too.