Bug 87214 - [9 Regression] r263772 miscompiled 520.omnetpp_r in SPEC CPU 2017
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 9.0
Importance: P1 normal
Target Milestone: 9.0
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks: spec
Reported: 2018-09-04 12:03 UTC by Alexander Nesterovskiy
Modified: 2019-08-30 11:32 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-11-28 00:00:00


Attachments
optimized dump with -mprefer-vector-width=128 (3.38 KB, text/plain)
2019-01-24 14:10 UTC, Martin Liška
optimized dump with -mprefer-vector-width=256 (4.42 KB, text/plain)
2019-01-24 14:10 UTC, Martin Liška
vectorizer dump (19.63 KB, application/x-bzip)
2019-01-24 14:19 UTC, Martin Liška
Passing testcase (298 bytes, text/plain)
2019-01-24 17:25 UTC, rsandifo@gcc.gnu.org
gcc9-pr87214-wip.patch (812 bytes, patch)
2019-01-24 21:21 UTC, Jakub Jelinek

Description Alexander Nesterovskiy 2018-09-04 12:03:27 UTC
There are runfails for the following benchmarks since r263772:
SPEC2017 520/620: (Segmentation fault, minimal optset to reproduce: "-O3 -march=skylake-avx512 -flto")
SPEC2006 445: (SPEC miscompare, minimal optset to reproduce: "-O3 -march=skylake-avx512")

Running 520.omnetpp_r under GDB:
---
...
Program received signal SIGSEGV, Segmentation fault.
0x00000000004a611e in isName (s=<optimized out>, this=<optimized out>) at simulator/ccomponent.cc:143
143             if (paramv[i].isName(parname))
(gdb) backtrace
#0  0x00000000004a611e in isName (s=<optimized out>, this=<optimized out>) at simulator/ccomponent.cc:143
#1  cComponent::findPar (this=0x7ffff6633380, parname=0x7ffff6603548 "bs") at simulator/ccomponent.cc:143
#2  0x00000000004a87b3 in cComponent::par(char const*) () at simulator/ccomponent.cc:133
#3  0x00000000004b676d in cNEDNetworkBuilder::doParam(cComponent*, ParamElement*, bool) () at simulator/cnednetworkbuilder.cc:179
#4  0x00000000004b8610 in doParams (isSubcomponent=false, paramsNode=<optimized out>, component=0x7ffff6633380, this=0x7fffffffaaf0) at simulator/cnednetworkbuilder.cc:139
#5  cNEDNetworkBuilder::addParametersAndGatesTo(cComponent*, cNEDDeclaration*) () at simulator/cnednetworkbuilder.cc:105
#6  0x000000000048843b in addParametersAndGatesTo (module=0x7ffff6633380, this=<optimized out>) at <GCC_PATH>/include/c++/9.0.0/bits/stl_tree.h:211
#7  cModuleType::create(char const*, cModule*, int, int) () at simulator/ccomponenttype.cc:156
#8  0x000000000045916f in setupNetwork (network=<optimized out>, this=0x7ffff653bc40) at simulator/cnamedobject.h:117
#9  Cmdenv::run() () at simulator/cmdenv.cc:253
#10 0x00000000005186ec in EnvirBase::run(int, char**, cConfiguration*) () at simulator/envirbase.cc:230
#11 0x000000000043d60d in setupUserInterface(int, char**, cConfiguration*) [clone .constprop.112] () at simulator/startup.cc:234
#12 0x000000000042446a in main (argc=1, argv=0x7fffffffb1c8) at simulator/main.cc:39
---

403.gcc miscompares: 200.s, g23.s, scilab.s.
For example:
---
$ diff -u g23_ref.s g23.s | head -n 16
--- g23_ref.s
+++ g23.s
@@ -1746,19 +1746,19 @@
        testq   %rbx, %rbx
        jne     .L904
        movq    %r12, %rdx
-       xorl    %r8d, %r8d
+       xorl    %esi, %esi
        negq    %rdx
 .L905:
        addq    %rcx, %rdx
-       leaq    (%rax,%r8), %rax
+       leaq    (%rax,%rsi), %rax
        leaq    1(%rdx), %rcx
-       cmpq    %r8, %rax
+       cmpq    %rsi, %rax

---

Unfortunately I didn't manage to create a reproducer.
Comment 1 Jakub Jelinek 2018-11-27 15:13:35 UTC
Can you still reproduce this?  There have been several vectorizer fixes since then.
Comment 2 H.J. Lu 2018-11-28 13:45:25 UTC
I can't reproduce it with r266551.
Comment 3 H.J. Lu 2019-01-09 03:51:39 UTC
On an Intel machine with AVX512F, r263772 miscompiled 520.omnetpp_r in SPEC
CPU 2017 with

-DSPEC -DSPEC_CPU -DNDEBUG -Isimulator/platdep -Isimulator -Imodel -DWITH_NETBUILDER -DSPEC_AUTO_SUPPRESS_OPENMP  -fno-unsafe-math-optimizations -mfpmath=sse -g -march=native -Ofast -funroll-loops -flto         -DSPEC_LP64 

Program received signal SIGSEGV, Segmentation fault.
0x00000000004a8ddb in cObject::isName (s=<optimized out>, this=<optimized out>)
    at simulator/cobject.h:118
118	    bool isName(const char *s) const {return !opp_strcmp(getName(),s);}
(gdb) bt
#0  0x00000000004a8ddb in cObject::isName (s=<optimized out>, 
    this=<optimized out>) at simulator/cobject.h:118
#1  cComponent::findPar (this=0x699040, parname=0x669c58 "bs")
    at simulator/ccomponent.cc:143
#2  0x00000000004acdb4 in cComponent::par (this=0x699040, 
    parname=0x669c58 "bs") at simulator/ccomponent.cc:133
#3  0x00000000004be27c in cNEDNetworkBuilder::doParam (this=0x7fffffffd500, 
    component=0x699040, paramNode=0x669bd0, isSubcomponent=<optimized out>)
    at simulator/cnednetworkbuilder.cc:179
#4  0x00000000004c0020 in cNEDNetworkBuilder::doParams (isSubcomponent=false, 
    paramsNode=<optimized out>, component=0x699040, this=0x7fffffffd500)
    at simulator/cnednetworkbuilder.cc:139
#5  cNEDNetworkBuilder::addParametersAndGatesTo (this=0x7fffffffd500, 
    component=0x699040, decl=0x695e60) at simulator/cnednetworkbuilder.cc:105
#6  0x000000000048a1bd in cDynamicModuleType::addParametersAndGatesTo (
    module=0x699040, this=<optimized out>)
    at /export/ssd/git/gcc-test-spec/usr/include/c++/9.0.0/bits/stl_tree.h:211
#7  cModuleType::create (this=<optimized out>, modname=<optimized out>, 
    parentmod=<optimized out>, vectorsize=<optimized out>, 
    index=<optimized out>) at simulator/ccomponenttype.cc:156
#8  0x00000000004643aa in cModuleType::create (parentmod=0x0, 
    modname=<optimized out>, this=<optimized out>)
    at simulator/ccomponenttype.cc:106
#9  cSimulation::setupNetwork (network=<optimized out>, this=<optimized out>)
    at simulator/csimulation.cc:369
#10 Cmdenv::run (this=0x624d80) at simulator/cmdenv.cc:253
#11 0x000000000051673c in EnvirBase::run (this=0x624d80, argc=<optimized out>, 
    argv=<optimized out>, configobject=0x61a640) at simulator/envirbase.cc:230
#12 0x00000000004421b2 in setupUserInterface(int, char**, cConfiguration*) [clone .constprop.0] (argc=argc@entry=5, argv=argv@entry=0x7fffffffdc18, cfg=0x0)
    at simulator/startup.cc:234
#13 0x000000000042f2fd in main (argc=5, argv=0x7fffffffdc18)
    at simulator/main.cc:39
(gdb) 
...
  0x00000000004a8db5 <+325>:	lea    0x1(%r15),%rbx
   0x00000000004a8db9 <+329>:	mov    0x58(%rbp),%r8
   0x00000000004a8dbd <+333>:	lea    (%rbx,%rbx,2),%rdi
   0x00000000004a8dc1 <+337>:	lea    (%r8,%rdi,8),%rdi
   0x00000000004a8dc5 <+341>:	mov    (%rdi),%r9
   0x00000000004a8dc8 <+344>:	mov    %ebx,%r12d
   0x00000000004a8dcb <+347>:	mov    0x30(%r9),%rax
   0x00000000004a8dcf <+351>:	cmp    $0x4a8080,%rax
   0x00000000004a8dd5 <+357>:	je     0x4a8d20 <cComponent::findPar(char const*) const+176>
=> 0x00000000004a8ddb <+363>:	callq  *%rax
   0x00000000004a8ddd <+365>:	mov    %rax,%rdi
   0x00000000004a8de0 <+368>:	test   %rax,%rax
   0x00000000004a8de3 <+371>:	jne    0x4a8d47 <cComponent::findPar(char const*) const+215>
   0x00000000004a8de9 <+377>:	cmpb   $0x0,0x0(%r13)
   0x00000000004a8dee <+382>:	jne    0x4a8d57 <cComponent::findPar(char const*) const+231>
   0x00000000004a8df4 <+388>:	add    $0x8,%rsp
   0x00000000004a8df8 <+392>:	pop    %rbx
   0x00000000004a8df9 <+393>:	pop    %rbp
   0x00000000004a8dfa <+394>:	mov    %r12d,%eax
   0x00000000004a8dfd <+397>:	pop    %r12
   0x00000000004a8dff <+399>:	pop    %r13
   0x00000000004a8e01 <+401>:	pop    %r14
   0x00000000004a8e03 <+403>:	pop    %r15
   0x00000000004a8e05 <+405>:	retq   
   0x00000000004a8e06 <+406>:	nopw   %cs:0x0(%rax,%rax,1)
   0x00000000004a8e10 <+416>:	callq  *%rax
   0x00000000004a8e12 <+418>:	mov    %rax,%rdi
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) p/x $rax
$1 = 0x5c8d480000009b85
(gdb) p/x *(long *) $rax
Cannot access memory at address 0x5c8d480000009b85
(gdb) 

This address looks odd.
Comment 4 rsandifo@gcc.gnu.org 2019-01-09 09:22:01 UTC
Mine then.
Comment 5 H.J. Lu 2019-01-23 13:34:25 UTC
Adding -fno-strict-aliasing fixes the issue.
Comment 6 Martin Liška 2019-01-24 10:42:27 UTC
Well, omnetpp_r has no known portability issues:
https://www.spec.org/cpu2017/Docs/benchmarks/520.omnetpp_r.html

So I would like to know what violates the aliasing rules. Let me debug that.
Comment 7 rguenther@suse.de 2019-01-24 10:51:40 UTC
On Thu, 24 Jan 2019, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214
> 
> Martin Liška <marxin at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|RESOLVED                    |ASSIGNED
>          Resolution|INVALID                     |---
>            Assignee|rsandifo at gcc dot gnu.org        |marxin at gcc dot gnu.org
> 
> --- Comment #6 from Martin Liška <marxin at gcc dot gnu.org> ---
> Well, omnetpp_r has no known portability issues:
> https://www.spec.org/cpu2017/Docs/benchmarks/520.omnetpp_r.html
> 
> So that I would like to know what violates the aliasing. Let me debug that..

A lot of benchmarks end up using spec_qsort...
Comment 8 Martin Liška 2019-01-24 10:54:45 UTC
(In reply to rguenther@suse.de from comment #7)
> A lot of benchmarks end up using spec_qsort...

Ah, yes, I overlooked that as the file has a different suffix:
./benchspec/CPU/520.omnetpp_r/src/simulator/spec_qsort.cc

So let me test it with a fixed qsort function.
Comment 9 Martin Liška 2019-01-24 11:25:51 UTC
I guess it's not related to qsort (the file looks different and fine):

#include <cstdlib>
#include "spec_qsort.h"

static void spec_swap(void *x, void *y, size_t l) {
   /* Swap elements of an array byte by byte.  Note that a version specialized to
      operate on a specific data type (e.g. int) would be faster. */
   char *a = (char *)x, *b = (char *)y, c;
   while(l--) {
      c = *a;
      *a++ = *b;
      *b++ = c;
   }
}


static void spec_sort(char *array, size_t size, int begin, int end, int (*cmp)(const void*,const void*)) {
   /* Generic qsort algorithm */
   if (end > begin) {
      void *pivot = array + begin;
      int l = begin + size;
      int r = end;
      while(l < r) {
         if (cmp(array+l,pivot) <= 0) {
            l += size;
         } else {
            r -= size;
            spec_swap(array+l, array+r, size); 
         }
      }
      l -= size;
      spec_swap(array+begin, array+l, size);
      spec_sort(array, size, begin, l, cmp);
      spec_sort(array, size, r, end, cmp);
   }
}


void spec_qsort(void *array, size_t nitems, size_t size, int (*cmp)(const void*,const void*)) {
   spec_sort((char *)array, size, 0, (nitems-1)*size, cmp);
}
Comment 10 Martin Liška 2019-01-24 11:27:20 UTC
Only the following 2 LTO object files trigger the segfault:
simulator/cpar.o and simulator/ccomponent.o (rest are -fno-lto object files).
Comment 11 Martin Liška 2019-01-24 14:10:34 UTC
Created attachment 45520 [details]
optimized dump with -mprefer-vector-width=128
Comment 12 Martin Liška 2019-01-24 14:10:59 UTC
Created attachment 45521 [details]
optimized dump with -mprefer-vector-width=256
Comment 13 Martin Liška 2019-01-24 14:13:42 UTC
The 2 problematic functions look like:

void cComponent::reallocParamv(int size)
{
    ((void)0);
    if (size!=(short)size)
        throw cRuntimeError(this, "reallocParamv(%d): at most %d parameters allowed", size, 0x7fff);
    cPar *newparamv = new cPar[size];
__builtin_printf ("realloc called with new size: paramvsize: %d\n", numparams);
    for (int i=0; i<numparams; i++)
        __builtin_printf ("%d:%s\n", i,paramv[i].getName());
__builtin_printf ("\n");

    for (int i=0; i<numparams; i++)
        paramv[i].moveto(newparamv[i]);

    for (int i=0; i<numparams; i++)
        __builtin_printf ("%d:%s\n", i,newparamv[i].getName());
__builtin_printf ("realloc done\n");
    delete [] paramv;
    paramv = newparamv;
    paramvsize = (short)size;
}

void cComponent::addPar(cParImpl *value)
{
__builtin_printf ("addPar: paramvsize: %d, name: %s\n", paramvsize, value->getName());
    if (parametersFinalized())
        throw cRuntimeError(this, "cannot add parameters at runtime");
    if (findPar(value->getName())>=0)
        throw cRuntimeError(this, "cannot add parameter `%s': already exists", value->getName());
    if (numparams==paramvsize)
        reallocParamv(paramvsize+1);
    paramv[numparams++].init(this, value);
}

where the vectorized version prints:

Preparing for running configuration General, run #0...
Scenario: $repetition=0
Assigned runID=speccpu-runid
Setting up network `largeNet'...
addPar: paramvsize: 0, name: n
findPar: n
realloc called with new size: paramvsize: 0

realloc done
findPar: n
addPar: paramvsize: 1, name: bbs
findPar: bbs
realloc called with new size: paramvsize: 1
0:n

0:n
realloc done
findPar: bbs
addPar: paramvsize: 2, name: bbm
findPar: bbm
realloc called with new size: paramvsize: 2
0:n
1:bbs

0:n
1:bbs
realloc done
findPar: bbm
addPar: paramvsize: 3, name: bbl
findPar: bbl
realloc called with new size: paramvsize: 3
0:n
1:bbs
2:bbm

0:n
1:bbs
2:bbm
realloc done
findPar: bbl
addPar: paramvsize: 4, name: as
findPar: as
realloc called with new size: paramvsize: 4
0:n
1:bbs
2:bbm
3:bbl

0:n
1:bbs
2:bbm
3:bbl
realloc done
findPar: as
addPar: paramvsize: 5, name: am
findPar: am
realloc called with new size: paramvsize: 5
0:n
1:bbs
2:bbm
3:bbl
4:as

0:n
1:bbs
2:bbm
3:bbl
4:as
realloc done
findPar: am
addPar: paramvsize: 6, name: al
findPar: al
realloc called with new size: paramvsize: 6
0:n
1:bbs
2:bbm
3:bbl
4:as
5:am

0:n
1:bbs
2:bbm
3:largeNet
4:as
5:am
realloc done
findPar: al
addPar: paramvsize: 7, name: bs
findPar: bs
realloc called with new size: paramvsize: 7
0:n
1:bbs
2:bbm
3:largeNet
4:as
5:am
6:al

0:n
1:bbs
2:bbm
Segmentation fault (core dumped)


As seen, the moveto loop goes wrong in the reallocation that prints 'paramvsize: 6' (6 old elements): element #3 should be 'bbl' after copying, but is 'largeNet'. We then hit a segfault because of it.
Comment 14 Martin Liška 2019-01-24 14:14:36 UTC
and moveto does:

void cPar::moveto(cPar& other)
{
    other.ownercomponent = ownercomponent;
    other.p = p;
    p = __null;  // NULL as expanded by the preprocessor
}
Comment 15 Martin Liška 2019-01-24 14:19:47 UTC
Created attachment 45522 [details]
vectorizer dump
Comment 16 rsandifo@gcc.gnu.org 2019-01-24 17:25:16 UTC
Created attachment 45526 [details]
Passing testcase

I'm still not sure where the problem is coming in.  The loop in the vector dump looks functionally correct now that I've had a chance to look at it more (contrary to my initial comment on IRC).  It seems to be equivalent to the attached, which passed on an AVX2 box I found I had access to.
Comment 17 Martin Liška 2019-01-24 19:00:44 UTC
(In reply to rsandifo@gcc.gnu.org from comment #16)
> Created attachment 45526 [details]
> Passing testcase
> 
> I'm still not sure where the problem is coming in.  The loop in the vector
> dump looks functionally correct now that I've had a chance to look at it more
> (contrary to my initial comment on IRC).  It seems to be equivalent to the
> attached, which passed on an AVX2 box I found I had access to.

But it fails on a skylake-avx512 machine. Minimal test-case that fails:

$ cat avx.c
struct s { unsigned long a, b, c; };

void __attribute__ ((noipa))
f (struct s *restrict s1, struct s *restrict s2, int n)
{
  for (int i = 0; i < n; ++i)
    {
      s1[i].b = s2[i].b;
      s1[i].c = s2[i].c;
      s2[i].c = 0;
    }
}

#define N 6

int
main (void)
{
  struct s s1[N], s2[N];
  for (unsigned int j = 0; j < 6; ++j)
  {
	  s2[j].a = j * 5;
	  s2[j].b = j * 5 + 2;
	  s2[j].c = j * 5 + 4;
  }
  f (s1, s2, 6);
  for (unsigned int j = 0; j < 6; ++j)
	  if (s1[j].b != j * 5 + 2)
	  {
		  __builtin_printf ("wrong at: %u: is %lu, should be %u\n", j, s1[j].b, j * 5 + 2);
		  __builtin_abort ();
	  }

__builtin_printf ("OK\n");
  return 0;
}

$ gcc -march=skylake-avx512 avx.c -g && ./a.out  && gcc -march=skylake-avx512 avx.c -g -O3 && ./a.out 
OK
wrong at: 3: is 15, should be 17
Aborted (core dumped)
Comment 18 Martin Liška 2019-01-24 19:08:20 UTC
One can reproduce that with Intel SDE simulator:
https://software.intel.com/protected-download/267266/144917

$ ./sde-external-8.16.0-2018-01-30-lin/sde -skx -- /tmp/a.out
wrong at: 3: is 15, should be 17
Aborted (core dumped)
Comment 19 rsandifo@gcc.gnu.org 2019-01-24 20:13:05 UTC
OK.  The .optimized dumps seem to be the same for both -mavx2 and -march=skylake-avx512.  Things only diverge during expand.

It looks like it might be a bug in:

(define_insn "<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>"
  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
        (vec_select:VI8F_256
          (vec_concat:<ssedoublemode>
            (match_operand:VI8F_256 1 "register_operand" "v")
            (match_operand:VI8F_256 2 "nonimmediate_operand" "vm"))
          (parallel [(match_operand 3  "const_0_to_3_operand")
                     (match_operand 4  "const_0_to_3_operand")
                     (match_operand 5  "const_4_to_7_operand")
                     (match_operand 6  "const_4_to_7_operand")])))]
  "TARGET_AVX512VL
   && (INTVAL (operands[3]) == (INTVAL (operands[4]) - 1)
       && INTVAL (operands[5]) == (INTVAL (operands[6]) - 1))"
{
  int mask;
  mask = INTVAL (operands[3]) / 2;
  mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
  operands[3] = GEN_INT (mask);
  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0<mask_operand7>|%0<mask_operand7>, %1, %2, %3}";
}
  [(set_attr "type" "sselog")
   (set_attr "length_immediate" "1")
   (set_attr "prefix" "evex")
   (set_attr "mode" "XI")])

which AFAICT requires, without checking, that operands 3 and 5 are even (0 or 2 and 4 or 6 respectively).  In this case we're using it to match:

(insn 40 39 41 6 (set (reg:V4DI 101 [ vect__5.17 ])
        (vec_select:V4DI (vec_concat:V8DI (reg:V4DI 98 [ vect__5.14 ])
                (reg:V4DI 140 [ vect__5.15 ]))
            (parallel [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 5 [0x5])
                    (const_int 6 [0x6])
                ]))) "/tmp/foo.c":8:22 4069 {*avx512dq_shuf_i64x2_1}
     (nil))

and treats the permute mask as {2, 3, 4, 5} instead.
Comment 20 rsandifo@gcc.gnu.org 2019-01-24 20:14:52 UTC
Not really best placed to fix or test this.
Comment 21 Jakub Jelinek 2019-01-24 20:28:50 UTC
I'll handle this.
Comment 22 Jakub Jelinek 2019-01-24 20:45:54 UTC
Even more reduced testcase:
typedef long long int V __attribute__((vector_size (4 * sizeof (long long int))));

__attribute__((noipa))
void foo (V *p)
{
  p[0] = __builtin_shuffle (p[1], p[2], (V) { 2, 3, 5, 6 });
}

int
main ()
{
  V a[3] = { { 0, 0, 0, 0 }, { 10, 11, 12, 13 }, { 14, 15, 16, 17 } };
  foo (a);
  if (a[0][0] != 12 || a[0][1] != 13 || a[0][2] != 15 || a[0][3] != 16)
    __builtin_abort ();
  return 0;
}

Works with -O2 -mavx2, aborts with -O2 -mavx512vl.
Comment 23 Jakub Jelinek 2019-01-24 21:21:56 UTC
Created attachment 45528 [details]
gcc9-pr87214-wip.patch

Untested fix.  Still need to cover all the changes with testcases.
Comment 24 Jakub Jelinek 2019-01-27 11:57:16 UTC
Author: jakub
Date: Sun Jan 27 11:56:44 2019
New Revision: 268310

URL: https://gcc.gnu.org/viewcvs?rev=268310&root=gcc&view=rev
Log:
	PR target/87214
	* config/i386/sse.md
	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>,
	avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ensure the
	first constants in pairs are multiples of 2.  Formatting fixes.
	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>,
	avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ensure the
	first constants in each quadruple are multiples of 4.  Formatting fixes.

	* gcc.target/i386/avx512vl-pr87214-1.c: New test.
	* gcc.target/i386/avx512vl-pr87214-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr87214-1.c
    trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr87214-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 25 Jakub Jelinek 2019-01-27 11:59:10 UTC
Fixed.  Will backport to release branches eventually though, as it is latent there.
Comment 26 Jakub Jelinek 2019-02-07 14:43:26 UTC
Author: jakub
Date: Thu Feb  7 14:42:54 2019
New Revision: 268633

URL: https://gcc.gnu.org/viewcvs?rev=268633&root=gcc&view=rev
Log:
	Backported from mainline
	2019-01-27  Jakub Jelinek  <jakub@redhat.com>

	PR target/87214
	* config/i386/sse.md
	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>,
	avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ensure the
	first constants in pairs are multiples of 2.  Formatting fixes.
	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>,
	avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ensure the
	first constants in each quadruple are multiples of 4.  Formatting fixes.

	* gcc.target/i386/avx512vl-pr87214-1.c: New test.
	* gcc.target/i386/avx512vl-pr87214-2.c: New test.

Added:
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/avx512vl-pr87214-1.c
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/avx512vl-pr87214-2.c
Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/i386/sse.md
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
Comment 27 Jakub Jelinek 2019-08-30 11:32:47 UTC
Author: jakub
Date: Fri Aug 30 11:32:15 2019
New Revision: 275092

URL: https://gcc.gnu.org/viewcvs?rev=275092&root=gcc&view=rev
Log:
	Backported from mainline
	2019-01-27  Jakub Jelinek  <jakub@redhat.com>

	PR target/87214
	* config/i386/sse.md
	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>,
	avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ensure the
	first constants in pairs are multiples of 2.  Formatting fixes.
	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>,
	avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ensure the
	first constants in each quadruple are multiples of 4.  Formatting fixes.

	* gcc.target/i386/avx512vl-pr87214-1.c: New test.
	* gcc.target/i386/avx512vl-pr87214-2.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512vl-pr87214-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx512vl-pr87214-2.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/sse.md
    branches/gcc-7-branch/gcc/testsuite/ChangeLog