Bug 83252 - [8 Regression] Wrong code with "-march=skylake-avx512 -O3"
Summary: [8 Regression] Wrong code with "-march=skylake-avx512 -O3"
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: 8.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2017-12-02 00:27 UTC by Dmitry Babokin
Modified: 2021-11-01 23:07 UTC (History)
3 users (show)

See Also:
Host:
Target: x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-12-05 00:00:00


Attachments
reproducer (1.24 KB, text/x-csrc)
2017-12-02 00:27 UTC, Dmitry Babokin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Babokin 2017-12-02 00:27:17 UTC
Created attachment 42775 [details]
reproducer

gcc trunk rev255248, x86_64.

Attached test case produces wrong code:
> g++ -O0 f.cpp -o out
> g++ -march=skylake-avx512 -O3 f.cpp -o out3
> ./out
88480289
> ./out3
88480288

I was not able to reduce it further. At this point the test case doesn't contain undefined behavior (at least address and undefined behavior sanitizers think so). Original test case didn't have undefined behavior guaranteed by the way it was generated, but while reduction it could be possibly introduced, but to my judgement it's a valid test case without UB.

Note that to reproduce it's essential to compile with "-march=skylake-avx512 -O3".

Sadly, gcc doesn't provide any bug pointing functionality, which would help understanding the root cause of the bug without digging into compiler internals. icc and clang have triaging flags (see https://llvm.org/docs/OptBisect.html for clang implementation of this mechanism). It would be very helpful to have similar functionality in gcc. If I'm missing something, I would appreciate pointers to documents describing how to triage this kind of bugs in gcc.
Comment 1 Jakub Jelinek 2017-12-02 11:19:24 UTC
Option bisection is rarely useful for GCC, debugging issues with -Ox -fthat -fno-this -fwhatever is usually not beneficial over just -Ox or whatever minimal options you need it.  Bisecting to what GCC version introduced the bug or changed behavior is usually much better.  Unfortunately my sde version doesn't like this:
SDE ERROR: Cannot execute XGETBV with ECX != 0.

 at (no-file):65 Function (no-func)
and I only have Haswell-E hw, so can't easily bisect this myself (at least not until I upgrade sde).
Comment 2 Jakub Jelinek 2017-12-02 11:21:01 UTC
BTW, better avoid any headers if possible, so if:
long long int m33 = 8707493562598231894LL;
long long int m46 = 2720142332956971588LL;
long long int m30 = 5337614525613944604LL;
unsigned int m21 = 1092738485U;

long long int a43 = -2228108721620697360LL;
unsigned int a16 = 3060393125U;
long long int a103 = -5267148545474741934LL;
long long int a23 = 628644587444781171LL;
long long int a120 = -1929881923114969467LL;
unsigned int a31 = 342358347U;
unsigned int a50 = 4141428744U;
unsigned int a92 = 3147872734U;
long long int a20 = 2249711228974996732LL;
unsigned int a112 = 4012608111U;
unsigned int a113 = 664122423U;
unsigned int a55 = 795984700U;

unsigned int v38 = 751359462U;
unsigned int *p8 = &(v38);

unsigned int v64 = 274677517U;
unsigned long long int v36 = 14738459288714673932ULL;
unsigned int *p9 = &(v64);
unsigned int *p10 = &(a113);
unsigned long long int * p6 = &(v36);

long long int tt;
unsigned int yy;
long long int *p11 = &tt;
unsigned int *p12 = &yy;

unsigned long long int v12 = 5759377091529791657ULL;
unsigned long long int v146 = 15085582420970487994ULL;
unsigned long long int v176 = 13537462614340337437ULL;
unsigned int v114 = 3159284560U;
long long int v84 = -5592336281551563373LL;
unsigned int v44 = 916868838U;

unsigned long long int tf_0_var_108 = 122846687590239390ULL;
unsigned long long int tf_0_var_118 = 15084731736992858763ULL;
unsigned int tf_0_var_546 = 1383085329U;
long long int tf_0_var_142 = 676559977929482050LL;
unsigned int tf_0_var_614 = 1321771489U;
unsigned int tf_0_var_116 = 2438389883U;
long long int tf_0_var_682 = 2997174617692616057LL;
unsigned long long int tf_0_var_728 = 4431338120255382076ULL;
unsigned long long int tf_0_var_120 = 11272010769831539270ULL;
unsigned int tf_0_var_954 = 2906827848U;

void foo() {
  if (-2783342978U * int(a43) || v64)
    if (p9)
      if (m33)
        if (v36)
          if (v12 & ~-(8 ? -2783342978U * int(a43) : 0)) {
            m46 = a16 < a103;
            tf_0_var_118 = a23 >> *p8 - 751359400;
            *p10 = v146;
            long a =
                (a23 >> *p8 - 751359400 >>
                 ~-(8 ? -2783342978U * int(a43) : 0) - 88480234) -
                        (808 ? 8 ? -2783342978U * int(a43) : 0 : 0)
                    ? v176
                    : ~-(8 ? -2783342978U * int(a43) : 0) -
                          88480234;
            tf_0_var_108 = a;
            if (~0 % *p6 % 5)
              tf_0_var_546 =
                  -3 * ((8 ? a43 : 0) - 4 ?: 407228174574);
            if (v114 < (0 || ~0)) {
              long long *b = &tf_0_var_142;
              p10 = 0;
              int c(*p9);
              *p11 = 0;
              tf_0_var_614 =
                  ~(808 ? -(8 ? unsigned(-2783342978U * a43) : 0)
                        : 0);
              a120 = *b & m30;
              tf_0_var_116 = c;
            } else {
              m30 = 0;
              int d, e(!0 % (a31 % *p6));
              a50 =
                  ((a92 || !m21) &&
                   a20) -
                  -(8 ? -2783342978U * int(a43) : d);
              tf_0_var_682 = a23 >> *p8 - 751359400;
              tf_0_var_142 = v12;
              long f(a23 >> *p8 - 751359400 >>
                     ~-(8 ? -2783342978U * int(a43) : 0) -
                         88480234);
              tf_0_var_728 = v36;
              a112 = a103 * f * e * v84;
            }
            if (8ULL *
                -(808 ? -(8 ? -2783342978U * int(a43) : 0) : 0))
              ;
            else {
              *p10 = 0;
              int g(3 & v44);
              tf_0_var_120 = unsigned(~a23 + 9223372036854775807 >>
                                      (8 ? a43 : 0));
              *p12 = g + tf_0_var_142;
              a113 =
                  m30 ||
                  ~0 + 9223372036854775807 >>
                      ~-(8 ? -2783342978U * int(a43) : 0);
              tf_0_var_954 = a23 >> (8 ? 8 * a43 : 0);
              a55 = *p8 ? -2783342978U * a43 : 0;
            }
          }
}

int main() {
  foo();
  __builtin_printf ("%d\n", a50);
  return 0;
}

does reproduce it too, better test that.
Comment 3 Jakub Jelinek 2017-12-03 11:18:26 UTC
With newer SDE bisected fix to r255258.
I'll commit the testcase and mark as fixed.
Comment 4 Jakub Jelinek 2017-12-03 11:48:42 UTC
Started r249450, was likely latent before that.
Slightly adjusted testcase (needs C++ though, for some reason with C it doesn't FAIL).  The difference between r255257 and r255258 is:
--- pr83252.s.r255257	2017-12-03 06:47:40.000000000 -0500
+++ pr83252.s.r255258	2017-12-03 06:47:52.000000000 -0500
@@ -114,7 +114,8 @@ _Z3foov:
 	testl	%ecx, %ecx
 	jne	.L12
 .L11:
-	imull	$1511624318, %esi, %eax
+	movl	-28(%rsp), %eax
+	cmpq	$1, t(%rip)
 	sbbl	$-1, %eax
 .L12:
 	movl	-8(%rsp), %edx

// PR target/83252
// { dg-do run { target lp64 } }
// { dg-options "-O3" }
// { dg-additional-options "-mbmi2 -mtune=intel" { target bmi2 } }

long long int h = 8707493562598231894LL;
long long int i = 2720142332956971588LL;
long long int j = 5337614525613944604LL;
unsigned int k = 1092738485U;
long long int l = -2228108721620697360LL;
unsigned int m = 3060393125U;
long long int n = -5267148545474741934LL;
long long int o = 628644587444781171LL;
long long int p = -1929881923114969467LL;
unsigned int q = 342358347U;
unsigned int r = 4141428744U;
unsigned int s = 3147872734U;
long long int t = 2249711228974996732LL;
unsigned int u = 4012608111U;
unsigned int v = 664122423U;
unsigned int w = 795984700U;
unsigned int x = 751359462U;
unsigned int *y = &x;
unsigned int z = 274677517U;
unsigned long long int z1 = 14738459288714673932ULL;
unsigned int *z2 = &z;
unsigned int *z3 = &v;
unsigned long long int *z4 = &z1;
long long int z5;
unsigned int z6;
long long int *z7 = &z5;
unsigned int *z8 = &z6;
unsigned long long int z9 = 5759377091529791657ULL;
unsigned long long int x7 = 15085582420970487994ULL;
unsigned long long int y2 = 13537462614340337437ULL;
unsigned int y3 = 3159284560U;
long long int y4 = -5592336281551563373LL;
unsigned int y5 = 916868838U;
unsigned long long int y6 = 122846687590239390ULL;
unsigned long long int y7 = 15084731736992858763ULL;
unsigned int y8 = 1383085329U;
long long int y9 = 676559977929482050LL;
unsigned int x1 = 1321771489U;
unsigned int x2 = 2438389883U;
long long int x3 = 2997174617692616057LL;
unsigned long long int x4 = 4431338120255382076ULL;
unsigned long long int x5 = 11272010769831539270ULL;
unsigned int x6 = 2906827848U;

void
foo (void)
{
  if ((-2783342978U * (int) l || z) && z2 && h && z1 && (z9 & ~-(8 ? -2783342978U * (int) l : 0)))
    {
      i = m < n;
      y7 = o >> *y - 751359400;
      *z3 = x7;
      long a = (o >> *y - 751359400 >> ~-(8 ? -2783342978U * (int) l : 0) - 88480234)
	       - (808 ? 8 ? -2783342978U * (int) l : 0 : 0) ? y2 : ~-(8 ? -2783342978U * (int) l : 0) - 88480234;
      y6 = a;
      if (~0 % *z4 % 5)
	y8 = -3 * ((8 ? l : 0) - 4 ? : 407228174574);
      if (y3 < (0 || ~0))
	{
	  long long *b = &y9;
	  z3 = 0;
	  int c = *z2;
	  *z7 = 0;
	  x1 = ~(808 ? -(8 ? (unsigned) (-2783342978U * l) : 0) : 0);
	  p = *b & j;
	  x2 = c;
	}
      else
	{
	  j = 0;
	  int d, e = !0 % (q % *z4);
	  r = ((s || !k) && t) - -(8 ? -2783342978U * (int) l : d);
	  x3 = o >> *y - 751359400;
	  y9 = z9;
	  long f = o >> *y - 751359400 >> ~-(8 ? -2783342978U * (int) l : 0) - 88480234;
	  x4 = z1;
	  u = n * f * e * y4;
	}
      if (8ULL * -(808 ? -(8 ? -2783342978U * (int) l : 0) : 0))
	;
      else
	{
	  *z3 = 0;
	  int g = 3 & y5;
	  x5 = (unsigned) (~o + 9223372036854775807 >> (8 ? l : 0));
	  *z8 = g + y9;
	  v = j || ~0 + 9223372036854775807 >> ~-(8 ? -2783342978U * (int) l : 0);
	  x6 = o >> (8 ? 8 * l : 0);
	  w = *y ? -2783342978U * l : 0;
	}
    }
}

int
main ()
{
  foo ();
  if (r != 88480289)
    __builtin_abort ();
  return 0;
}
Comment 5 Dmitry Babokin 2017-12-03 23:10:20 UTC
The original test case is also fixed. Thanks for investigation.
Comment 6 Dmitry Babokin 2017-12-03 23:26:24 UTC
(In reply to Jakub Jelinek from comment #1)
> Option bisection is rarely useful for GCC, debugging issues with -Ox -fthat
> -fno-this -fwhatever is usually not beneficial over just -Ox or whatever
> minimal options you need it.  Bisecting to what GCC version introduced the
> bug or changed behavior is usually much better.

Bisecting optimizations is orthogonal to bisecting of revision the bug started with. In my experience it's extremely useful (with icc and clang) for understanding the root cause of the bug (just compare two assemblers or IRs with and without the guilty optimization). It's also very useful for distinguishing bugs in automatic way - in case of multiple failing test cases I can sort them in different buckets corresponding to guilty optmizations. This analysis is not 100% accurate, but is very useful. And the third useful feature of such bisection - finding an optimization for initial bug assignment.

I'm not insisting on such functionality, as my bugs were always analysed and fixed in timely manner (unlike with other compilers). But I still think gcc would benefit from such mechanism a lot.
Comment 7 Andrew Pinski 2017-12-04 07:01:41 UTC
(In reply to Dmitry Babokin from comment #6)
> I'm not insisting on such functionality, as my bugs were always analysed and
> fixed in timely manner (unlike with other compilers). But I still think gcc
> would benefit from such mechanism a lot.

The problem with GCC (and sometimes other compilers) is latent bugs in passes after where the change happened.  A lot of the time for GCC, this has been true.  Especially when we are talking about GCC's RTL passes; some of which date from the mid to late 1990s (e.g. combine).  And even some of the RTL passes don't have an option flag to enable/disable (e.g. combine).

Also there are other things like disabling the late VRP or another pass which uses the VRP data that was not updated which causes the bug to show up.

Just a few examples of why this functionality is not very useful in GCC and also it gets abused by some users of it to workaround a bug in one specific version of GCC rather than reporting the bug :).
Comment 8 Richard Biener 2017-12-04 09:28:40 UTC
(In reply to Andrew Pinski from comment #7)
> (In reply to Dmitry Babokin from comment #6)
> > I'm not insisting on such functionality, as my bugs were always analysed and
> > fixed in timely manner (unlike with other compilers). But I still think gcc
> > would benefit from such mechanism a lot.
> 
> The problem with GCC (and sometimes other compilers) is latent bugs in
> passes after where the change happened.  A lot of the time for GCC, this has
> been true.  Especially when we are talking about GCC's RTL passes; some of
> which date from the mid to late 1990s (e.g. combine).  And even some of the
> RTL passes don't have an option flag to enable/disable (e.g. combine).
> 
> Also there are other things like disabling the late VRP or another pass
> which uses the VRP data that was not updated which causes the bug to show up.
> 
> Just a few examples of why this functionality is not very useful in GCC and
> also it gets abused by some users of it to workaround a bug in one specific
> version of GCC rather than reporting the bug :).

I suppose one could try scripting something with -fdisable-{tree,rtl}-$dump
and seeding the list of passes to enable/disable with -fdump-{tree,rtl}-all.

Of course some -fdisable-* are "invalid" and will cause "interesting" downstream
effects...

Sometimes I do this manually for cases where it isn't obvious who's doing sth
wrong...
Comment 9 Dmitry Babokin 2017-12-04 18:25:41 UTC
(In reply to Richard Biener from comment #8)
> I suppose one could try scripting something with -fdisable-{tree,rtl}-$dump
> and seeding the list of passes to enable/disable with -fdump-{tree,rtl}-all.
> 
> Of course some -fdisable-* are "invalid" and will cause "interesting"
> downstream
> effects...
> 
> Sometimes I do this manually for cases where it isn't obvious who's doing sth
> wrong...

The main downside of this approach is that it requires understanding of GCC compiler internals. And GCC bug tracker doesn't even have generic "new bugs" component to assign, which means people need to have secret knowledge about compiler internals. Would be good to address this problem in the long run and make bug filing process more friendly for outsiders.

Don't take me wrong, my bugs are almost always are addressed blazingly fast and I'm super happy about it. But I still think there are things which can make life for bug filing people easier.
Comment 10 Andrew Pinski 2017-12-04 19:19:20 UTC
(In reply to Dmitry Babokin from comment #9)
> And GCC bug tracker doesn't even have generic "new bugs"
> component to assign, which means people need to have secret knowledge about
> compiler internals

We can add this if needed.  I think regression could be made make generic and add a generic new bug component.  We do have some very active people reading bug reports that can re-categorize them (not to sound special but I am one of them; note sometimes I get it wrong too).
Comment 11 Dmitry Babokin 2017-12-04 19:31:59 UTC
(In reply to Andrew Pinski from comment #10)

> We can add this if needed.  I think regression could be made make generic
> and add a generic new bug component.  We do have some very active people
> reading bug reports that can re-categorize them (not to sound special but I
> am one of them; note sometimes I get it wrong too).

Yes, in practice bugs are usually dispatched to the correct owner quite fast (thanks for that, it really helps). So "new" category would not change much in this sense, but it would definitely make the process more friendly and less confusing for outsiders.
Comment 12 Jakub Jelinek 2017-12-05 12:51:26 UTC
This broke again with r255377.
Testcase in patch form at https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00133.html
Comment 13 Vladimir Makarov 2017-12-06 19:15:51 UTC
(In reply to Jakub Jelinek from comment #12)
> This broke again with r255377.
> Testcase in patch form at
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00133.html

I've started to work on it.  In any case the patch will need a lot of testing.  I hope to have a fix on this week or at the start of next week.
Comment 14 Vladimir Makarov 2017-12-07 17:51:26 UTC
Author: vmakarov
Date: Thu Dec  7 17:50:54 2017
New Revision: 255471

URL: https://gcc.gnu.org/viewcvs?rev=255471&root=gcc&view=rev
Log:
2017-12-07  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/83252
	PR rtl-optimization/80818
	* lra.c (add_regs_to_insn_regno_info): Make a hard reg in CLOBBER
	always early clobbered.
	* lra-lives.c (process_bb_lives): Check input hard regs for early
	clobbered non-operand hard reg.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-lives.c
    trunk/gcc/lra.c
Comment 15 Jeffrey A. Law 2017-12-07 19:52:31 UTC
Fixed by Vlad's change on the trunk.  Hopefully for good this time :-)
Comment 16 Jakub Jelinek 2017-12-08 00:03:59 UTC
Author: jakub
Date: Fri Dec  8 00:03:28 2017
New Revision: 255487

URL: https://gcc.gnu.org/viewcvs?rev=255487&root=gcc&view=rev
Log:
	PR target/83252
	* gcc.target/i386/i386.exp (check_effective_target_bmi2): Moved to ...
	* lib/target-supports.exp (check_effective_target_bmi2): ... here.  Guard with
	i?86-*-* x86_64-*-*.
	* g++.dg/opt/pr83252.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/opt/pr83252.C
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/i386.exp
    trunk/gcc/testsuite/lib/target-supports.exp