Bug 103905 - [12 Regression] Miscompiled i386-expand.c with -march=bdver1 and -O3 since r12-1789-g836328b2c99f5b8d
Summary: [12 Regression] Miscompiled i386-expand.c with -march=bdver1 and -O3 since r1...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Uroš Bizjak
URL:
Keywords: wrong-code
: 103928 (view as bug list)
Depends on:
Blocks: 92860
  Show dependency treegraph
 
Reported: 2022-01-04 19:29 UTC by Martin Liška
Modified: 2022-01-06 21:26 UTC (History)
4 users (show)

See Also:
Host:
Target: x86
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-01-05 00:00:00


Attachments
test-case (513.88 KB, application/zstd)
2022-01-04 19:29 UTC, Martin Liška
Details
Isolated test-case (514.26 KB, application/zstd)
2022-01-04 19:56 UTC, Martin Liška
Details
Patch that disables XOP permute with partial vectors (275 bytes, text/plain)
2022-01-04 21:14 UTC, Uroš Bizjak
Details
Proposed patch (566 bytes, patch)
2022-01-05 08:57 UTC, Uroš Bizjak
Details | Diff
Testcase for the testsuite (241 bytes, text/plain)
2022-01-05 09:29 UTC, Uroš Bizjak
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Liška 2022-01-04 19:29:36 UTC
Created attachment 52119 [details]
test-case

It's follow up of what I started tracking in PR92860.

$ g++ /tmp/i386-expand.ii -march=bdver1 -O3 -c -Wno-narrowing

is miscompiled so that the following test-case crashes when run with the miscompiled compiler:

$ $ cat options-save2.ii
char flags[16];

int one = 1, two = 2;

void
__attribute__ ((noipa))
save() {
  flags[0] = one;
  flags[1] = one;
  flags[2] = one;
  flags[3] = one;
  flags[4] = one;
  flags[5] = one;
  flags[6] = one;
  flags[7] = one;
  flags[8] = one;
  flags[9] = one;
  flags[10] = one;
  flags[11] = one;
  flags[12] = one;
  flags[13] = one;
  flags[14] = one;
  flags[15] = two;
}

int
main()
{
  save ();

  __builtin_printf ("flags[0]=%d, flags[15]=%d\n", flags[0], flags[15]);
  if (flags[15] != 2)
    __builtin_abort ();

  return 0;
}

One needs here -O2 -march=core2.
Comment 1 Martin Liška 2022-01-04 19:56:52 UTC
Created attachment 52120 [details]
Isolated test-case

Isolated test-case where only the miscompiled function ix86_expand_vec_extract_even_odd uses -O3.

@Uros: Can you please compare -fdump-tree-optimized before and after the revision?
Comment 2 Uroš Bizjak 2022-01-04 20:12:43 UTC
The referred patch adds:

+;; Pack/unpack vector modes
+(define_mode_attr mmxpackmode
+  [(V4HI "V8QI") (V2SI "V4HI")])
+
+(define_expand "vec_pack_trunc_<mode>"
+  [(match_operand:<mmxpackmode> 0 "register_operand")
+   (match_operand:MMXMODE24 1 "register_operand")
+   (match_operand:MMXMODE24 2 "register_operand")]
+  "TARGET_MMX_WITH_SSE"
+{
+  rtx op1 = gen_lowpart (<mmxpackmode>mode, operands[1]);
+  rtx op2 = gen_lowpart (<mmxpackmode>mode, operands[2]);
+  ix86_expand_vec_extract_even_odd (operands[0], op1, op2, 0);
+  DONE;
+})

Can you confirm by disabling the above expander (add "0 &&" before TARGET_MMX_WITH_SSE) that this is the real problem? Also, this expander can be split in order to figure out which mode is the problematic one.

Other than that, can you debug if there is a difference in what ix86_expand_vec_extract_even_odd passes to expand_vec_perm_1 and expand_vec_perm_even_odd_1 when compiled with -O2 vs -O3?
Comment 3 Uroš Bizjak 2022-01-04 21:03:44 UTC
(In reply to Martin Liška from comment #1)
> Created attachment 52120 [details]
> Isolated test-case
> 
> Isolated test-case where only the miscompiled function
> ix86_expand_vec_extract_even_odd uses -O3.
> 
> @Uros: Can you please compare -fdump-tree-optimized before and after the
> revision?

When compiled with -O2 -march=bdver1, there are indeed a bunch of suspicious XOP vpperm instructions in the function:

	vmovd	12(%rsp), %xmm6	# 303	[c=9 l=6]  *movsi_internal/10
	vpperm	%xmm3, %xmm0, %xmm1, %xmm0	# 124	[c=4 l=5]  mmx_ppermv64
	vpaddd	%xmm4, %xmm1, %xmm1	# 129	[c=8 l=4]  *mmx_addv2si3/2
	vpperm	%xmm3, %xmm1, %xmm2, %xmm1	# 131	[c=4 l=5]  mmx_ppermv64
	vpperm	.LC165(%rip), %xmm1, %xmm0, %xmm0	# 134	[c=13 l=9]  mmx_ppermv64
	vpaddb	%xmm0, %xmm0, %xmm0	# 137	[c=8 l=4]  *mmx_addv8qi3/2
	vpshuflw	$0, %xmm6, %xmm1	# 140	[c=8 l=5]  *vec_dupv4hi/1
	vpaddb	%xmm1, %xmm0, %xmm0	# 142	[c=8 l=4]  *mmx_addv8qi3/2
	vmovq	%xmm0, 32(%rsp,%rdi)	# 143	[c=4 l=6]  *movv8qi_internal/14
	je	.L4198	# 150	[c=12 l=2]  *jcc

I was not able to test them on my target, so I bet these are the problem.
Comment 4 Uroš Bizjak 2022-01-04 21:14:03 UTC
Created attachment 52123 [details]
Patch that disables XOP permute with partial vectors

Please try this patch.
Comment 5 Jakub Jelinek 2022-01-04 21:31:13 UTC
From what I can see, ifcvt dump is the same between r12-1788 and r12-1789, vect has quite a few changes in that function, but the function is fairly simple in ifcvt, because almost nothing is inlined into it, so there is just the single
  for (i = 0; i < nelt; ++i)
    d.perm[i] = i * 2 + odd;
loop in that function.

I see similar changes on reduced:
struct S { unsigned char perm[64]; };

void bar (struct S *);

void
foo (unsigned int nelt, unsigned odd)
{
  struct S d;
  unsigned i;

  for (i = 0; i < nelt; ++i)
    d.perm[i] = i * 2 + odd;

  bar (&d);
}

Do you know what nelt and odd values are used during the misbehaving *_vec_even_odd on your testcase?  If so, perhaps we could make bar noipa and add perm verification in there, make foo also noipa and just call it from main with those arguments...
Comment 6 Uroš Bizjak 2022-01-04 21:44:07 UTC
@Jakub: It looks the problem is in expand_vec_perm_pshufb, where permutation vector is recalculated for partial vectors:

  if (vmode == V4QImode
      || vmode == V8QImode)
    {
      rtx m128 = GEN_INT (-128);

      /* Remap elements from the second operand, as we have to
	 account for inactive top elements from the first operand.  */
      if (!d->one_operand_p)
	{
	  int sz = GET_MODE_SIZE (vmode);

	  for (i = 0; i < nelt; ++i)
	    {
	      int ival = INTVAL (rperm[i]);
	      if (ival >= sz)
		ival += 16-sz;
	      rperm[i] = GEN_INT (ival);
	    }
	}

      /* V4QI/V8QI is emulated with V16QI instruction, fill inactive
	 elements in the top positions with zeros.  */
      for (i = nelt; i < 16; ++i)
	rperm[i] = m128;

      vpmode = V16QImode;
    }

I must admit I only eyeballed the generated code, so perhaps there lies the dragon.
Comment 7 Martin Liška 2022-01-05 05:43:40 UTC
> 
> Do you know what nelt and odd values are used during the misbehaving
> *_vec_even_odd on your testcase?  If so, perhaps we could make bar noipa and
> add perm verification in there, make foo also noipa and just call it from
> main with those arguments...

Thanks Jakub for the hint. Apparently, I was quite close to the testcase, but one needs nelt=8 and 64 elements of the perm array.

So there's simplified test-case:

$ cat xop.C
int N = 8;
char perm[64];

void
__attribute__((noipa))
check (void)
{
  for (int i = 0; i < N; ++i)
    __builtin_printf("perm[%d]=%d\n", i, perm[i]);

  if (perm[7] != 7)
    __builtin_abort ();
}

int main()
{
  for (int i = 0; i < N; ++i)
    perm[i] = i;

  check ();
  return 0;
}

$ gcc xop.C -march=bdver1 -O3
$ ./a.out
perm[0]=0
perm[1]=1
perm[2]=0
perm[3]=0
perm[4]=4
perm[5]=5
perm[6]=0
perm[7]=0
Aborted (core dumped)
Comment 8 Uroš Bizjak 2022-01-05 08:57:59 UTC
Created attachment 52127 [details]
Proposed patch

This patch should fix the failure.

@Martin: Can you please test this patch?
Comment 9 Martin Liška 2022-01-05 09:02:31 UTC
(In reply to Uroš Bizjak from comment #8)
> Created attachment 52127 [details]
> Proposed patch
> 
> This patch should fix the failure.
> 
> @Martin: Can you please test this patch?

The patch fixes the test-case, thanks!
Comment 10 Jakub Jelinek 2022-01-05 09:17:57 UTC
Can we include some testcase into the testsuite?
Comment 11 Uroš Bizjak 2022-01-05 09:29:18 UTC
Created attachment 52128 [details]
Testcase for the testsuite

@Martin: Can you please test attached testcase?
Comment 12 Martin Liška 2022-01-05 10:01:01 UTC
(In reply to Uroš Bizjak from comment #11)
> Created attachment 52128 [details]
> Testcase for the testsuite
> 
> @Martin: Can you please test attached testcase?

The test fails w/o the patch and works with the patch candidate!
Comment 13 GCC Commits 2022-01-05 19:07:32 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:877c9e332f9b2b6eacd6ed4edf5790ee0f41a68f

commit r12-6269-g877c9e332f9b2b6eacd6ed4edf5790ee0f41a68f
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Wed Jan 5 20:06:03 2022 +0100

    i386: Fix expand_vec_perm_pshufb for narrow modes [PR103905]
    
    2022-01-05  Uroš Bizjak  <ubizjak@gmail.com>
    
    gcc/ChangeLog:
    
            PR target/103905
            * config/i386/i386-expand.c (expand_vec_perm_pshufb): Fix number of
            narrow mode remapped elements for !one_operand_p case.
    
    gcc/testsuite/ChangeLog:
    
            PR target/103905
            * gcc.target/i386/pr103905.c: New test.
Comment 14 Uroš Bizjak 2022-01-05 19:14:55 UTC
Fixed.
Comment 15 Andrew Pinski 2022-01-06 21:26:30 UTC
*** Bug 103928 has been marked as a duplicate of this bug. ***