Created attachment 52119 [details] test-case It's follow up of what I started tracking in PR92860. $ g++ /tmp/i386-expand.ii -march=bdver1 -O3 -c -Wno-narrowing is miscompiled so that the following test-case crashes when run with the miscompiled compiler: $ $ cat options-save2.ii char flags[16]; int one = 1, two = 2; void __attribute__ ((noipa)) save() { flags[0] = one; flags[1] = one; flags[2] = one; flags[3] = one; flags[4] = one; flags[5] = one; flags[6] = one; flags[7] = one; flags[8] = one; flags[9] = one; flags[10] = one; flags[11] = one; flags[12] = one; flags[13] = one; flags[14] = one; flags[15] = two; } int main() { save (); __builtin_printf ("flags[0]=%d, flags[15]=%d\n", flags[0], flags[15]); if (flags[15] != 2) __builtin_abort (); return 0; } One needs here -O2 -march=core2.
Created attachment 52120 [details] Isolated test-case Isolated test-case where only the miscompiled function ix86_expand_vec_extract_even_odd uses -O3. @Uros: Can you please compare -fdump-tree-optimized before and after the revision?
The referred patch adds: +;; Pack/unpack vector modes +(define_mode_attr mmxpackmode + [(V4HI "V8QI") (V2SI "V4HI")]) + +(define_expand "vec_pack_trunc_<mode>" + [(match_operand:<mmxpackmode> 0 "register_operand") + (match_operand:MMXMODE24 1 "register_operand") + (match_operand:MMXMODE24 2 "register_operand")] + "TARGET_MMX_WITH_SSE" +{ + rtx op1 = gen_lowpart (<mmxpackmode>mode, operands[1]); + rtx op2 = gen_lowpart (<mmxpackmode>mode, operands[2]); + ix86_expand_vec_extract_even_odd (operands[0], op1, op2, 0); + DONE; +}) Can you confirm by disabling the above expander (add "0 &&" before TARGET_MMX_WITH_SSE) that this is the real problem? Also, this expander can be split in order to figure out which mode is the problematic one. Other than that, can you debug if there is a difference in what ix86_expand_vec_extract_even_odd passes to expand_vec_perm_1 and expand_vec_perm_even_odd_1 when compiled with -O2 vs -O3?
(In reply to Martin Liška from comment #1) > Created attachment 52120 [details] > Isolated test-case > > Isolated test-case where only the miscompiled function > ix86_expand_vec_extract_even_odd uses -O3. > > @Uros: Can you please compare -fdump-tree-optimized before and after the > revision? When compiled with -O2 -march=bdver1, there are indeed a bunch of suspicious XOP vpperm instructions in the function: vmovd 12(%rsp), %xmm6 # 303 [c=9 l=6] *movsi_internal/10 vpperm %xmm3, %xmm0, %xmm1, %xmm0 # 124 [c=4 l=5] mmx_ppermv64 vpaddd %xmm4, %xmm1, %xmm1 # 129 [c=8 l=4] *mmx_addv2si3/2 vpperm %xmm3, %xmm1, %xmm2, %xmm1 # 131 [c=4 l=5] mmx_ppermv64 vpperm .LC165(%rip), %xmm1, %xmm0, %xmm0 # 134 [c=13 l=9] mmx_ppermv64 vpaddb %xmm0, %xmm0, %xmm0 # 137 [c=8 l=4] *mmx_addv8qi3/2 vpshuflw $0, %xmm6, %xmm1 # 140 [c=8 l=5] *vec_dupv4hi/1 vpaddb %xmm1, %xmm0, %xmm0 # 142 [c=8 l=4] *mmx_addv8qi3/2 vmovq %xmm0, 32(%rsp,%rdi) # 143 [c=4 l=6] *movv8qi_internal/14 je .L4198 # 150 [c=12 l=2] *jcc I was not able to test them on my target, so I bet these are the problem.
Created attachment 52123 [details] Patch that disables XOP permute with partial vectors Please try this patch.
From what I can see, ifcvt dump is the same between r12-1788 and r12-1789, vect has quite a few changes in that function, but the function is fairly simple in ifcvt, because almost nothing is inlined into it, so there is just the single for (i = 0; i < nelt; ++i) d.perm[i] = i * 2 + odd; loop in that function. I see similar changes on reduced: struct S { unsigned char perm[64]; }; void bar (struct S *); void foo (unsigned int nelt, unsigned odd) { struct S d; unsigned i; for (i = 0; i < nelt; ++i) d.perm[i] = i * 2 + odd; bar (&d); } Do you know what nelt and odd values are used during the misbehaving *_vec_even_odd on your testcase? If so, perhaps we could make bar noipa and add perm verification in there, make foo also noipa and just call it from main with those arguments...
@Jakub: It looks the problem is in expand_vec_perm_pshufb, where permutation vector is recalculated for partial vectors: if (vmode == V4QImode || vmode == V8QImode) { rtx m128 = GEN_INT (-128); /* Remap elements from the second operand, as we have to account for inactive top elements from the first operand. */ if (!d->one_operand_p) { int sz = GET_MODE_SIZE (vmode); for (i = 0; i < nelt; ++i) { int ival = INTVAL (rperm[i]); if (ival >= sz) ival += 16-sz; rperm[i] = GEN_INT (ival); } } /* V4QI/V8QI is emulated with V16QI instruction, fill inactive elements in the top positions with zeros. */ for (i = nelt; i < 16; ++i) rperm[i] = m128; vpmode = V16QImode; } I must admit I only eyeballed the generated code, so perhaps there lies the dragon.
> > Do you know what nelt and odd values are used during the misbehaving > *_vec_even_odd on your testcase? If so, perhaps we could make bar noipa and > add perm verification in there, make foo also noipa and just call it from > main with those arguments... Thanks Jakub for the hint. Apparently, I was quite close to the testcase, but one needs nelt=8 and 64 elements of the perm array. So there's simplified test-case: $ cat xop.C int N = 8; char perm[64]; void __attribute__((noipa)) check (void) { for (int i = 0; i < N; ++i) __builtin_printf("perm[%d]=%d\n", i, perm[i]); if (perm[7] != 7) __builtin_abort (); } int main() { for (int i = 0; i < N; ++i) perm[i] = i; check (); return 0; } $ gcc xop.C -march=bdver1 -O3 $ ./a.out perm[0]=0 perm[1]=1 perm[2]=0 perm[3]=0 perm[4]=4 perm[5]=5 perm[6]=0 perm[7]=0 Aborted (core dumped)
Created attachment 52127 [details] Proposed patch This patch should fix the failure. @Martin: Can you please test this patch?
(In reply to Uroš Bizjak from comment #8) > Created attachment 52127 [details] > Proposed patch > > This patch should fix the failure. > > @Martin: Can you please test this patch? The patch fixes the test-case, thanks!
Can we include some testcase into the testsuite?
Created attachment 52128 [details] Testcase for the testsuite @Martin: Can you please test attached testcase?
(In reply to Uroš Bizjak from comment #11) > Created attachment 52128 [details] > Testcase for the testsuite > > @Martin: Can you please test attached testcase? The test fails w/o the patch and works with the patch candidate!
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>: https://gcc.gnu.org/g:877c9e332f9b2b6eacd6ed4edf5790ee0f41a68f commit r12-6269-g877c9e332f9b2b6eacd6ed4edf5790ee0f41a68f Author: Uros Bizjak <ubizjak@gmail.com> Date: Wed Jan 5 20:06:03 2022 +0100 i386: Fix expand_vec_perm_pshufb for narrow modes [PR103905] 2022-01-05 Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/103905 * config/i386/i386-expand.c (expand_vec_perm_pshufb): Fix number of narrow mode remapped elements for !one_operand_p case. gcc/testsuite/ChangeLog: PR target/103905 * gcc.target/i386/pr103905.c: New test.
Fixed.
*** Bug 103928 has been marked as a duplicate of this bug. ***