Summary: | [7 Regression] Spills %xmm to stack in glibc strspn SSE 4.2 variant | ||
---|---|---|---|
Product: | gcc | Reporter: | Richard Biener <rguenth> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | amonakov, aurelien, fw, jakub, vmakarov |
Priority: | P2 | Keywords: | missed-optimization, ra |
Version: | 8.0 | ||
Target Milestone: | 7.3 | ||
Host: | Target: | i?86-*-* | |
Build: | Known to work: | 4.8.5, 6.4.0 | |
Known to fail: | 7.1.0, 7.1.1, 8.0 | Last reconfirmed: | 2017-10-11 00:00:00 |
Attachments: | reduced testcase |
Unfortunately neither -mincoming-stack-boundary nor -mstackrealign are valid in target attributes (to annotate affected functions only). I'm too dumb to understand the LRA dump ;) A bit further reduced, needs -m32 -mssse3 -fpic -O2: #include <immintrin.h> extern const signed char c[31] __attribute__((visibility("hidden"))); __m128i f(__m128i *x, void *v) { int i; asm("# %0" : "=r"(i)); __m128i t = _mm_loadu_si128((void*)&c[i]); __m128i xx = *x; xx = _mm_shuffle_epi8(xx, t); asm("# %0 %1 %2" : "+x"(xx) : "r"(c), "r"(i)); return xx; } In IRA we have (insn 9 8 24 2 (set (reg:V2DI 100 [ MEM[(const __m128i_u * {ref-all})_1] ]) (mem:V2DI (plus:SI (plus:SI (reg:SI 99 [ i ]) (reg:SI 87)) (const:SI (unspec:SI [ (symbol_ref:SI ("c") [flags 0x42] <var_decl 0x7f94b4253480 c>) ] UNSPEC_GOTOFF))) [0 MEM[(const __m128i_u * {ref-all})_1]+0 S16 A8])) "./include/emmintrin.h":702 1233 {movv2di_internal} (expr_list:REG_EQUIV (mem:V2DI (plus:SI (plus:SI (reg:SI 99 [ i ]) (reg:SI 87)) (const:SI (unspec:SI [ (symbol_ref:SI ("c") [flags 0x42] <var_decl 0x7f94b4253480 c>) ] UNSPEC_GOTOFF))) [0 MEM[(const __m128i_u * {ref-all})_1]+0 S16 A8]) (nil))) (insn 24 9 10 2 (set (reg/v/f:SI 97 [ x ]) (mem/f/c:SI (reg/f:SI 16 argp) [2 x+0 S4 A32])) "./include/tmmintrin.h":138 82 {*movsi_internal} (expr_list:REG_EQUIV (mem/f/c:SI (reg/f:SI 16 argp) [2 x+0 S4 A32]) (nil))) ... (insn 11 10 14 2 (set (reg:V16QI 101) (unspec:V16QI [ (reg:V16QI 102 [ *x_5(D) ]) (subreg:V16QI (reg:V2DI 100 [ MEM[(const __m128i_u * {ref-all})_1] ]) 0) ] UNSPEC_PSHUFB)) "./include/tmmintrin.h":138 3798 {ssse3_pshufbv16qi3} (expr_list:REG_DEAD (reg:V16QI 102 [ *x_5(D) ]) (expr_list:REG_DEAD (reg:V2DI 100 [ MEM[(const __m128i_u * {ref-all})_1] ]) (nil)))) Pseudo 100 gets NO_REGS class in ira-costs.c a6 (r100,l0) best NO_REGS, allocno NO_REGS a6(r100,l0) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 MEM:-9000,-9000 because it is supposed in ira-costs.c that we can remove insn 9 by using equiv memory for pseudo 100. LRA does not use this equivalence because the related code in IRA and LRA is not fully synced. If the patch will be not ready this week, then it will be ready only in a week. Author: vmakarov Date: Fri Sep 29 17:39:58 2017 New Revision: 253300 URL: https://gcc.gnu.org/viewcvs?rev=253300&root=gcc&view=rev Log: 2017-09-29 Vladimir Makarov <vmakarov@redhat.com> PR target/81481 * ira-costs.c (scan_one_insn): Don't take into account PIC equiv with a symbol for LRA. 2017-09-29 Vladimir Makarov <vmakarov@redhat.com> PR target/81481 * gcc.target/i386/pr81481.c: New. Added: trunk/gcc/testsuite/gcc.target/i386/pr81481.c Modified: trunk/gcc/ChangeLog trunk/gcc/ira-costs.c trunk/gcc/testsuite/ChangeLog Fixed on the trunk. Patch has been applied to our GCC 7 with no issues sofar (but fixing the reported issues). (In reply to Richard Biener from comment #7) > Patch has been applied to our GCC 7 with no issues sofar (but fixing the > reported issues). Given all is fine with this patch, do you think it can be backported to the GCC 7 branch? On January 15, 2018 5:34:58 PM GMT+01:00, aurelien at aurel32 dot net <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81481 > >--- Comment #8 from Aurelien Jarno <aurelien at aurel32 dot net> --- >(In reply to Richard Biener from comment #7) >> Patch has been applied to our GCC 7 with no issues sofar (but fixing >the >> reported issues). > >Given all is fine with this patch, do you think it can be backported to >the GCC >7 branch? We're using it at suse so I think yes. Author: rguenth Date: Tue Jan 16 09:51:57 2018 New Revision: 256731 URL: https://gcc.gnu.org/viewcvs?rev=256731&root=gcc&view=rev Log: 2018-01-16 Richard Biener <rguenther@suse.de> Backport from mainline 2017-09-29 Vladimir Makarov <vmakarov@redhat.com> PR target/81481 * ira-costs.c (scan_one_insn): Don't take into account PIC equiv with a symbol for LRA. * gcc.target/i386/pr81481.c: New. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr81481.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/ira-costs.c branches/gcc-7-branch/gcc/testsuite/ChangeLog Fixed. (In reply to Richard Biener from comment #11) > Fixed. Thanks! |
Created attachment 41787 [details] reduced testcase We're seeing crashes in 32bit programs violating the ABI by not properly aligning the outgoing stack when calling strspn which eventually dispatches to __strspn_sse42. This is because GCC 7 and trunk compile this to .L28: .cfi_restore_state movdqu ___m128i_shift_right@GOTOFF(%edi,%ebx), %xmm3 movl %esi, %ebp andl $-16, %ebp movl $16, %eax movaps %xmm3, (%esp) movdqa 0(%ebp), %xmm0 pshufb (%esp), %xmm0 pcmpistri $58, %xmm0, %xmm0 spilling %xmm3 for no good reason. GCC 4.8 at least did better here and avoided spilling (and the crashes). There are other string routines similarly affected. movdqu ___m128i_shift_right@GOTOFF(%ebx,%eax), %xmm0 andl $-16, %ebp movdqa 0(%ebp), %xmm1 pshufb %xmm0, %xmm1 pcmpistri $58, %xmm1, %xmm1 Reduced testcase attached, compile with -O2 -m32 -fPIC.