Bug 81481 - [7 Regression] Spills %xmm to stack in glibc strspn SSE 4.2 variant
Summary: [7 Regression] Spills %xmm to stack in glibc strspn SSE 4.2 variant
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 8.0
: P2 normal
Target Milestone: 7.3
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2017-07-19 09:13 UTC by Richard Biener
Modified: 2018-01-16 13:31 UTC (History)
5 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work: 4.8.5, 6.4.0
Known to fail: 7.1.0, 7.1.1, 8.0
Last reconfirmed: 2017-10-11 00:00:00


Attachments
reduced testcase (646 bytes, text/x-csrc)
2017-07-19 09:13 UTC, Richard Biener
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2017-07-19 09:13:23 UTC
Created attachment 41787 [details]
reduced testcase

We're seeing crashes in 32bit programs violating the ABI by not properly aligning the outgoing stack when calling strspn which eventually dispatches to __strspn_sse42.  This is because GCC 7 and trunk compile this to

.L28:
        .cfi_restore_state
        movdqu  ___m128i_shift_right@GOTOFF(%edi,%ebx), %xmm3
        movl    %esi, %ebp
        andl    $-16, %ebp
        movl    $16, %eax
        movaps  %xmm3, (%esp)
        movdqa  0(%ebp), %xmm0
        pshufb  (%esp), %xmm0
        pcmpistri       $58, %xmm0, %xmm0

spilling %xmm3 for no good reason.  GCC 4.8 at least did better here and avoided
spilling (and the crashes).  There are other string routines similarly affected.

        movdqu  ___m128i_shift_right@GOTOFF(%ebx,%eax), %xmm0
        andl    $-16, %ebp
        movdqa  0(%ebp), %xmm1
        pshufb  %xmm0, %xmm1
        pcmpistri       $58, %xmm1, %xmm1

Reduced testcase attached, compile with -O2 -m32 -fPIC.
Comment 1 Richard Biener 2017-07-19 09:16:45 UTC
Unfortunately neither -mincoming-stack-boundary nor -mstackrealign are valid in target attributes (to annotate affected functions only).
Comment 2 Richard Biener 2017-07-19 11:08:04 UTC
I'm too dumb to understand the LRA dump ;)
Comment 3 Alexander Monakov 2017-09-26 11:33:27 UTC
A bit further reduced, needs -m32 -mssse3 -fpic -O2:

#include <immintrin.h>

extern const signed char c[31] __attribute__((visibility("hidden")));

__m128i f(__m128i *x, void *v)
{
  int i;
  asm("# %0" : "=r"(i));
  __m128i t = _mm_loadu_si128((void*)&c[i]);
  __m128i xx = *x;
  xx =  _mm_shuffle_epi8(xx, t);
  asm("# %0 %1 %2" : "+x"(xx) : "r"(c), "r"(i));
  return xx;
}
Comment 4 Vladimir Makarov 2017-09-28 17:59:59 UTC
In IRA we have

(insn 9 8 24 2 (set (reg:V2DI 100 [ MEM[(const __m128i_u * {ref-all})_1] ])
        (mem:V2DI (plus:SI (plus:SI (reg:SI 99 [ i ])
                    (reg:SI 87))
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("c") [flags 0x42] <var_decl 0x7f94b4253480 c>)
                        ] UNSPEC_GOTOFF))) [0 MEM[(const __m128i_u * {ref-all})_1]+0 S16 A8])) "./include/emmintrin.h":702 1233 {movv2di_internal}
     (expr_list:REG_EQUIV (mem:V2DI (plus:SI (plus:SI (reg:SI 99 [ i ])
                    (reg:SI 87))
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("c") [flags 0x42] <var_decl 0x7f94b4253480 c>)
                        ] UNSPEC_GOTOFF))) [0 MEM[(const __m128i_u * {ref-all})_1]+0 S16 A8])
        (nil)))
(insn 24 9 10 2 (set (reg/v/f:SI 97 [ x ])
        (mem/f/c:SI (reg/f:SI 16 argp) [2 x+0 S4 A32])) "./include/tmmintrin.h":138 82 {*movsi_internal}
     (expr_list:REG_EQUIV (mem/f/c:SI (reg/f:SI 16 argp) [2 x+0 S4 A32])
        (nil)))
...
(insn 11 10 14 2 (set (reg:V16QI 101)
        (unspec:V16QI [
                (reg:V16QI 102 [ *x_5(D) ])
                (subreg:V16QI (reg:V2DI 100 [ MEM[(const __m128i_u * {ref-all})_1] ]) 0)
            ] UNSPEC_PSHUFB)) "./include/tmmintrin.h":138 3798 {ssse3_pshufbv16qi3}
     (expr_list:REG_DEAD (reg:V16QI 102 [ *x_5(D) ])
        (expr_list:REG_DEAD (reg:V2DI 100 [ MEM[(const __m128i_u * {ref-all})_1] ])
            (nil))))

Pseudo 100 gets NO_REGS class in ira-costs.c

  a6 (r100,l0) best NO_REGS, allocno NO_REGS

  a6(r100,l0) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 MEM:-9000,-9000

because it is supposed in ira-costs.c that we can remove insn 9 by using equiv memory for pseudo 100.

LRA does not use this equivalence because the related code in IRA and LRA
is not fully synced.  If the patch will be not ready this week, then it will
be ready only in a week.
Comment 5 Vladimir Makarov 2017-09-29 17:40:29 UTC
Author: vmakarov
Date: Fri Sep 29 17:39:58 2017
New Revision: 253300

URL: https://gcc.gnu.org/viewcvs?rev=253300&root=gcc&view=rev
Log:
2017-09-29  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/81481
	* ira-costs.c (scan_one_insn): Don't take into account PIC equiv
	with a symbol for LRA.

2017-09-29  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/81481
	* gcc.target/i386/pr81481.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr81481.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-costs.c
    trunk/gcc/testsuite/ChangeLog
Comment 6 Jakub Jelinek 2017-10-11 10:44:26 UTC
Fixed on the trunk.
Comment 7 Richard Biener 2017-12-07 14:51:14 UTC
Patch has been applied to our GCC 7 with no issues sofar (but fixing the reported issues).
Comment 8 Aurelien Jarno 2018-01-15 16:34:58 UTC
(In reply to Richard Biener from comment #7)
> Patch has been applied to our GCC 7 with no issues sofar (but fixing the
> reported issues).

Given all is fine with this patch, do you think it can be backported to the GCC 7 branch?
Comment 9 rguenther@suse.de 2018-01-15 19:26:32 UTC
On January 15, 2018 5:34:58 PM GMT+01:00, aurelien at aurel32 dot net <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81481
>
>--- Comment #8 from Aurelien Jarno <aurelien at aurel32 dot net> ---
>(In reply to Richard Biener from comment #7)
>> Patch has been applied to our GCC 7 with no issues sofar (but fixing
>the
>> reported issues).
>
>Given all is fine with this patch, do you think it can be backported to
>the GCC
>7 branch?

We're using it at suse so I think yes.
Comment 10 Richard Biener 2018-01-16 09:52:37 UTC
Author: rguenth
Date: Tue Jan 16 09:51:57 2018
New Revision: 256731

URL: https://gcc.gnu.org/viewcvs?rev=256731&root=gcc&view=rev
Log:
2018-01-16  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2017-09-29  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/81481
	* ira-costs.c (scan_one_insn): Don't take into account PIC equiv
	with a symbol for LRA.

	* gcc.target/i386/pr81481.c: New.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr81481.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/ira-costs.c
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 11 Richard Biener 2018-01-16 12:04:13 UTC
Fixed.
Comment 12 Aurelien Jarno 2018-01-16 13:31:06 UTC
(In reply to Richard Biener from comment #11)
> Fixed.

Thanks!