Bug 59163 - [4.8/4.9 Regression] program compiled with g++ -O3 segfaults
Summary: [4.8/4.9 Regression] program compiled with g++ -O3 segfaults
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.8.2
: P2 normal
Target Milestone: 4.8.3
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2013-11-17 22:44 UTC by Donny Ward
Modified: 2013-12-10 07:50 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 4.7.3
Known to fail:
Last reconfirmed: 2013-11-18 00:00:00


Attachments
Stripped preprocessed C++ source. (707 bytes, text/plain)
2013-11-17 22:44 UTC, Donny Ward
Details
gcc49-pr59163.patch (1.20 KB, patch)
2013-11-29 14:38 UTC, Jakub Jelinek
Details | Diff
gcc49-pr59163.patch (1.33 KB, patch)
2013-11-29 16:57 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Donny Ward 2013-11-17 22:44:51 UTC
Created attachment 31232 [details]
Stripped preprocessed C++ source.

The attached test case crashes (segmentation fault) when run. It is a preprocessed file where I stripped just about everything from <algorithm> except std::for_each.

gcc -v:

~/src $ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/app/gcc/4.8.2/libexec/gcc/x86_64-unknown-linux-gnu/4.8.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./gcc-4.8.2/configure --prefix=/app/gcc/4.8.2 --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 4.8.2 (GCC) 
~/src $ 


The complete command line that triggers the bug:
g++ -O3 -std=c++11 -Wall -Wextra bug2.ii

...then run the output program to see the segmentation fault

I tried building with:
g++ -O3 -std=c++11 -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -Wall -Wextra bug2.ii

...and the same crash occurs.

The compiler outputs no messages (no warnings errors etc).

The program runs without crashing if compiled with -O0, -O1, -O2, or -Os.

I compiled with the latest version of clang++ bundled with Mac's XCode, trying each optimization level and the program runs fine. Same with the latest version of MSVC (Visual Studio 2013).


The gdb segfault and backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004007a7 in Quaternion::slerp(Quaternion const&, Quaternion const&) () at bug2.ii:82
82	{
(gdb) bt
#0  0x00000000004007a7 in Quaternion::slerp(Quaternion const&, Quaternion const&) () at bug2.ii:82
#1  0x0000000000400475 in main () at bug2.ii:97
(gdb) 


Extra observations:
If I comment out "int parent" in the struct definition so that the struct becomes

  struct Joint
  {
    //int parent;
    Quaternion orient;
  };

the program no longer segfaults when compiled/run.
Comment 1 Jonathan Wakely 2013-11-18 10:49:17 UTC
With 4.8 -fno-tree-vectorize prevents the segfault

It works on trunk with -O3, but fails if -ftree-vectorize is added.
Comment 2 Jakub Jelinek 2013-11-29 08:16:00 UTC
Started with r190492.
Comment 3 Jakub Jelinek 2013-11-29 08:52:42 UTC
Reduced testcase for -O3:
struct A { float a[4]; };
struct B { int b; A a; };

__attribute__((noinline, noclone)) void
bar (A &a)
{
  if (a.a[0] != 36.0f || a.a[1] != 42.0f || a.a[2] != 48.0f || a.a[3] != 54.0f)
    __builtin_abort ();
}

__attribute__((noinline, noclone)) void
foo (A &a)
{
  int i;
  A c = a;
  for (i = 0; i < 4; i++)
    c.a[i] *= 6.0f;
  a = c;
  bar (a);
}

int
main ()
{
  B b = { 5, { 6, 7, 8, 9 } };
  foo (b.a);
}
Comment 4 Richard Biener 2013-11-29 09:22:20 UTC
target issue then?
Comment 5 Jakub Jelinek 2013-11-29 10:34:06 UTC
Looks like it, yes.
In *.jump we still have (IMHO correct):
(insn 2 4 3 2 (set (reg/v/f:DI 89 [ a ])
        (reg:DI 5 di [ a ])) pr59163-2.C:13 85 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:TI 90)
        (mem:TI (reg/v/f:DI 89 [ a ]) [3 MEM[(const struct A &)a_4(D)]+0 S16 A32])) pr59163-2.C:15 84 {*movti_internal}
     (nil))
(insn 7 6 8 2 (set (mem/c:TI (plus:DI (reg/f:DI 20 frame)
                (const_int -16 [0xfffffffffffffff0])) [3 c+0 S16 A128])
        (reg:TI 90)) pr59163-2.C:15 84 {*movti_internal}
     (nil))
...
(insn 9 8 10 2 (set (reg:V4SF 91 [ vect__7.10 ])
        (mult:V4SF (reg:V4SF 92)
            (mem/c:V4SF (plus:DI (reg/f:DI 20 frame)
                    (const_int -16 [0xfffffffffffffff0])) [2 MEM[(float *)&c]+0 S16 A128]))) pr59163-2.C:17 1269 {*mulv4sf3}
     (nil))

movti_internal handles unaligned loads properly.
Then *.dse1 transforms this into:
(insn 6 3 18 2 (set (reg:TI 90 [ MEM[(const struct A &)a_4(D)] ])
        (mem:TI (reg/v/f:DI 89 [ a ]) [3 MEM[(const struct A &)a_4(D)]+0 S16 A32])) pr59163-2.C:15 84 {*movti_internal}
     (nil))
(insn 18 6 8 2 (set (reg:V4SF 94)
        (subreg:V4SF (reg:TI 90 [ MEM[(const struct A &)a_4(D)] ]) 0)) pr59163-2.C:15 -1
     (expr_list:REG_DEAD (reg:TI 90 [ MEM[(const struct A &)a_4(D)] ])
        (nil)))
...
(insn 9 8 19 2 (set (reg:V4SF 91 [ vect__7.10 ])
        (mult:V4SF (reg:V4SF 92)
            (reg:V4SF 94))) pr59163-2.C:17 1269 {*mulv4sf3}
     (expr_list:REG_DEAD (reg:V4SF 94)
        (expr_list:REG_DEAD (reg:V4SF 92)
            (nil))))
which also looks ok to me.  But then combine combines it into:
(insn 9 8 19 2 (set (reg:V4SF 91 [ vect__7.10 ])
        (mult:V4SF (reg:V4SF 92)
            (mem:V4SF (reg/v/f:DI 89 [ a ]) [3 MEM[(const struct A &)a_4(D)]+0 S16 A32]))) pr59163-2.C:17 1269 {*mulv4sf3}
     (expr_list:REG_DEAD (reg:V4SF 92)
        (nil)))
which is wrong (for pre-AVX), because mulv4sf3 can't accept unaligned memory.
Likely the SSEx pre-AVX predicates assume that an unaligned vector load will be done using UNSPEC and thus not really mergeable here, and don't count with the fact that the load could be done using integral mode and just subreged into vector mode.  Perhaps we need new predicates for this that would fail for exactly this situation (disallow unaligned scalar load subregged into vector mode for pre-AVX) and use them everywhere where SSE? doesn't accept unaligned loads?
Comment 6 Uroš Bizjak 2013-11-29 12:57:46 UTC
I think that we should disallow tie of TImode with 128bit vector modes due to different alignment requirements. Integer register pairs can load unaligned TImode without problems, while unaligned TImode will crash SSE insns.

Following patch fixes the problem:

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 205509)
+++ config/i386/i386.c  (working copy)
@@ -35237,6 +35237,7 @@ ix86_modes_tieable_p (enum machine_mode mode1, enu
     return (GET_MODE_SIZE (mode1) == 32
            && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
   if (GET_MODE_SIZE (mode2) == 16
+      && !(mode1 == TImode || mode2 == TImode)
       && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
     return (GET_MODE_SIZE (mode1) == 16
            && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));

However, it needs some refinements for AVX.
Comment 7 Uroš Bizjak 2013-11-29 13:43:47 UTC
Maybe even better idea is to use ix86_legitimate_combined_insn and reject combinations that would result in unaligned operands of all but vector move instructions.
Comment 8 Jakub Jelinek 2013-11-29 14:38:35 UTC
Created attachment 31331 [details]
gcc49-pr59163.patch

So like this?  Untested...
Comment 9 Uroš Bizjak 2013-11-29 16:28:26 UTC
(In reply to Jakub Jelinek from comment #8)
> Created attachment 31331 [details]
> gcc49-pr59163.patch
> 
> So like this?  Untested...

Yes, but I think that we can also allow simple vector loads and stores - they emit movu insn when unaligned operand is detected.
Comment 10 Jakub Jelinek 2013-11-29 16:35:11 UTC
For stores I think the patch already allows that, that is the
  if (GET_CODE (*x) == SET && &SET_DEST (*x) == data)
    return 1;
in there (the reason why I've added it was that for the misaligned store insns
with UNSPEC_STOREU the misaligned MEM is in SET_DEST and SET_SRC just contains some rtl with UNSPEC_STOREU embedded somewhere in it.
So, would you like:
  if (GET_CODE (*x) == SET && (&SET_DEST (*x) == data || &SET_SRC (*x) == data))
    return 1;
?  That would IMHO handle simple loads from misaligned MEM too.
Comment 11 Uroš Bizjak 2013-11-29 16:39:52 UTC
(In reply to Jakub Jelinek from comment #10)
> For stores I think the patch already allows that, that is the
>   if (GET_CODE (*x) == SET && &SET_DEST (*x) == data)
>     return 1;
> in there (the reason why I've added it was that for the misaligned store
> insns
> with UNSPEC_STOREU the misaligned MEM is in SET_DEST and SET_SRC just
> contains some rtl with UNSPEC_STOREU embedded somewhere in it.
> So, would you like:
>   if (GET_CODE (*x) == SET && (&SET_DEST (*x) == data || &SET_SRC (*x) ==
> data))
>     return 1;
> ?  That would IMHO handle simple loads from misaligned MEM too.

Yes. Perhaps also special case sse4.2 string instructions (they can operate on unaligned data, too), and the patch covers everything.
Comment 12 Jakub Jelinek 2013-11-29 16:57:38 UTC
Created attachment 31332 [details]
gcc49-pr59163.patch

So like this?
Comment 13 Uroš Bizjak 2013-11-29 18:14:21 UTC
(In reply to Jakub Jelinek from comment #12)
> Created attachment 31332 [details]
> gcc49-pr59163.patch
> 
> So like this?

Yes, with adjusted comment in ix86_legitimate_combined_insn.

IIRC, unaligned moves won't be propagated during or after reload, so it looks to me that the approach is correct.
Comment 14 Uroš Bizjak 2013-11-29 19:56:33 UTC
(In reply to Uroš Bizjak from comment #13)
> (In reply to Jakub Jelinek from comment #12)
> > Created attachment 31332 [details]
> > gcc49-pr59163.patch
> > 
> > So like this?
> 
> Yes, with adjusted comment in ix86_legitimate_combined_insn.
> 
> IIRC, unaligned moves won't be propagated during or after reload, so it
> looks to me that the approach is correct.

Running the testsuite with your patch applied exposed a minor problem:

FAIL: gcc.target/i386/sse-1.c scan-assembler-not movaps

movlps/movhps and movlpd/movhpd can also handle unaligned operands (please see ix86_expand_vector_move_misalign). We should simply tag instructions that operate on unaligned operands (attribute type = ssemovu) and check type attribute instead.

The proposed approach would mean to change all scheduler and attribute calculation checks from "ssemov" to "ssemov,ssemovu", but this would be a simple mechanical change.
Comment 15 Jakub Jelinek 2013-11-29 20:28:02 UTC
(In reply to Uroš Bizjak from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Jakub Jelinek from comment #12)
> > > Created attachment 31332 [details]
> > > gcc49-pr59163.patch
> > > 
> > > So like this?
> > 
> > Yes, with adjusted comment in ix86_legitimate_combined_insn.
> > 
> > IIRC, unaligned moves won't be propagated during or after reload, so it
> > looks to me that the approach is correct.
> 
> Running the testsuite with your patch applied exposed a minor problem:
> 
> FAIL: gcc.target/i386/sse-1.c scan-assembler-not movaps
> 
> movlps/movhps and movlpd/movhpd can also handle unaligned operands (please
> see ix86_expand_vector_move_misalign). We should simply tag instructions
> that operate on unaligned operands (attribute type = ssemovu) and check type
> attribute instead.
> 
> The proposed approach would mean to change all scheduler and attribute
> calculation checks from "ssemov" to "ssemov,ssemovu", but this would be a
> simple mechanical change.

Yeah, I've noticed that too, plus the dumps I've used to note what instructions have been rejected by the patch show that UNSPEC_LDDQU would need to be treated like UNSPEC_LOADU.
The patch made difference for 32-bit:
gcc.target/i386/sse-1.c (as you write above)
gcc.dg/torture/pr18582-1.c (UNSPEC_LDDQU)
gcc.target/i386/sse4_1-movntdqa.c (UNSPEC_MOVNTDQA)
and 64-bit also
g++.dg/torture/pr59163.C (desirable)

Now, for movntdqa, I think it accepts only aligned memory, but the MEM in there is supposed to be aligned and is created by
          if (i == memory)
            {
              /* This must be the memory operand.  */
              op = ix86_zero_extend_to_Pmode (op);
              op = gen_rtx_MEM (mode, op); 
              gcc_assert (GET_MODE (op) == mode
                          || GET_MODE (op) == VOIDmode);
            }
and there is similar code for builtins that store.
Supposedly for this we should use get_pointer_alignment (arg) and at least
set_mem_align (op, get_pointer_alignment (arg)); if it is larger than MEM_ALIGN (op).  The gcc_assert doesn't make any sense to me, result of gen_rtx_MEM (mode, op) will always have GET_MODE (op) == mode, no need to assert that and it will never have VOIDmode.
Now, if we could easily find out which of the builtins assume aligned memory (and to what extent), we should also set it, because say using _mm_stream_load_si128 with not 128-bit aligned memory is user error, so GCC should be able to assume A128 there.  I'd say the sse-1.c case is similar, isn't it?
Comment 16 Jakub Jelinek 2013-11-29 20:54:27 UTC
Note that (according to my reading of the docs) e.g. movlps/movhps don't allow unaligned memory, so blindly allow any combine is wrong, but while the MEM operand in those cases is say V4SFmode, the loads or stores can have V2SFmode's alignment.  So, I don't see how ssemovu type would help us here, I think we handle all the completely unaligned loads/stores already (after the UNSPEC_LDDQU addition).  But we need some way how to determine for which instructions we should require only smaller alignment, plus tweak the two gen_rtx_MEM spots for the builtins and set mode there according to get_pointer_alignment and/or to assumed alignment of the builtins.
Comment 17 Jakub Jelinek 2013-11-30 07:14:04 UTC
Perhaps add new attribute ssememalign, with default 0, which would be (maximum for all alternatives) required alignment for memory operands in the instruction pre-AVX, or 0 for GET_MODE_ALIGNMENT.  So, instructions that can handle completely unaligned loads/stores in all alternatives would have ssememalign 8,
instructions that require everything properly aligned would have default ssememalign 0, and say movlps/movhps would have ssememalign 64.  The default would be conservatively correct, so whether instructions would have ssememalign attribute would be just an optimization issue (but, if it would be non-zero, it would have to be correct).
Comment 18 Uroš Bizjak 2013-11-30 09:59:38 UTC
(In reply to Jakub Jelinek from comment #17)
> Perhaps add new attribute ssememalign, with default 0, which would be
> (maximum for all alternatives) required alignment for memory operands in the
> instruction pre-AVX, or 0 for GET_MODE_ALIGNMENT.  So, instructions that can
> handle completely unaligned loads/stores in all alternatives would have
> ssememalign 8,
> instructions that require everything properly aligned would have default
> ssememalign 0, and say movlps/movhps would have ssememalign 64.  The default
> would be conservatively correct, so whether instructions would have
> ssememalign attribute would be just an optimization issue (but, if it would
> be non-zero, it would have to be correct).

This sounds like a good approach to me.
Comment 19 Jakub Jelinek 2013-12-04 11:11:27 UTC
Author: jakub
Date: Wed Dec  4 11:11:24 2013
New Revision: 205661

URL: http://gcc.gnu.org/viewcvs?rev=205661&root=gcc&view=rev
Log:
	PR target/59163
	* config/i386/i386.c (ix86_legitimate_combined_insn): If for
	!TARGET_AVX there is misaligned MEM operand with vector mode
	and get_attr_ssememalign is 0, return false.
	(ix86_expand_special_args_builtin): Add get_pointer_alignment
	computed alignment and for non-temporal loads/stores also
	at least GET_MODE_ALIGNMENT as MEM_ALIGN.
	* config/i386/sse.md
	(<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
	<sse>_storeu<ssemodesuffix><avxsizesuffix>,
	<sse2_avx_avx512f>_loaddqu<mode><mask_name>,
	<sse2_avx_avx512f>_storedqu<mode>, <sse3>_lddqu<avxsizesuffix>,
	sse_vmrcpv4sf2, sse_vmrsqrtv4sf2, sse2_cvtdq2pd, sse_movhlps,
	sse_movlhps, sse_storehps, sse_loadhps, sse_loadlps,
	*vec_interleave_highv2df, *vec_interleave_lowv2df,
	*vec_extractv2df_1_sse, sse2_movsd, sse4_1_<code>v8qiv8hi2,
	sse4_1_<code>v4qiv4si2, sse4_1_<code>v4hiv4si2,
	sse4_1_<code>v2qiv2di2, sse4_1_<code>v2hiv2di2,
	sse4_1_<code>v2siv2di2, sse4_2_pcmpestr, *sse4_2_pcmpestr_unaligned,
	sse4_2_pcmpestri, sse4_2_pcmpestrm, sse4_2_pcmpestr_cconly,
	sse4_2_pcmpistr, *sse4_2_pcmpistr_unaligned, sse4_2_pcmpistri,
	sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Add ssememalign attribute.
	* config/i386/i386.md (ssememalign): New define_attr.

	* g++.dg/torture/pr59163.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/torture/pr59163.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 20 Jakub Jelinek 2013-12-04 11:12:06 UTC
Author: jakub
Date: Wed Dec  4 11:12:04 2013
New Revision: 205663

URL: http://gcc.gnu.org/viewcvs?rev=205663&root=gcc&view=rev
Log:
	PR target/59163
	* config/i386/i386.c (ix86_legitimate_combined_insn): If for
	!TARGET_AVX there is misaligned MEM operand with vector mode
	and get_attr_ssememalign is 0, return false.
	(ix86_expand_special_args_builtin): Add get_pointer_alignment
	computed alignment and for non-temporal loads/stores also
	at least GET_MODE_ALIGNMENT as MEM_ALIGN.
	* config/i386/sse.md
	(<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
	<sse>_storeu<ssemodesuffix><avxsizesuffix>,
	<sse2_avx_avx512f>_loaddqu<mode><mask_name>,
	<sse2_avx_avx512f>_storedqu<mode>, <sse3>_lddqu<avxsizesuffix>,
	sse_vmrcpv4sf2, sse_vmrsqrtv4sf2, sse2_cvtdq2pd, sse_movhlps,
	sse_movlhps, sse_storehps, sse_loadhps, sse_loadlps,
	*vec_interleave_highv2df, *vec_interleave_lowv2df,
	*vec_extractv2df_1_sse, sse2_movsd, sse4_1_<code>v8qiv8hi2,
	sse4_1_<code>v4qiv4si2, sse4_1_<code>v4hiv4si2,
	sse4_1_<code>v2qiv2di2, sse4_1_<code>v2hiv2di2,
	sse4_1_<code>v2siv2di2, sse4_2_pcmpestr, *sse4_2_pcmpestr_unaligned,
	sse4_2_pcmpestri, sse4_2_pcmpestrm, sse4_2_pcmpestr_cconly,
	sse4_2_pcmpistr, *sse4_2_pcmpistr_unaligned, sse4_2_pcmpistri,
	sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Add ssememalign attribute.
	* config/i386/i386.md (ssememalign): New define_attr.

	* g++.dg/torture/pr59163.C: New test.

Modified:
    trunk/gcc/config/i386/i386.c
Comment 21 Jakub Jelinek 2013-12-04 15:50:05 UTC
Author: jakub
Date: Wed Dec  4 15:50:02 2013
New Revision: 205671

URL: http://gcc.gnu.org/viewcvs?rev=205671&root=gcc&view=rev
Log:
	PR target/59163
	* config/i386/i386.c (ix86_legitimate_combined_insn): If for
	!TARGET_AVX there is misaligned MEM operand with vector mode
	and get_attr_ssememalign is 0, return false.
	(ix86_expand_special_args_builtin): Add get_pointer_alignment
	computed alignment and for non-temporal loads/stores also
	at least GET_MODE_ALIGNMENT as MEM_ALIGN.
	* config/i386/sse.md
	(<sse>_loadu<ssemodesuffix><avxsizesuffix>,
	<sse>_storeu<ssemodesuffix><avxsizesuffix>,
	<sse2>_loaddqu<avxsizesuffix>,
	<sse2>_storedqu<avxsizesuffix>, <sse3>_lddqu<avxsizesuffix>,
	sse_vmrcpv4sf2, sse_vmrsqrtv4sf2, sse2_cvtdq2pd, sse_movhlps,
	sse_movlhps, sse_storehps, sse_loadhps, sse_loadlps,
	*vec_interleave_highv2df, *vec_interleave_lowv2df,
	*vec_extractv2df_1_sse, sse2_loadhpd, sse2_loadlpd, sse2_movsd,
	sse4_1_<code>v8qiv8hi2, sse4_1_<code>v4qiv4si2,
	sse4_1_<code>v4hiv4si2, sse4_1_<code>v2qiv2di2,
	sse4_1_<code>v2hiv2di2, sse4_1_<code>v2siv2di2, sse4_2_pcmpestr,
	*sse4_2_pcmpestr_unaligned, sse4_2_pcmpestri, sse4_2_pcmpestrm,
	sse4_2_pcmpestr_cconly, sse4_2_pcmpistr, *sse4_2_pcmpistr_unaligned,
	sse4_2_pcmpistri, sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Add
	ssememalign attribute.
	* config/i386/i386.md (ssememalign): New define_attr.

	* g++.dg/torture/pr59163.C: New test.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/g++.dg/torture/pr59163.C
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/i386/i386.c
    branches/gcc-4_8-branch/gcc/config/i386/i386.md
    branches/gcc-4_8-branch/gcc/config/i386/sse.md
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Comment 22 Jakub Jelinek 2013-12-04 15:55:43 UTC
Fixed.
Comment 23 Jakub Jelinek 2013-12-10 07:50:39 UTC
.