#include <boost/exception_ptr.hpp>

struct foo { };

int main() { boost::copy_exception(foo()); }

Compiling the above with -O3 results in the following instruction being emitted:

movdqa %xmm0, _ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep(%rip)

But that symbol need not be 16-byte aligned (it's a boost::exception_ptr, which contains a boost::shared_ptr, which is just a pair of pointers). This crashes if _ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep comes from another object file where it is declared with 8-byte alignment.

Possible duplicate of 54167? Works fine with 4.6.2.

Preprocessed source is attached.
Created attachment 29611 [details] Preprocessed source file
This instruction appears in an EH region of function _ZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEv (AFAIK).

It's defined twice, once weak and aligned 8 and once strong and aligned 16, so AFAIK it _is_ aligned properly.

        .align 8
        .type   _ZGVZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep, @gnu_unique_object
        .size   _ZGVZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep, 8
_ZGVZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep:
        .zero   8
        .weak   _ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep
        .section        .bss._ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep,"awG",@nobits,_ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep,comdat
        .align 16
        .type   _ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep, @gnu_unique_object
        .size   _ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep, 16
_ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep:
        .zero   16

and readelf shows:

  [192] .bss._ZZN5boost16 NOBITS 0000000000000000 00001ca0 0000000000000010 0000000000000000 WAG 0 0 16

with alignment of 16.

> This crashes if
> _ZZN5boost16exception_detail27get_static_exception_objectINS0_10bad_alloc_EEENS_13exception_ptrEvE2ep
> comes from another object file where it is declared with 8-byte alignment.

so this would be a bug and a violation of ODR(?) What's this "other object file"?
The code piece in question is:

template <class Exception>
exception_ptr
get_static_exception_object()
{
    Exception ba;
    exception_detail::clone_impl<Exception> c(ba);
    static exception_ptr ep(shared_ptr<exception_detail::clone_base const>(new exception_detail::clone_impl<Exception>(c)));
    return ep;
}

OTOH, I'm not sure what increases the alignment of that object beyond its type alignment. The same alignment is emitted with 4.8 and also 4.6, so you must be unlucky with that other object file (compiled with which compiler?).

Please also attach preprocessed source of the "other object file".
Sorry for my cryptic comments about the "other object file". It's compiled with icc 13. I will attach the preprocessed source and generated assembly.
Created attachment 29618 [details] Preprocessed with ICC
Created attachment 29619 [details] Generated by icc 13
This is due to ix86_data_alignment, which has:

  /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
     to 16byte boundary.  */
  if (TARGET_64BIT)
    {
      if (AGGREGATE_TYPE_P (type)
          && TYPE_SIZE (type)
          && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
          && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 128
              || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 128)
        return 128;
    }

The comment and the wording of http://refspecs.linuxfoundation.org/elf/x86_64-abi-0.95.pdf seem to be inconsistent with what the code does. The comment and the 0.95 version of the psABI only talk about arrays:

"An array uses the same alignment as its elements, except that a local or global array variable that requires at least 16 bytes, or a C99 local or global variable-length array variable, always has alignment of at least 16 bytes."

but AGGREGATE_TYPE_P isn't solely about array local/global variables, but about any aggregates (arrays, structs, unions, ...). ep apparently has a size of 16 and the above code aligns it to 16 bytes, but icc probably aligns it just to 8 bytes, as the maximum alignment of its members.

Now, changing the above to only look at arrays would probably cause more harm than good, because while code compiled by a fixed gcc would be compatible with icc, it would be incompatible with code compiled by older gcc. Guess if we want to change something, we'd need to change it in a way that such aggregates (non-array ones) of size 16 and above are still 16-byte aligned, but if the variable isn't known to bind locally, don't increase DECL_ALIGN of the var, so that no optimizers actually rely on it.
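The mismatch described in this comment can be sketched in a few lines of C. This is only a toy model, not GCC code: the enum, abi_align, gcc_align and check_ep are hypothetical names, sizes and alignments are in bytes rather than bits, and the variable-length array case from the psABI text is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC code. Sizes and alignments are in bytes. */
enum kind { KIND_SCALAR, KIND_ARRAY, KIND_STRUCT };

/* psABI wording quoted in the comment: only array variables that need
   at least 16 bytes are raised to 16-byte alignment. */
size_t abi_align(enum kind k, size_t size, size_t type_align)
{
    if (k == KIND_ARRAY && size >= 16 && type_align < 16)
        return 16;
    return type_align;
}

/* Behaviour of the quoted ix86_data_alignment check: any aggregate
   (array or struct here) of at least 16 bytes is raised to 16. */
size_t gcc_align(enum kind k, size_t size, size_t type_align)
{
    bool aggregate = (k == KIND_ARRAY || k == KIND_STRUCT);
    if (aggregate && size >= 16 && type_align < 16)
        return 16;
    return type_align;
}

/* ep modelled as a 16-byte struct whose members need 8-byte alignment. */
void check_ep(void)
{
    assert(abi_align(KIND_STRUCT, 16, 8) == 8);   /* all the psABI text promises */
    assert(gcc_align(KIND_STRUCT, 16, 8) == 16);  /* what the quoted code returns */
}
```

Under this model the two rules agree for arrays and differ only for non-array aggregates, which is the case ep falls into.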
Confirmed.
Guess we'd need to split DATA_ALIGNMENT into two different macros (or one with an extra argument), so that align_variable would know what alignment is part of ABI and what is just an optimization above that, then align_variable could call targetm.binds_local_p to see if DECL_ALIGN can be increased to the optimization level or needs to stay at the ABI guaranteed level. And then when assembling vars, we'd increase the emitted alignment to the optimization level.
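A rough sketch of that decision, assuming the two alignments are already known. struct var_align and choose_align are hypothetical names used for illustration only and are not the varasm.c interface; the boolean stands in for whatever the binds-locally query would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; not the actual varasm.c code. */
struct var_align {
    size_t assumed;  /* what optimizers may rely on (the DECL_ALIGN role) */
    size_t emitted;  /* what is written into the assembly output          */
};

struct var_align choose_align(size_t abi, size_t opt, bool binds_to_current_def)
{
    struct var_align r;

    assert(opt >= abi);  /* the optimization level never lowers the ABI level */

    /* Rely on the larger alignment only if this definition is the one
       that will actually be used at run time. */
    r.assumed = binds_to_current_def ? opt : abi;

    /* Always emit the larger alignment for our own definition. */
    r.emitted = opt;
    return r;
}
```

For the 16-byte struct from this report, choose_align(8, 16, false) would give assumed == 8 and emitted == 16: the object is still emitted 16-byte aligned, but no aligned vector access is generated on the assumption that every definition of it is.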
Smaller testcase (-O2 -fpic):

struct S { long a, b; } s;

int
foo (void)
{
  return ((long) &s) & 15;
}

Since http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162943 this is optimized into return 0, even though (probably) the psABI doesn't guarantee that. But e.g. for __builtin_memset (&s, 0, sizeof (s)); one can see already in 4.0 RTL dumps with -O2 -fpic that MEM_ALIGN of s is assumed to be 128-bit.
Mine.
This affects at least PowerPC, too, which implements DATA_ALIGNMENT to add additional alignment beyond that specified by the ABI. Isn't TYPE_ALIGN already supposed to return the ABI-mandated alignment for objects of a given type? The documentation for DATA_ALIGNMENT already suggests that its purpose is to add additional alignment for optimization purposes and I suspect other targets may be using it that way, too. Perhaps what's needed here is more careful monitoring of the places where DATA_ALIGNMENT is being used, rather than splitting it into two macros or adding an argument to control the two uses. Or at least, we'd have to clarify how the requirements for the ABI-conforming use of DATA_ALIGNMENT differ from what TYPE_ALIGN is supposed to do. It seems to me that DATA_ALIGNMENT's original purpose was to add additional alignment on variable definitions, and IIUC the problem now is either that it is being used in other contexts or that its intended use is not taking into account common, weak, and/or comdat definitions where the linker may substitute a less-aligned definition from another compilation unit. Also, somebody should check whether vect_can_force_dr_alignment_p in tree-vect-data-refs.c is catching all the cases it needs to for ABI conformance.
Maybe that was the original purpose of DATA_ALIGNMENT, but it certainly serves both purposes right now, which is wrong. We need one macro for the ABI-mandated alignment and one for the optimization alignment beyond that; the optimization alignment can be used if it can be proved that we'll bind to the optimized decl, but the ABI alignment has to be used otherwise. E.g. the x86_64 ABI says that certain arrays are aligned in a particular way, which is certainly something beyond what TYPE_ALIGN provides (changing TYPE_ALIGN of the arrays would affect the layout of structures, which would be wrong).
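The point about structure layout can be illustrated with a small, self-contained C example. struct wrapper and buf_offset are hypothetical names; the example only shows that an array member keeps the alignment of its element type, so raising the type alignment of 16-byte arrays would move such members.

```c
#include <assert.h>
#include <stddef.h>

/* A 16-byte char array used as a struct member. */
struct wrapper {
    char tag;
    char buf[16];
};

/* Offset of the array member inside the struct. */
size_t buf_offset(void)
{
    return offsetof(struct wrapper, buf);
}
```

With char elements the member sits at offset 1 and the struct is 17 bytes. If the array type itself were 16-byte aligned, buf would have to move to offset 16, which would change the layout of existing structures; the extra alignment can therefore only apply to standalone array variables.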
Created attachment 30275 [details]
gcc49-pr56564.patch

Untested fix. Honza, is the array type >= 16 bytes alignment increase the only ABI mandated one and all the rest is just optimization?
Author: jakub
Date: Mon Jun 10 15:41:52 2013
New Revision: 199898

URL: http://gcc.gnu.org/viewcvs?rev=199898&root=gcc&view=rev
Log:
        PR target/56564
        * varasm.c (align_variable): Don't use DATA_ALIGNMENT or
        CONSTANT_ALIGNMENT if !decl_binds_to_current_def_p (decl).
        Use DATA_ABI_ALIGNMENT for that case instead if defined.
        (get_variable_align): New function.
        (get_variable_section, emit_bss, emit_common,
        assemble_variable_contents, place_block_symbol): Use
        get_variable_align instead of DECL_ALIGN.
        (assemble_noswitch_variable): Add align argument, use it
        instead of DECL_ALIGN.
        (assemble_variable): Adjust caller. Use get_variable_align
        instead of DECL_ALIGN.
        * config/i386/i386.h (DATA_ALIGNMENT): Adjust x86_data_alignment
        caller.
        (DATA_ABI_ALIGNMENT): Define.
        * config/i386/i386-protos.h (x86_data_alignment): Adjust prototype.
        * config/i386/i386.c (x86_data_alignment): Add opt argument. If
        opt is false, only return the psABI mandated alignment increase.
        * config/c6x/c6x.h (DATA_ALIGNMENT): Renamed to...
        (DATA_ABI_ALIGNMENT): ... this.
        * config/mmix/mmix.h (DATA_ALIGNMENT): Renamed to...
        (DATA_ABI_ALIGNMENT): ... this.
        * config/mmix/mmix.c (mmix_data_alignment): Adjust function comment.
        * config/s390/s390.h (DATA_ALIGNMENT): Renamed to...
        (DATA_ABI_ALIGNMENT): ... this.
        * doc/tm.texi.in (DATA_ABI_ALIGNMENT): Document.
        * doc/tm.texi: Regenerated.

        * gcc.target/i386/pr56564-1.c: New test.
        * gcc.target/i386/pr56564-2.c: New test.
        * gcc.target/i386/pr56564-3.c: New test.
        * gcc.target/i386/pr56564-4.c: New test.
        * gcc.target/i386/avx256-unaligned-load-4.c: Add -fno-common.
        * gcc.target/i386/avx256-unaligned-store-1.c: Likewise.
        * gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
        * gcc.target/i386/avx256-unaligned-store-4.c: Likewise.
        * gcc.target/i386/vect-sizes-1.c: Likewise.
        * gcc.target/i386/memcpy-1.c: Likewise.
        * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c (tmp): Initialize.
        * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c (tmp): Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr56564-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/c6x/c6x.h
    trunk/gcc/config/i386/i386-protos.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/mmix/mmix.c
    trunk/gcc/config/mmix/mmix.h
    trunk/gcc/config/s390/s390.h
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-load-4.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-1.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-4.c
    trunk/gcc/testsuite/gcc.target/i386/memcpy-1.c
    trunk/gcc/testsuite/gcc.target/i386/vect-sizes-1.c
    trunk/gcc/varasm.c
Author: jakub
Date: Tue Jun 11 06:03:46 2013
New Revision: 199934

URL: http://gcc.gnu.org/viewcvs?rev=199934&root=gcc&view=rev
Log:
        PR target/56564
        * varasm.c (get_variable_align): Move #endif to the right place.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/varasm.c
On x86_64-apple-darwin10.8 at revision 199935, I get the following failures for the tests added at revision 199898:

FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "&s" 1
FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "return 0" 1
FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&s" 1
FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&t" 1

The optimized dumps are (blank lines removed):

[macbook] f90/bug% cat pr56564-1.c.165t.optimized
;; Function foo (foo, funcdef_no=0, decl_uid=1741, symbol_order=2)
foo ()
{
  <bb 2>:
  return 0;
}
;; Function bar (bar, funcdef_no=1, decl_uid=1744, symbol_order=3)
bar ()
{
  <bb 2>:
  return 0;
}

[macbook] f90/bug% cat pr56564-3.c.165t.optimized
;; Function foo (foo, funcdef_no=0, decl_uid=1741, symbol_order=2)
foo ()
{
  struct S * D.1770;
  long int s.0;
  int _2;
  int _3;
  <bb 2>:
  _5 = __builtin___emutls_get_address (&__emutls_v.s);
  s.0_1 = (long int) _5;
  _2 = (int) s.0_1;
  _3 = _2 & 15;
  return _3;
}
;; Function bar (bar, funcdef_no=1, decl_uid=1744, symbol_order=3)
bar ()
{
  char * D.1769;
  char[16] * D.1768;
  long int _1;
  int _2;
  int _3;
  <bb 2>:
  _5 = __builtin___emutls_get_address (&__emutls_v.t);
  _6 = &*_5[0];
  _1 = (long int) _6;
  _2 = (int) _1;
  _3 = _2 & 15;
  return _3;
}
(In reply to Dominique d'Humieres from comment #16)
> On x86_64-apple-darwin10.8 at revision 199935, I get the following failures
> for the tests added at revision 199898:
>
> FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "&s" 1
> FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "return 0" 1
> FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&s" 1
> FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&t" 1

Yeah, MachO is broken by design, guess the tests need to be restricted to non-darwin non-PE.
(In reply to comment #17)
> Yeah, MachO is broken by design, guess the tests need to be restricted
> to non-darwin non-PE.

Questions:
(1) What is PE?
(2) Is the second "return 0;" wrong code or valid optimization? If the former, why?
(3) Is the decoration "__emutls_v." the same for all the emutls platforms? If not, where can I find the variants?
The mingw/cygwin stuff. The testcases assume that the symbols have decl_binds_to_current_def_p false; if that isn't the case (because darwin/mingw apparently don't allow symbol interposition), then the testcases can't work on those targets.
Author: jakub
Date: Wed Jun 12 06:43:05 2013
New Revision: 199984

URL: http://gcc.gnu.org/viewcvs?rev=199984&root=gcc&view=rev
Log:
        PR target/56564
        * varasm.c (decl_binds_to_current_def_p): Call binds_local_p
        target hook even for !TREE_PUBLIC decls. If no resolution info
        is available, return false for common and external decls.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/varasm.c

Author: jakub
Date: Wed Jun 12 06:46:53 2013
New Revision: 199985

URL: http://gcc.gnu.org/viewcvs?rev=199985&root=gcc&view=rev
Log:
        PR target/56564
        * gcc.target/i386/pr56564-1.c: Skip on darwin, mingw and cygwin.
        * gcc.target/i386/pr56564-3.c: Likewise.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr56564-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-3.c
This bug isn't fixed in GCC 4.9. -O3 increases alignment from 64 bits to 128 bits on the original testcase:

Hardware watchpoint 6: *(unsigned int *) 0x7fffee9b4468

Old value = 64
New value = 128
ensure_base_align (stmt_info=0x1c8f990, dr=0x1db5b20)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:4907
4907          DECL_USER_ALIGN (base_decl) = 1;
(gdb) bt
#0  ensure_base_align (stmt_info=0x1c8f990, dr=0x1db5b20)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:4907
#1  0x0000000000d33471 in vectorizable_store (stmt=0x7fffed95a280,
    gsi=0x7fffffffd830, vec_stmt=0x7fffffffd790, slp_node=0x1d9e7a0)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:5131
#2  0x0000000000d38f80 in vect_transform_stmt (stmt=0x7fffed95a280,
    gsi=0x7fffffffd830, grouped_store=0x7fffffffd84a, slp_node=0x1d9e7a0,
    slp_node_instance=0x1cb3e10)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:7211
#3  0x0000000000d5a980 in vect_schedule_slp_instance (node=0x1d9e7a0,
    instance=0x1cb3e10, vectorization_factor=1)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-slp.c:3084
#4  0x0000000000d5abd0 in vect_schedule_slp (loop_vinfo=0x0,
    bb_vinfo=0x1ddf410)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-slp.c:3154
#5  0x0000000000d5aea7 in vect_slp_transform_bb (bb=0x7fffece8ec30)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-slp.c:3230
#6  0x0000000000d5e41b in execute_vect_slp ()
    at /export/gnu/import/git/gcc-release/gcc/tree-vectorizer.c:605
#7  0x0000000000d5e4c9 in (anonymous namespace)::pass_slp_vectorize::execute (
    this=0x1b97010)
    at /export/gnu/import/git/gcc-release/gcc/tree-vectorizer.c:649
#8  0x0000000000a7da14 in execute_one_pass (pass=0x1b97010)
---Type <return> to continue, or q <return> to quit---q
    at /export/gnu/imporQuit
(gdb) f 1
#1  0x0000000000d33471 in vectorizable_store (stmt=0x7fffed95a280,
    gsi=0x7fffffffd830, vec_stmt=0x7fffffffd790, slp_node=0x1d9e7a0)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:5131
5131          ensure_base_align (stmt_info, dr);
(gdb) f 2
#2  0x0000000000d38f80 in vect_transform_stmt (stmt=0x7fffed95a280,
    gsi=0x7fffffffd830, grouped_store=0x7fffffffd84a, slp_node=0x1d9e7a0,
    slp_node_instance=0x1cb3e10)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:7211
7211          done = vectorizable_store (stmt, gsi, &vec_stmt, slp_node);
(gdb)

This bug may be really fixed by r221268:

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index aa9d43f..41ff802 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4956,8 +4956,13 @@ ensure_base_align (stmt_vec_info stmt_info, struct data_reference *dr)
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       tree base_decl = ((dataref_aux *)dr->aux)->base_decl;
 
-      DECL_ALIGN (base_decl) = TYPE_ALIGN (vectype);
-      DECL_USER_ALIGN (base_decl) = 1;
+      if (decl_in_symtab_p (base_decl))
+        symtab_node::get (base_decl)->increase_alignment (TYPE_ALIGN (vectype));
+      else
+        {
+          DECL_ALIGN (base_decl) = TYPE_ALIGN (vectype);
+          DECL_USER_ALIGN (base_decl) = 1;
+        }
       ((dataref_aux *)dr->aux)->base_misaligned = false;
     }
 }

in GCC 5.
Seems the bug does still exist in 6.3.0 20170516 (Debian 6.3.0-18). I get a GP on

 >0x55555574d8c8 <...[abi:cxx11]() const+264> movdqa 0x68(%rsp),%xmm0
  0x55555574d8ce <...[abi:cxx11]() const+270> lea    0x80(%rsp),%r13
  0x55555574d8d6 <...[abi:cxx11]() const+278> movq   $0x0,0x50(%rsp)
  0x55555574d8df <...[abi:cxx11]() const+287> movl   $0x0,0x10(%rsp)
  0x55555574d8e7 <...[abi:cxx11]() const+295> movaps %xmm0,(%rsp)
  0x55555574d8eb <...[abi:cxx11]() const+299> movq   $0x0,0x6(%rsp)
  0x55555574d8f4 <...[abi:cxx11]() const+308> movw   $0x0,0xe(%rsp)
  0x55555574d8fb <...[abi:cxx11]() const+315> movdqa (%rsp),%xmm1
  0x55555574d900 <...[abi:cxx11]() const+320> movaps %xmm1,0x40(%rsp)

The asm code is obviously wrong, because movdqa 0x68(%rsp),%xmm0 followed by movdqa (%rsp),%xmm1 without changes to %rsp has to fail. %rsp was 0x7fffecd477d0.

Code was C++ compiled with -O3 and x86_64. The underlying data structure is boost::asio::ip::address, which consists of an enum (4 bytes), address_v4 (4 bytes) and address_v6 (16 bytes). The GP occurs when accessing the ipv6 address.
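The arithmetic behind that claim can be checked with a short C sketch. It assumes the reported %rsp value is the one in effect at the faulting instruction; is_aligned16 is a hypothetical helper, and the only outside fact used is that movdqa requires a 16-byte aligned memory operand.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of 16, as movdqa requires for a
   memory operand. */
int is_aligned16(uint64_t addr)
{
    return (addr & 15) == 0;
}
```

With %rsp = 0x7fffecd477d0, (%rsp) is 16-byte aligned, but 0x68(%rsp) is 0x7fffecd47838, which is only 8-byte aligned. Because 0x68 is not a multiple of 16, the two movdqa operands can never both be aligned for the same %rsp.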
The bug is fixed, you must be running into a different issue, either in the source you're compiling, or in the compiler. So, please open a new bugreport instead of commenting on a different one, and supply all the needed information (see http://gcc.gnu.org/bugs/ for details on what we need).