qtwebkit-5.4 fails to build: % g++ -c -w -O2 -std=c++0x -march=amdfam10 SVGAllInOne.ii svg/SVGSVGElement.cpp: In member function ‘WebCore::FloatRect WebCore::SVGSVGElement::viewport() const’: svg/SVGSVGElement.cpp:60:5443: internal compiler error: in get_address_mode, at rtlanal.c:5442 0xc411f1 get_address_mode(rtx_def*) ../../gcc/gcc/rtlanal.c:5442 0x98fb82 adjust_address_1(rtx_def*, machine_mode, long, int, int, int, long) ../../gcc/gcc/emit-rtl.c:2227 0x9c600f store_field ../../gcc/gcc/expr.c:6715 0x9cbbc0 expand_assignment(tree_node*, tree_node*, bool) ../../gcc/gcc/expr.c:5000 0x8cda6e expand_gimple_stmt_1 ../../gcc/gcc/cfgexpand.c:3385 0x8cda6e expand_gimple_stmt ../../gcc/gcc/cfgexpand.c:3481 0x8d4807 expand_gimple_basic_block ../../gcc/gcc/cfgexpand.c:5397 0x8d6027 execute ../../gcc/gcc/cfgexpand.c:6006 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. Adding -fno-ipa-icf fixes the issue. The testcase is very large: ~10MB. Reduced testcase should be ready tomorrow.
Created attachment 34644 [details] unreduced testcase I'm having a hard time reducing the testcase. Unreduced testcase attached.
Started with r220230. Supposedly some thunk related ICF stuff.
markus@x4 tmp % cat SVGAllInOne.ii class A { int m_x, m_y; }; class B { A m_location; int m_size; }; class C { public: virtual B m_fn1 () const; }; class D { B m_fn2 () const; int m_fn3 () const; C *m_fn4 () const; }; int D::m_fn3 () const { m_fn4 ()->m_fn1 (); } B D::m_fn2 () const { return B (); } class F : C { B m_fn1 () const { return B (); } }; markus@x4 tmp % /var/tmp/gcc_test/usr/local/bin/g++ -c -O2 SVGAllInOne.ii SVGAllInOne.ii: In member function ‘B D::m_fn2() const’: SVGAllInOne.ii:27:1: internal compiler error: in expand_expr_addr_expr_1, at expr.c:7735 D::m_fn2 () const ^ 0x9c3db7 expand_expr_addr_expr_1 ../../gcc/gcc/expr.c:7735 0x9b81b9 expand_expr_addr_expr ../../gcc/gcc/expr.c:7849 0x9b81b9 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/gcc/expr.c:10723 0x9b8910 expand_expr ../../gcc/gcc/expr.h:254 0x9b8910 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/gcc/expr.c:9903 0x9cbb46 expand_expr ../../gcc/gcc/expr.h:254 0x9cbb46 expand_assignment(tree_node*, tree_node*, bool) ../../gcc/gcc/expr.c:5085 0x8cd8fe expand_gimple_stmt_1 ../../gcc/gcc/cfgexpand.c:3385 0x8cd8fe expand_gimple_stmt ../../gcc/gcc/cfgexpand.c:3481 0x8d4697 expand_gimple_basic_block ../../gcc/gcc/cfgexpand.c:5397 0x8d5eb7 execute ../../gcc/gcc/cfgexpand.c:6006 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions.
Hmm, mine I suppose ;)
Hmm, an RTL expansion issue. We optimize m_fn2 as B D::m_fn2() const (const struct D * const this) { ;; basic block 2, loop depth 0 ;; pred: ENTRY MEM[(struct B *)&<retval>] = 0; MEM[(struct B *)&<retval> + 4B] = 0; MEM[(struct B *)&<retval> + 8B] = 0; return <retval>; ;; succ: EXIT } and we want to store <result_decl 0x7ffff7ff82d0 D.2359 type <record_type 0x7ffff6c55000 B type_5 type_6 BLK size <integer_cst 0x7ffff6c50bd0 constant 96> unit size <integer_cst 0x7ffff6c50c00 constant 12> align 32 symtab 0 alias set 5 canonical type 0x7ffff6c55000 fields <field_decl 0x7ffff6c438e8 m_location type <record_type 0x7ffff6c44dc8 A> private nonlocal decl_3 DI file /aux/hubicka/t.ii line 7 col 5 size <integer_cst 0x7ffff6ad6e58 constant 64> unit size <integer_cst 0x7ffff6ad6e70 constant 8> align 32 offset_align 128 offset <integer_cst 0x7ffff6ad6e88 constant 0> bit offset <integer_cst 0x7ffff6ad6ed0 constant 0> context <record_type 0x7ffff6c55000 B> chain <field_decl 0x7ffff6c43980 m_size>> context <translation_unit_decl 0x7ffff7ff81e0 D.1> full-name "class B" X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown pointer_to_this <pointer_type 0x7ffff6c5a738> chain <type_decl 0x7ffff6c437b8 B>> used ignored regdecl BLK file /aux/hubicka/t.ii line 27 col 13 size <integer_cst 0x7ffff6c50bd0 96> unit size <integer_cst 0x7ffff6c50c00 12> align 32 context <function_decl 0x7ffff6c561b0 m_fn2> (parallel:BLK [ (expr_list:REG_DEP_TRUE (reg:DI 87 [ <retval> ]) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (reg:SI 88 [ <retval>+8 ]) (const_int 8 [0x8])) ])> into its DECL_RTL but somehow we end up not doing that correctly. Without -fipa-icf we produce: B D::m_fn2() const (const struct D * const this) { struct B D.2398; <bb 2>: D.2398.m_location.m_x = 0; D.2398.m_location.m_y = 0; D.2398.m_size = 0; return D.2398; } that looks equivalent but gets compiled well. We decide to unify m_fn1 and m_fn2 as: virtual B F::m_fn1() const (const struct F * const this) { struct B D.2396; <bb 2>: D.2396.m_location.m_x = 0; D.2396.m_location.m_y = 0; D.2396.m_size = 0; return D.2396; } B D::m_fn2() const (const struct D * const this) { <bb 2>: <retval> = F::m_fn1 (this_2(D)); [tail call] return <retval>; } which eventually gets inlined back of course (I will teach ICF to skip thunk creation when inline sequence is shorter). The inliner produces: B D::m_fn2() const (const struct D * const this) { struct B D.2413; <bb 2>: D.2413.m_location.m_x = 0; D.2413.m_location.m_y = 0; D.2413.m_size = 0; <retval> = D.2413; return <retval>; } I suppose the extra <retval> store is the problem. Implementing wrapper by hand gives me: B D::m_fn5() const (const struct D * const this) { struct B D.2406; <bb 2>: D.2406 = D::m_fn2 (this_2(D)); return D.2406; } which is bit uncool by adding extra copy and I also think it won't fly for DECL_BY_REFERENCE stuff. Jakub/Richi can we get the direct return to work or shall we extend thunk generation to introduce a temporary? If so under what conditions? thunk expansion already does: if (DECL_BY_REFERENCE (resdecl)) { restmp = gimple_fold_indirect_ref (resdecl); if (!restmp) restmp = build2 (MEM_REF, TREE_TYPE (TREE_TYPE (DECL_RESULT (alias))), resdecl, build_int_cst (TREE_TYPE (DECL_RESULT (alias)), 0)); } else if (!is_gimple_reg_type (restype)) { restmp = resdecl; if (TREE_CODE (restmp) == VAR_DECL) add_local_decl (cfun, restmp); BLOCK_VARS (DECL_INITIAL (current_function_decl)) = restmp; } else restmp = create_tmp_reg (restype, "retval"); I suppose we may want to add case for !DECL_BY_REFERENCE that needs temporary "retval", too.
Created attachment 34685 [details] gcc5-pr64896.patch I think if !aggregate_value_p, we really should be using a temporary var rather than RESULT_DECL. That said, if it doesn't generate optimal code, we should optimize it, but it wouldn't be IPA's or expand_thunk's task, such problem would affect all similar user written code. But, on this exact testcase with the patch I get identical assembly to -fno-ipa-icf. The movl $0, -24(%rsp) movl $0, -20(%rsp) xorl %edx, %edx movq -24(%rsp), %rax is of course not optimal, xorl %eax, %eax; xorl %edx, %edx would do too, but it is a matter of some RTL optimization of late GIMPLE to improve this. But, related to this, I've noticed that: 1) pass_nrv doesn't seem to work very well on x86_64, apparently the thing is that the temporaries usually have DECL_ALIGN bumped by LOCAL_DECL_ALIGNMENT to 128 bits, while RESULT_DECL typically does not that "optimization", so the nrv pass gives up. Wonder at least for the case where the decl isn't addressable why would we care about DECL_ALIGN of the temporary (rather than just TYPE_ALIGN). 2) even on i386 where tree nrv usually works, I see on testcase like: struct A { int m_x, m_y; }; struct Q { struct A m_location; int m_size; long m_foo; }; struct Q foo (); struct Q bar () { struct Q x = foo (); return x; } (in C, so that C++ nrv doesn't trigger) unnecessary stack adjustments
Hmm, OK, lets go for extra temporary and hope that copy propagation/NRV will do the trick. I wonder if we don't want to drop the extra alignment on x86_64 local vars. Vectorizer knows how to upgrade and I doubdt the memcpy/memset expanders care too much given that the offset in frame can often be determined.
Author: jakub Date: Fri Feb 6 20:47:20 2015 New Revision: 220489 URL: https://gcc.gnu.org/viewcvs?rev=220489&root=gcc&view=rev Log: PR ipa/64896 * cgraphunit.c (cgraph_node::expand_thunk): If restype is not is_gimple_reg_type nor the thunk_fndecl returns aggregate_value_p, set restmp to a temporary variable instead of resdecl. * g++.dg/ipa/pr64896.C: New test. Added: trunk/gcc/testsuite/g++.dg/ipa/pr64896.C Modified: trunk/gcc/ChangeLog trunk/gcc/cgraphunit.c trunk/gcc/testsuite/ChangeLog
Fixed.
*** Bug 61207 has been marked as a duplicate of this bug. ***
Author: yroux Date: Tue Mar 10 19:20:30 2015 New Revision: 221333 URL: https://gcc.gnu.org/viewcvs?rev=221333&root=gcc&view=rev Log: gcc/ 2015-03-10 Yvan Roux <yvan.roux@linaro.org> Backport from trunk r220489. 2015-02-06 Jakub Jelinek <jakub@redhat.com> PR ipa/64896 * cgraphunit.c (cgraph_node::expand_thunk): If restype is not is_gimple_reg_type nor the thunk_fndecl returns aggregate_value_p, set restmp to a temporary variable instead of resdecl. gcc/testsuite/ 2015-03-10 Yvan Roux <yvan.roux@linaro.org> Backport from trunk r220489. 2015-02-06 Jakub Jelinek <jakub@redhat.com> PR ipa/64896 * g++.dg/ipa/pr64896.C: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/g++.dg/ipa/pr64896.C Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/cgraphunit.c branches/gcc-4_9-branch/gcc/testsuite/ChangeLog