Created attachment 25367
testcase, modified from pr43513.c testcase

To reproduce on x86_64:
...
$ gcc -Os pr43513-align.c --param large-stack-frame=30
$ ./a.out
16byte aligned 7fff5c4ce00c
...

The address of the vla is printed, and it is not 16-byte aligned (it ends in 'c'). Nevertheless, the test of whether the address is 16-byte aligned succeeds, and the string '16byte aligned' is printed.

During compilation the following scenario happens:
- During the propagation of the first ccp phase, the align of the alloca (16) is propagated to the lhs results.0D.3306_13 as lattice value 'CONSTANT 0x00000000000000000 (0xfffffffffffffffffffffffffffffff0)'.
- This is not propagated through 'D.3307_14 = &*results.0D.3306_13'. The propagation does not look at the lattice value of results.0D.3306_13, but at the alignment of the ptr_info, which at this point is not initialised yet.
- During the finalize of the first ccp phase, ptr_info of results.0D.3306_13 is initialized with align 16, based on the lattice value.
- During the propagation of the second ccp phase, the align of 16 in the ptr_info of results.0D.3306_13 is propagated through to the comparison 'if (D.3309_16 == 0)', so the comparison is folded and the '16byte aligned' string is always printed.
- During the finalize of the second ccp phase, the alloca is folded, and the newly declared array gets an align of 4 bytes.
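For readers without the attachment, the kind of check that exposes the problem can be sketched as follows. This is not the attached testcase: the function names are hypothetical, and the noinline attribute is assumed to be available (GCC and compatible compilers). The idea is to compare an alignment test that ccp may fold at compile time against the same test done out of line on the real address.

```c
#include <stdint.h>

/* Hypothetical helper: kept out of line so the check is made on the
   actual run-time address rather than on what the optimizer assumes.  */
static int __attribute__((noinline))
really_aligned (const void *p, uintptr_t mask)
{
  return ((uintptr_t) p & mask) == 0;
}

/* Returns 1 when the inline (foldable) alignment test agrees with the
   out-of-line test, 0 when the compiler folded the test incorrectly.  */
int
vla_alignment_consistent (void)
{
  const int n = 4;      /* constant size, so the vla's alloca is foldable */
  char results[n];      /* a vla in C: n is not an integer constant expression */
  results[0] = 0;

  int claimed = ((uintptr_t) results & 0xf) == 0;  /* candidate for folding */
  int actual = really_aligned (results, 0xf);
  return claimed == actual;
}
```

With a compiler showing the behaviour described above, built with options like those in the transcript, vla_alignment_consistent would be expected to return 0; with a correct compiler it returns 1.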
Hm, I suppose we should then make all replacement decls have BIGGEST_ALIGNMENT rather than min (BIGGEST_ALIGNMENT, object-size). Or alternatively (given we re-compute alignment together with folding alloca), assign the same alignment as folding would.
The question is of course what standards say about the alignment of alloca (4).
> Or alternatively (given we re-compute alignment together with folding alloca),
> assign the same alignment as folding would.

At the point that we determine the alloca alignment during propagation in visit_stmt, we cannot predict whether that alloca will be folded (during the same or later ccp phase).

So the only way to achieve other alignment is to be conservative a bit longer for vla-allocas with respect to alignment:
- keep align at 1 byte during ccp.
- if we fold during ccp, assign align calculated at folding
- after we are sure there is no more folding (at expand, or f.i. at the end of the second ccp phase if we limit folding to the first 2 ccp phases, to take advantage of the larger alignment in the middle-end), we assign BIGGEST_ALIGNMENT.

> The question is of course what standards say about the alignment of
> alloca (4)

I think alloca is non-standard. But in the context of fold_builtin_alloca_for_var, alloca is the implementation vehicle of vlas, so the question is what the standard says about alignment of vlas.
Created attachment 25368
proposed patch

> Hm, I suppose we should then make all replacement decls have
> BIGGEST_ALIGNMENT rather than min (BIGGEST_ALIGNMENT, object-size)

Currently testing this patch on x86_64.

2011-09-27 Tom de Vries <tom@codesourcery.com>

* tree-ssa-ccp.c (fold_builtin_alloca_for_var): Use align from ptr_info.

* gcc.dg/pr50527.c: New test.
On Tue, 27 Sep 2011, vries at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50527
>
> --- Comment #3 from vries at gcc dot gnu.org 2011-09-27 09:21:12 UTC ---
> > Or alternatively (given we re-compute alignment together with folding alloca),
> > assign the same alignment as folding would.
>
> At the point that we determine the alloca alignment during propagation in
> visit_stmt, we cannot predict whether that alloca will be folded (during the
> same or later ccp phase).
>
> So the only way to achieve other alignment is to be conservative a bit longer
> for vla-allocas with respect to alignment:
> - keep align at 1 byte during ccp.
> - if we fold during ccp, assign align calculated at folding
> - after we are sure there is no more folding (at expand, or f.i. at the end of
> the second ccp phase if we limit folding to the first 2 ccp phases, to take
> advantage of the larger alignment in the middle-end), we assign
> BIGGEST_ALIGNMENT.

I think we can check if the size is constant in evaluate_stmt and compute alignment according to that. It should only change from non-constant to constant, thus properly go down the lattice during propagation.

We don't want to force excessive alignment on the replacement decls as that might require re-aligning the stack which is expensive.

> > The question is of course what standards say about the alignment of
> > alloca (4)
>
> I think alloca is non-standard. But in the context of
> fold_builtin_alloca_for_var, alloca is the implementation vehicle of vlas, so
> the question is what the standard says about alignment of vlas.

Indeed.
> I think we can check if the size is constant in evaluate_stmt and
> compute alignment according to that.

We can only do that in the last ccp phase that does folding of vla-allocas.

If the argument is not constant, it will not be folded in this phase, but it might be folded during the next ccp phase, when the argument does turn constant.

If the argument is constant, it might not be folded in this phase, but it still might be folded during the next ccp phase.

Therefore, in evaluate_stmt, we cannot predict whether the alloca will be folded, unless we're in the last ccp phase. And the propagation of alignment of alloca starts in the first ccp phase.

> It should only change from
> non-constant to constant, thus properly go down the lattice during
> propagation.

Currently, the result of an alloca is always constant, to be precise, constant 0 with only lower bits valid. This is independent of whether the argument is constant.
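To make "constant 0 with only lower bits valid" concrete, here is a minimal sketch of a value/mask pair like the one shown in the first comment. It is a simplified 64-bit stand-in, not GCC's actual lattice type, and the struct and function names are hypothetical. A bit set in the mask is unknown; a clear bit is known and equal to the corresponding bit of the value.

```c
#include <stdint.h>

/* Simplified stand-in for a ccp bit-lattice value (hypothetical type).  */
struct bit_lattice
{
  uint64_t value;   /* known bit values, meaningful where mask is 0 */
  uint64_t mask;    /* 1 = bit unknown, 0 = bit known */
};

/* Lattice value for a pointer known to be aligned to ALIGN bytes
   (ALIGN a power of two): the low bits are known zero, the rest unknown.  */
static struct bit_lattice
from_alignment (uint64_t align)
{
  struct bit_lattice v = { 0, ~(align - 1) };
  return v;
}

/* Alignment implied by V: the lowest bit that is unknown or known to be 1.  */
static uint64_t
implied_alignment (struct bit_lattice v)
{
  uint64_t t = v.value | v.mask;
  return t & (~t + 1);
}

/* Returns 1 and stores (p & TEST_MASK) in *OUT if all tested bits are
   known, which is the situation in which a comparison can be folded.  */
static int
known_and (struct bit_lattice v, uint64_t test_mask, uint64_t *out)
{
  if (v.mask & test_mask)
    return 0;
  *out = v.value & test_mask;
  return 1;
}
```

In this model, from_alignment (16) gives value 0 with mask 0xfffffffffffffff0, so a test of the low four bits is fully known and can be folded. With from_alignment (4), the same test touches unknown bits and cannot be folded, which is why assuming 16 and later emitting a 4-byte-aligned decl is unsafe.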
On Tue, 27 Sep 2011, vries at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50527
>
> --- Comment #6 from vries at gcc dot gnu.org 2011-09-27 10:49:23 UTC ---
> > I think we can check if the size is constant in evaluate_stmt and
> > compute alignment according to that.
>
> We can only do that in the last ccp phase that does folding of vla-allocas.
>
> If the argument is not constant, it will not be folded in this phase, but it
> might be folded during the next ccp phase, when the argument does turn
> constant.
>
> If the argument is constant, it might not be folded in this phase, but it still
> might be folded during the next ccp phase.
>
> Therefore, in evaluate_stmt, we cannot predict whether the alloca will be
> folded, unless we're in the last ccp phase. And the propagation of alignment of
> alloca starts in the first ccp phase.
>
> > It should only change from
> > non-constant to constant, thus properly go down the lattice during
> > propagation.
>
> Currently, the result of an alloca is always constant, to be precise, constant
> 0 with only lower bits valid. This is independent of whether the argument is
> constant.

The parameter I meant. But yes, if we don't fold alloca in ccp1, we might fold away alignment tests based on BIGGEST_ALIGNMENT while a later ccp might fold it and use less alignment.

Maybe don't assume any particular alignment for allocas for vlas then.

Richard.
Created attachment 25371
updated proposed patch

> Maybe don't assume any particular alignment for allocas for vlas then.

Updated patch accordingly, now testing on x86_64.

2011-09-27 Tom de Vries <tom@codesourcery.com>

* tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related allocas.

* gcc.dg/pr50527.c: New test.
Author: vries
Date: Fri Oct 7 12:49:49 2011
New Revision: 179655

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179655
Log:
2011-10-07 Tom de Vries <tom@codesourcery.com>

PR middle-end/50527
* tree.c (build_common_builtin_nodes): Add local_define_builtin for
* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
* tree-ssa-ccp.c (evaluate_stmt): Set align for
* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
* ipa-pure-const.c (special_builtin_state): Handle
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
* function.c (gimplify_parameters): Lower vla to
* gimplify.c (gimplify_vla_decl): Same.
* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
* tree-mudflap.c (mf_xform_statements): Same.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
* varasm.c (incorporeal_function_p): Same.
* tree-object-size.c (alloc_object_size): Same.
* gimple.c (gimple_build_call_from_tree): Same.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/builtins.c
trunk/gcc/builtins.def
trunk/gcc/cfgexpand.c
trunk/gcc/function.c
trunk/gcc/gimple.c
trunk/gcc/gimplify.c
trunk/gcc/ipa-pure-const.c
trunk/gcc/tree-mudflap.c
trunk/gcc/tree-object-size.c
trunk/gcc/tree-ssa-alias.c
trunk/gcc/tree-ssa-ccp.c
trunk/gcc/tree-ssa-dce.c
trunk/gcc/tree.c
trunk/gcc/varasm.c
Author: vries
Date: Fri Oct 7 12:49:56 2011
New Revision: 179656

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179656
Log:
2011-10-07 Tom de Vries <tom@codesourcery.com>

PR middle-end/50527
* gcc.dg/pr50527.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr50527.c
Modified:
trunk/gcc/testsuite/ChangeLog
patch and test-case checked in, closing PR.
Author: vries
Date: Thu Oct 13 11:10:01 2011
New Revision: 179916

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179916
Log:
Fix PR middle-end/50527 ChangeLog entry

Modified:
trunk/gcc/ChangeLog
Author: vries
Revision: 179655
Modified property: svn:log

Modified: svn:log at Thu Oct 13 11:18:09 2011
------------------------------------------------------------------------------
--- svn:log (original)
+++ svn:log Thu Oct 13 11:18:09 2011
@@ -2,16 +2,33 @@
 PR middle-end/50527
 * tree.c (build_common_builtin_nodes): Add local_define_builtin for
+ BUILT_IN_ALLOCA_WITH_ALIGN. Mark that BUILT_IN_ALLOCA_WITH_ALIGN can
+ throw.
 * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
+ arglist. Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
+ (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
+ (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
 * tree-ssa-ccp.c (evaluate_stmt): Set align for
+ BUILT_IN_ALLOCA_WITH_ALIGN.
+ (fold_builtin_alloca_for_var): Rename to ...
+ (fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
+ BUILT_IN_ALLOCA_WITH_ALIGN argument.
+ (ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
+ fold_builtin_alloca_with_align.
+ (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
 * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
+ DEF_BUILTIN_STUB.
 * ipa-pure-const.c (special_builtin_state): Handle
+ BUILT_IN_ALLOCA_WITH_ALIGN.
 * tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
+ (call_may_clobber_ref_p_1): Same.
 * function.c (gimplify_parameters): Lower vla to
+ BUILT_IN_ALLOCA_WITH_ALIGN.
 * gimplify.c (gimplify_vla_decl): Same.
 * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
 * tree-mudflap.c (mf_xform_statements): Same.
 * tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
+ (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
 * varasm.c (incorporeal_function_p): Same.
 * tree-object-size.c (alloc_object_size): Same.
 * gimple.c (gimple_build_call_from_tree): Same.
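The log above describes an alloca variant that carries its alignment as a second argument, from which DECL_ALIGN is set when the call is folded. As a rough source-level illustration of that idea, the sketch below calls __builtin_alloca_with_align directly. This is an assumption about the compiler in use rather than something this commit provides: here the builtin is declared with DEF_BUILTIN_STUB, while later GCC releases document it as callable from user code, with the alignment given in bits as a constant. The function name is hypothetical.

```c
#include <stdint.h>

/* Returns the low four address bits of a 4-byte buffer that was
   requested with an explicit 16-byte (128-bit) alignment.  */
uintptr_t
low_bits_of_aligned_alloca (void)
{
  /* Second argument is the alignment in bits and must be a constant.  */
  char *p = __builtin_alloca_with_align (4, 128);
  p[0] = 0;   /* touch the buffer so the allocation is actually used */
  return (uintptr_t) p & 0xf;
}
```

Because the alignment travels with the call, an alignment test on the result and the alignment of any replacement decl are derived from the same number, rather than from an assumption made in one ccp phase and a different choice made in a later one.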