Bug 50527 - inconsistent vla align
Summary: inconsistent vla align
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Tom de Vries
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-26 14:41 UTC by Tom de Vries
Modified: 2011-10-13 11:18 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-09-27 00:00:00


Attachments
testcase, modified from pr43513.c testcase (251 bytes, text/x-csrc)
2011-09-26 14:41 UTC, Tom de Vries
proposed patch (1.05 KB, text/plain)
2011-09-27 09:23 UTC, Tom de Vries
updated proposed patch (648 bytes, patch)
2011-09-27 13:03 UTC, Tom de Vries

Description Tom de Vries 2011-09-26 14:41:24 UTC
Created attachment 25367 [details]
testcase, modified from pr43513.c testcase

To reproduce on x86_64:
...
$ gcc -Os pr43513-align.c --param large-stack-frame=30
$ ./a.out 
16byte aligned
7fff5c4ce00c
...

The address of the vla is printed, and it's not 16-byte aligned (ends in 'c'). Nevertheless the test whether the address is 16-byte aligned succeeds, and the string '16byte aligned' is printed.
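
For illustration only, here is a minimal sketch of the kind of check that exposes the inconsistency. It is not the attached testcase; the function name vla_alignment_is_consistent and the array size are assumptions. It compares the alignment test (which the compiler may fold at compile time) with the last hex digit of the address actually produced at run time.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the attached testcase.
   Returns 1 when the alignment test agrees with the formatted
   run-time address, 0 when the two disagree.  */
int vla_alignment_is_consistent(void)
{
    const int n = 4;   /* not a constant expression in C, so this is a VLA */
    int results[n];
    char text[32];

    results[0] = 0;    /* touch the array so it is really used */
    uintptr_t addr = (uintptr_t)results;

    /* The compiler's view: this test may be folded during optimization.  */
    int test_says_aligned = (addr & 15) == 0;

    /* The run-time view: a 16-byte aligned address ends in hex digit 0.  */
    snprintf(text, sizeof text, "%llx", (unsigned long long)addr);
    int digits_say_aligned = text[strlen(text) - 1] == '0';

    return test_says_aligned == digits_say_aligned;
}
```

With a compiler that shows this bug, built with the options above, such a function would be expected to return 0; with a correct compiler it returns 1.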

During compilation the following scenario happens:
- During the propagation of the first ccp phase, the align of the alloca (16)
  is propagated to the lhs results.0D.3306_13 as lattice value
  'CONSTANT 0x00000000000000000 (0xfffffffffffffffffffffffffffffff0)'.
- This is not propagated through 'D.3307_14 = &*results.0D.3306_13'. The
  propagation does not look at the lattice value of results.0D.3306_13, but at
  the alignment of the ptr_info, which at this point is not initialised yet.
- During the finalize of the first ccp phase, ptr_info of results.0D.3306_13 is
  initialized with align 16, based on the lattice value.
- During the propagation of the second ccp phase, the align of the ptr_info
  of results.0D.3306_13 of 16 is used to propagate through to the comparison
  'if (D.3309_16 == 0)', which makes sure the '16byte aligned' string is
  printed.
- During the finalize of the second ccp phase, the alloca is folded, and
  the new declared array gets an align of 4 bytes.
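
A rough model of the lattice value quoted above may help in reading the scenario. This is a sketch under assumptions, not the ccp implementation: the struct and function names are hypothetical, and the value is modelled in 64 bits. A bit set in the mask is unknown; a clear bit is known and equals the corresponding bit of the value. 'CONSTANT 0 (0x...f0)' therefore says only that the low four bits are zero.

```c
#include <stdint.h>

/* Hypothetical model of a ccp bit-lattice value (names are assumptions).
   mask bit 1: bit unknown.  mask bit 0: bit known, equal to value bit.  */
struct bit_value {
    uint64_t value;
    uint64_t mask;
};

/* Alignment in bytes implied by the trailing known-zero bits.
   Returns 0 if every bit is known to be zero (no finite bound).  */
uint64_t known_alignment(struct bit_value v)
{
    uint64_t unknown_or_set = v.mask | v.value;
    /* Isolate the lowest bit that is either unknown or known to be 1.  */
    return unknown_or_set & (~unknown_or_set + 1);
}

/* Try to fold '(ptr & test_mask) == 0'.
   Returns 1 if known true, 0 if known false, -1 if it cannot be folded.  */
int fold_masked_eq_zero(struct bit_value v, uint64_t test_mask)
{
    if ((v.value & ~v.mask & test_mask) != 0)
        return 0;    /* some tested bit is known to be 1 */
    if ((v.mask & test_mask) == 0)
        return 1;    /* all tested bits are known and zero */
    return -1;       /* at least one tested bit is unknown */
}
```

In this model, value 0 with mask ~15 gives a known alignment of 16 and folds a '& 15' test to true, which corresponds to the second ccp phase deciding the comparison before the replacement array is given only 4-byte alignment.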
Comment 1 Richard Biener 2011-09-26 15:00:49 UTC
Hm, I suppose we should then make all replacement decls have BIGGEST_ALIGNMENT
rather than min (BIGGEST_ALIGNMENT, object-size).  Or alternatively
(given we re-compute alignment together with folding alloca), assign
the same alignment as folding would.
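
To make the mismatch concrete, a small model, assuming BIGGEST_ALIGNMENT corresponds to 16 bytes (taken from the 16-byte figure in the description; the real macro is target-specific) and that the replacement decl gets min (BIGGEST_ALIGNMENT, object-size) as stated above. The names are hypothetical and this is not the code in tree-ssa-ccp.c.

```c
/* Assumption: stand-in for BIGGEST_ALIGNMENT, in bytes, on x86_64.  */
enum { ASSUMED_BIGGEST_ALIGNMENT_BYTES = 16 };

/* Alignment given to the replacement decl in this model:
   min (BIGGEST_ALIGNMENT, object-size).  */
unsigned replacement_decl_align_bytes(unsigned object_size)
{
    return object_size < ASSUMED_BIGGEST_ALIGNMENT_BYTES
               ? object_size
               : ASSUMED_BIGGEST_ALIGNMENT_BYTES;
}

/* Returns 1 if the alignment assumed for the alloca result during
   propagation still holds after the alloca is folded to a decl.  */
int assumed_alignment_holds(unsigned object_size)
{
    return replacement_decl_align_bytes(object_size)
           >= ASSUMED_BIGGEST_ALIGNMENT_BYTES;
}
```

For a 4-byte object the model yields 4-byte alignment, so an earlier assumption of 16 no longer holds; for objects of 16 bytes or more the two agree.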
Comment 2 Richard Biener 2011-09-26 15:02:23 UTC
The question is of course what standards say about the alignment of
alloca (4).
Comment 3 Tom de Vries 2011-09-27 09:21:12 UTC
> Or alternatively (given we re-compute alignment together with folding alloca),
> assign the same alignment as folding would.

At the point that we determine the alloca alignment during propagation in visit_stmt, we cannot predict whether that alloca will be folded (during the same or later ccp phase).

So the only way to achieve other alignment is to be conservative a bit longer for vla-allocas with respect to alignment:
- keep align at 1 byte during ccp.
- if we fold during ccp, assign align calculated at folding
- after we are sure there is no more folding (at expand, or f.i. at the end of
  the second ccp phase if we limit folding to the first 2 ccp phases, to take
  advantage of the larger alignment in the middle-end), we assign
  BIGGEST_ALIGNMENT.

> The question is of course what standards say about the alignment of
> alloca (4)

I think alloca is non-standard. But in the context of fold_builtin_alloca_for_var, alloca is the implementation vehicle of vlas, so the question is what the standard says about alignment of vlas.
Comment 4 Tom de Vries 2011-09-27 09:23:21 UTC
Created attachment 25368 [details]
proposed patch

> Hm, I suppose we should then make all replacement decls have 
> BIGGEST_ALIGNMENT rather than min (BIGGEST_ALIGNMENT, object-size)

Currently testing this patch on x86_64.

2011-09-27  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-ccp.c (fold_builtin_alloca_for_var): Use align from ptr_info.

	* gcc.dg/pr50527.c: New test.
Comment 5 rguenther@suse.de 2011-09-27 09:28:42 UTC
On Tue, 27 Sep 2011, vries at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50527
> 
> --- Comment #3 from vries at gcc dot gnu.org 2011-09-27 09:21:12 UTC ---
> > Or alternatively (given we re-compute alignment together with folding alloca),
> > assign the same alignment as folding would.
> 
> At the point that we determine the alloca alignment during propagation in
> visit_stmt, we cannot predict whether that alloca will be folded (during the
> same or later ccp phase).
> 
> So the only way to achieve other alignment is to be conservative a bit longer
> for vla-allocas with respect to alignment:
> - keep align at 1 byte during ccp.
> - if we fold during ccp, assign align calculated at folding
> - after we are sure there is no more folding (at expand, or f.i. at the end of
>   the second ccp phase if we limit folding to the first 2 ccp phases, to take
>   advantage of the larger alignment in the middle-end), we assign
>   BIGGEST_ALIGNMENT.

I think we can check if the size is constant in evaluate_stmt and
compute alignment according to that.  It should only change from
non-constant to constant, thus properly go down the lattice during
propagation.

We don't want to force excessive alignment on the replacement decls
as that might require re-aligning the stack which is expensive.

> > The question is of course what standards say about the alignment of
> > alloca (4)
> 
> I think alloca is non-standard. But in the context of
> fold_builtin_alloca_for_var, alloca is the implementation vehicle of vlas, so
> the question is what the standard says about alignment of vlas.

Indeed.
Comment 6 Tom de Vries 2011-09-27 10:49:23 UTC
> I think we can check if the size is constant in evaluate_stmt and
> compute alignment according to that.  

We can only do that in the last ccp phase that does folding of vla-allocas.

If the argument is not constant, it will not be folded in this phase, but it might be folded during the next ccp phase, when the argument does turn constant.

If the argument is constant, it might not be folded in this phase, but it still might be folded during the next ccp phase.

Therefore, in evaluate_stmt, we cannot predict whether the alloca will be folded, unless we're in the last ccp phase. And the propagation of alignment of alloca starts in the first ccp phase.

> It should only change from
> non-constant to constant, thus properly go down the lattice during
> propagation.

Currently, the result of an alloca is always constant, to be precise, constant 0 with only lower bits valid. This is independent of whether the argument is constant.
Comment 7 rguenther@suse.de 2011-09-27 11:08:01 UTC
On Tue, 27 Sep 2011, vries at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50527
> 
> --- Comment #6 from vries at gcc dot gnu.org 2011-09-27 10:49:23 UTC ---
> > I think we can check if the size is constant in evaluate_stmt and
> > compute alignment according to that.  
> 
> We can only do that in the last ccp phase that does folding of vla-allocas.
> 
> If the argument is not constant, it will not be folded in this phase, but it
> might be folded during the next ccp phase, when the argument does turn
> constant.
> 
> If the argument is constant, it might not be folded in this phase, but it still
> might be folded during the next ccp phase.
> 
> Therefore, in evaluate_stmt, we cannot predict whether the alloca will be
> folded, unless we're in the last ccp phase. And the propagation of alignment of
> alloca starts in the first ccp phase.
> 
> > It should only change from
> > non-constant to constant, thus properly go down the lattice during
> > propagation.
> 
> Currently, the result of an alloca is always constant, to be precise, constant
> 0 with only lower bits valid. This is independent of whether the argument is
> constant.

The parameter I meant.  But yes if we don't fold alloca in ccp1
we might fold away alignment tests based on BIGGEST_ALIGNMENT while
later ccp might fold it and use less alignment.  Maybe don't assume
any particular alignment for allocas for vlas then.

Richard.
Comment 8 Tom de Vries 2011-09-27 13:03:55 UTC
Created attachment 25371 [details]
updated proposed patch

> Maybe don't assume any particular alignment for allocas for vlas then.

Updated patch accordingly, now testing on x86_64.

2011-09-27  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related
	allocas.

	* gcc.dg/pr50527.c: New test.
Comment 9 Tom de Vries 2011-10-07 12:49:54 UTC
Author: vries
Date: Fri Oct  7 12:49:49 2011
New Revision: 179655

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179655
Log:
2011-10-07  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50527
	* tree.c (build_common_builtin_nodes): Add local_define_builtin for
	* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
	* tree-ssa-ccp.c (evaluate_stmt): Set align for
	* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
	* ipa-pure-const.c (special_builtin_state): Handle
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
	* function.c (gimplify_parameters): Lower vla to
	* gimplify.c (gimplify_vla_decl): Same.
	* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-mudflap.c (mf_xform_statements): Same.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
	* varasm.c (incorporeal_function_p): Same.
	* tree-object-size.c (alloc_object_size): Same.
	* gimple.c (gimple_build_call_from_tree): Same.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/builtins.def
    trunk/gcc/cfgexpand.c
    trunk/gcc/function.c
    trunk/gcc/gimple.c
    trunk/gcc/gimplify.c
    trunk/gcc/ipa-pure-const.c
    trunk/gcc/tree-mudflap.c
    trunk/gcc/tree-object-size.c
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-ccp.c
    trunk/gcc/tree-ssa-dce.c
    trunk/gcc/tree.c
    trunk/gcc/varasm.c
Comment 10 Tom de Vries 2011-10-07 12:50:00 UTC
Author: vries
Date: Fri Oct  7 12:49:56 2011
New Revision: 179656

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179656
Log:
2011-10-07  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50527
	* gcc.dg/pr50527.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr50527.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 11 Tom de Vries 2011-10-07 13:38:02 UTC
patch and test-case checked in, closing PR.
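
As a sketch of the idea behind the checked-in fix: the alignment travels with the call as a second argument, so propagation and folding read the same number. In the commit the builtin is declared with DEF_BUILTIN_STUB, i.e. for internal use; the example below assumes a later GCC (or Clang) release in which __builtin_alloca_with_align can be called from user code, with the alignment given in bits as a constant. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: allocate n bytes on the stack with a requested
   alignment of 128 bits and report whether the result is 16-byte aligned.
   Assumes a compiler that exposes __builtin_alloca_with_align.  */
int alloca_with_align_is_16_byte_aligned(size_t n)
{
    /* The second argument is the alignment in bits and must be a
       constant power of two.  */
    void *p = __builtin_alloca_with_align(n, 128);

    return ((uintptr_t)p & 15) == 0;
}
```

Because the requested alignment is explicit in the call, an alignment test derived from it stays valid whether or not the allocation is later replaced by a declared array.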
Comment 12 Tom de Vries 2011-10-13 11:10:06 UTC
Author: vries
Date: Thu Oct 13 11:10:01 2011
New Revision: 179916

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179916
Log:
Fix PR middle-end/50527 ChangeLog entry

Modified:
    trunk/gcc/ChangeLog
Comment 13 Tom de Vries 2011-10-13 11:18:14 UTC
Author: vries
Revision: 179655
Modified property: svn:log

Modified: svn:log at Thu Oct 13 11:18:09 2011
------------------------------------------------------------------------------
--- svn:log (original)
+++ svn:log Thu Oct 13 11:18:09 2011
@@ -2,16 +2,33 @@
 
 	PR middle-end/50527
 	* tree.c (build_common_builtin_nodes): Add local_define_builtin for
+	BUILT_IN_ALLOCA_WITH_ALIGN.  Mark that BUILT_IN_ALLOCA_WITH_ALIGN can
+	throw.
 	* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
+	arglist.  Set align for	BUILT_IN_ALLOCA_WITH_ALIGN.
+	(expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
+	(is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
 	* tree-ssa-ccp.c (evaluate_stmt): Set align for
+	BUILT_IN_ALLOCA_WITH_ALIGN.
+	(fold_builtin_alloca_for_var): Rename to ...
+	(fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
+	BUILT_IN_ALLOCA_WITH_ALIGN argument.
+	(ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
+	fold_builtin_alloca_with_align.
+	(optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
 	* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
+	DEF_BUILTIN_STUB.
 	* ipa-pure-const.c (special_builtin_state): Handle
+	BUILT_IN_ALLOCA_WITH_ALIGN.
 	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
+	(call_may_clobber_ref_p_1): Same.
 	* function.c (gimplify_parameters): Lower vla to
+	BUILT_IN_ALLOCA_WITH_ALIGN.
 	* gimplify.c (gimplify_vla_decl): Same.
 	* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
 	* tree-mudflap.c (mf_xform_statements): Same.
 	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
+	(mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
 	* varasm.c (incorporeal_function_p): Same.
 	* tree-object-size.c (alloc_object_size): Same.
 	* gimple.c (gimple_build_call_from_tree): Same.