[PATCH, PR50527] Don't assume alignment of vla-related allocas.
Tue Oct 4 13:06:00 GMT 2011
On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> I got a patch for PR50527.
>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>> the alloca.
>>>>>> Bootstrapped and regtested on x86_64.
>>>>>> OK for trunk?
>>>>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>> the alignment requirement of the VLA somewhere.
>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>> point was to make the decl cheaper than the alloca). But that might
>>>>> possibly be the better choice.
>>>>> Any other thoughts?
>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>> large vlas.
>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>> not fold during ccp3.
>>> Ugh, somehow I like this the least ;)
>>> How about lowering VLAs to
>>> p = __builtin_alloca (...);
>>> p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>> and not assume anything for alloca itself if it feeds a
>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>> p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>> that's less awkward to use?
>>> Sorry for not having a clear plan here ;)
>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>> indeed harder to use.
>> Another possibility is to add a 'tree vla_decl' field to struct
>> gimple_statement_call, which is probably the easiest to implement.
>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>> decided to try this one. Attached patch implements my first stab at this (now
>> testing on x86_64).
>> Is this an acceptable approach?
> bootstrapped and reg-tested (including ada) on x86_64.
> Ok for trunk?
The idea is ok I think. But
+ case BUILT_IN_ALLOCA_WITH_ALIGN:
/* If the allocation stems from the declaration of a variable-sized
object, it cannot accumulate. */
target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
+ if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+ == BUILT_IN_ALLOCA_WITH_ALIGN)
+ tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
+ 1, CALL_EXPR_ARG (exp, 0));
+ CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
+ exp = new_call;
Ick. Why can't the rest of the compiler not handle
BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
(thus, arrange things so the assembler name of alloca-with-align is alloca?)
I don't see why you still need the special late CCP pass.
-fold_builtin_alloca_for_var (gimple stmt)
+builtin_alloca_with_align_p (gimple stmt)
- unsigned HOST_WIDE_INT size, threshold, n_elem;
- tree lhs, arg, block, var, elem_type, array_type;
- unsigned int align;
+ tree fndecl;
+ if (!is_gimple_call (stmt))
+ return false;
+ fndecl = gimple_call_fndecl (stmt);
+ return (fndecl != NULL_TREE
+ && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
+ DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
Now, users might call __builtin_alloca_with_align (8, align), thus use
a non-constant alignment argument. You either need to reject this
in c-family/c-common.c:check_builtin_function_arguments with an
error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
+DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
Oh, I see you are not exposing this to users, so ignore the comments
about non-constant align. Can you move this
DEF down, near the other DEF_BUILTIN_STUBs and add a comment
that this is used for VLAs?
+ build_int_cstu (size_type_node,
+ DECL_ALIGN (parm)));
size_int (DECL_ALIGN (parm)), similar in other places.
Let's see if there are comments from others on this - which I like.
> - Tom
> 2011-10-03 Tom de Vries <email@example.com>
> PR middle-end/50527
> * tree.c (build_common_builtin_nodes): Add local_define_builtin for
> * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
> (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. Expand
> BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
> (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
> * tree-pass.h (pass_ccp_last): Declare.
> * passes.c (init_optimization_passes): Replace 3rd pass_ccp with
> * gimplify.c (gimplify_vla_decl): Lower vla to
> * tree-ssa-ccp.c (cpp_last): Define.
> (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
> (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
> (builtin_alloca_with_align_p): New function.
> (fold_builtin_alloca_with_align_p): New function, factored out of ...
> (fold_builtin_alloca_for_var): Renamed to ...
> (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
> Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
> (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
> (do_ssa_ccp_last): New function.
> (pass_ccp_last): Define.
> (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
> * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
> * ipa-pure-const.c (special_builtin_state): Handle
> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1
> (call_may_clobber_ref_p_1): Same.
> function.c (gimplify_parameters): Lower vla to
> cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
> tree-mudflap.c (mf_xform_statements): Same.
> tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
> (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
> varasm.c (incorporeal_function_p): Same.
> tree-object-size.c (alloc_object_size): Same.
> gimple.c (gimple_build_call_from_tree): Same.
> * gcc.dg/pr50527.c: New test.
More information about the Gcc-patches