[PATCH, PR50527] Don't assume alignment of vla-related allocas.
Richard Guenther
richard.guenther@gmail.com
Wed Oct 5 08:54:00 GMT 2011
On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I got a patch for PR50527.
>>>>>>>>
>>>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>>> the alloca.
>>>>>>>>
>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>
>>>>>>>> OK for trunk?
>>>>>>>
>>>>>>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>>
>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>
>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>> point was to make the decl cheaper than the alloca). But that might
>>>>>>> possibly be the better choice.
>>>>>>>
>>>>>>> Any other thoughts?
>>>>>>
>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>> large vlas.
>>>>>>
>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>
>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>>
>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>> not fold during ccp3.
>>>>>
>>>>> Ugh, somehow I like this the least ;)
>>>>>
>>>>> How about lowering VLAs to
>>>>>
>>>>> p = __builtin_alloca (...);
>>>>> p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>
>>>>> and not assume anything for alloca itself if it feeds a
>>>>> __builtin_assume_aligned?
>>>>>
>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>
>>>>> p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>
>>>>> that's less awkward to use?
>>>>>
>>>>> Sorry for not having a clear plan here ;)
>>>>>
>>>>
>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>> indeed harder to use.
>>>>
>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>
>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>> decided to try this one. Attached patch implements my first stab at this (now
>>>> testing on x86_64).
>>>>
>>>> Is this an acceptable approach?
>>>>
>>>
>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>
>>> Ok for trunk?
>>
>> The idea is ok I think. But
>>
>> case BUILT_IN_ALLOCA:
>> + case BUILT_IN_ALLOCA_WITH_ALIGN:
>> /* If the allocation stems from the declaration of a variable-sized
>> object, it cannot accumulate. */
>> target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>> if (target)
>> return target;
>> + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>> + == BUILT_IN_ALLOCA_WITH_ALIGN)
>> + {
>> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>> + built_in_decls[BUILT_IN_ALLOCA],
>> + 1, CALL_EXPR_ARG (exp, 0));
>> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>> + exp = new_call;
>> + }
>>
>> Ick. Why can't the rest of the compiler not handle
>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>
>
> We can set the assembler name in the local_define_builtin call. But that will
> still create a call alloca (12, 4). How do we deal with the second argument?
>
>> I don't see why you still need the special late CCP pass.
>>
>
> For alloca_with_align, the align will minimally be the 2nd argument. This is
> independent of folding, and we can propagate this information in every ccp.
>
> If the alloca_with_align is not folded and will not be folded anymore (something
> we know at the earliest after the propagation phase of the last ccp),
> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
> normal allloca. We try to exploit this by improving the ptr_info->align.
I'd just assume the (lower) constant alignment of the argument always.
> Well, that was the idea. But now I wonder, isn't it better to do this in
> expand_builtin_alloca:
> ...
> /* Compute the argument. */
> op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>
> + align =
> + (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
> + BUILT_IN_ALLOCA_WITH_ALIGN)
> + ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
> + : BIGGEST_ALIGNMENT;
> +
> +
> /* Allocate the desired space. */
> - result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
> - cannot_accumulate);
> + result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
> result = convert_memory_address (ptr_mode, result);
Yes, that's probably a good idea anyway - it will reduce excess
stack re-alignment for VLAs.
> return result;
>
> ...
>
> This way, we lower the alignment requirement for vlas in general, not only when
> folding (unless we expand to an actual alloca call).
>
> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
> until expand, so we can't exploit that in the middle-end, so we lose the need
> for ccp_last.
Good ;)
Thanks,
Richard.
>> -static tree
>> -fold_builtin_alloca_for_var (gimple stmt)
>> +static bool
>> +builtin_alloca_with_align_p (gimple stmt)
>> {
>> - unsigned HOST_WIDE_INT size, threshold, n_elem;
>> - tree lhs, arg, block, var, elem_type, array_type;
>> - unsigned int align;
>> + tree fndecl;
>> + if (!is_gimple_call (stmt))
>> + return false;
>> +
>> + fndecl = gimple_call_fndecl (stmt);
>> +
>> + return (fndecl != NULL_TREE
>> + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>> +}
>>
>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>
>> + DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>
>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>> a non-constant alignment argument. You either need to reject this
>> in c-family/c-common.c:check_builtin_function_arguments with an
>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>> I suppose).
>>
>> +DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>
>> Oh, I see you are not exposing this to users, so ignore the comments
>> about non-constant align. Can you move this
>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>> that this is used for VLAs?
>>
>> + build_int_cstu (size_type_node,
>> + DECL_ALIGN (parm)));
>>
>> size_int (DECL_ALIGN (parm)), similar in other places.
>>
>> Let's see if there are comments from others on this - which I like.
>>
>
> Thanks,
> - Tom
>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> - Tom
>>>
>
>
>>> 2011-10-03 Tom de Vries <tom@codesourcery.com>
>>>
>>> PR middle-end/50527
>>> * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>>> BUILT_IN_ALLOCA_WITH_ALIGN.
>>> * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>>> arglist.
>>> (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. Expand
>>> BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
>>> (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>> * tree-pass.h (pass_ccp_last): Declare.
>>> * passes.c (init_optimization_passes): Replace 3rd pass_ccp with
>>> pass_ccp_last.
>>> * gimplify.c (gimplify_vla_decl): Lower vla to
>>> BUILT_IN_ALLOCA_WITH_ALIGN.
>>> * tree-ssa-ccp.c (cpp_last): Define.
>>> (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
>>> (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>>> (builtin_alloca_with_align_p): New function.
>>> (fold_builtin_alloca_with_align_p): New function, factored out of ...
>>> (fold_builtin_alloca_for_var): Renamed to ...
>>> (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
>>> Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
>>> (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
>>> builtin_alloca_with_align_p.
>>> (do_ssa_ccp_last): New function.
>>> (pass_ccp_last): Define.
>>> (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>> * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>>> DEF_BUILTIN_STUB.
>>> * ipa-pure-const.c (special_builtin_state): Handle
>>> BUILT_IN_ALLOCA_WITH_ALIGN.
>>> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1
>>> (call_may_clobber_ref_p_1): Same.
>>> function.c (gimplify_parameters): Lower vla to
>>> BUILT_IN_ALLOCA_WITH_ALIGN.
>>> cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>> tree-mudflap.c (mf_xform_statements): Same.
>>> tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>>> (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>>> varasm.c (incorporeal_function_p): Same.
>>> tree-object-size.c (alloc_object_size): Same.
>>> gimple.c (gimple_build_call_from_tree): Same.
>>>
>>> * gcc.dg/pr50527.c: New test.
>>>
>>>
>>>
>
>
More information about the Gcc-patches
mailing list