[PATCH, PR50527] Don't assume alignment of vla-related allocas.

Thu Oct 6 10:08:00 GMT 2011

On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/05/2011 10:46 AM, Richard Guenther wrote:
>> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I got a patch for PR50527.
>>>>>>>>>>
>>>>>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>>>>> the alloca.
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>>>
>>>>>>>>>> OK for trunk?
>>>>>>>>>
>>>>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>>>>
>>>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>>>
>>>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>>>>> possibly be the better choice.
>>>>>>>>>
>>>>>>>>> Any other thoughts?
>>>>>>>>
>>>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>>>> large vlas.
>>>>>>>>
>>>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>>>
>>>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>>>>
>>>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>>>> not fold during ccp3.
>>>>>>>
>>>>>>> Ugh, somehow I like this the least ;)
>>>>>>>
>>>>>>> How about lowering VLAs to
>>>>>>>
>>>>>>>   p = __builtin_alloca (...);
>>>>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>>>
>>>>>>> and not assume anything for alloca itself if it feeds a
>>>>>>> __builtin_assume_aligned?
>>>>>>>
>>>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>>>
>>>>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>>>
>>>>>>> that's less awkward to use?
>>>>>>>
>>>>>>> Sorry for not having a clear plan here ;)
>>>>>>>
>>>>>>
>>>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>>>> indeed harder to use.
>>>>>>
>>>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>>>
>>>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>>>> decided to try this one. Attached patch implements my first stab at this  (now
>>>>>> testing on x86_64).
>>>>>>
>>>>>> Is this an acceptable approach?
>>>>>>
>>>>>
>>>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>>>
>>>>> Ok for trunk?
>>>>
>>>> The idea is ok I think.  But
>>>>
>>>>      case BUILT_IN_ALLOCA:
>>>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>>        /* If the allocation stems from the declaration of a variable-sized
>>>>          object, it cannot accumulate.  */
>>>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>>>        if (target)
>>>>         return target;
>>>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>>>> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
>>>> +       {
>>>> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>>>> +                                              built_in_decls[BUILT_IN_ALLOCA],
>>>> +                                              1, CALL_EXPR_ARG (exp, 0));
>>>> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>>>> +         exp = new_call;
>>>> +       }
>>>>
>>>> Ick.  Why can't the rest of the compiler not handle
>>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>>>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>>>
>>>
>>> We can set the assembler name in the local_define_builtin call. But that will
>>> still create a call alloca (12, 4). How do we deal with the second argument?
>>>
>>>> I don't see why you still need the special late CCP pass.
>>>>
>>>
>>> For alloca_with_align, the align will minimally be the 2nd argument. This is
>>> independent of folding, and we can propagate this information in every ccp.
>>>
>>> If the alloca_with_align is not folded and will not be folded anymore (something
>>> we know at the earliest after the propagation phase of the last ccp),
>>> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
>>> normal allloca. We try to exploit this by improving the ptr_info->align.
>>
>> I'd just assume the (lower) constant alignment of the argument always.
>>
>>> Well, that was the idea. But now I wonder, isn't it better to do this in
>>> expand_builtin_alloca:
>>> ...
>>>   /* Compute the argument.  */
>>>   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>>
>>> +  align =
>>> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>>> +     BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>>> +    : BIGGEST_ALIGNMENT;
>>> +
>>> +
>>>   /* Allocate the desired space.  */
>>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>>> -                                        cannot_accumulate);
>>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
>>>   result = convert_memory_address (ptr_mode, result);
>>
>> Yes, that's probably a good idea anyway - it will reduce excess
>> stack re-alignment for VLAs.
>>
>>>   return result;
>>>
>>> ...
>>>
>>> This way, we lower the alignment requirement for vlas in general, not only when
>>> folding (unless we expand to an actual alloca call).
>>>
>>> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
>>> until expand, so we can't exploit that in the middle-end, so we lose the need
>>> for ccp_last.
>>
>> Good ;)
>>
>
> Handled this way now.
>
>> Thanks,
>> Richard.
>>
>>>> -static tree
>>>> -fold_builtin_alloca_for_var (gimple stmt)
>>>> +static bool
>>>> +builtin_alloca_with_align_p (gimple stmt)
>>>>  {
>>>> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
>>>> -  tree lhs, arg, block, var, elem_type, array_type;
>>>> -  unsigned int align;
>>>> +  tree fndecl;
>>>> +  if (!is_gimple_call (stmt))
>>>> +    return false;
>>>> +
>>>> +  fndecl = gimple_call_fndecl (stmt);
>>>> +
>>>> +  return (fndecl != NULL_TREE
>>>> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>>>> +}
>>>>
>>>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>>>
>>>> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>>>
>>>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>>>> a non-constant alignment argument.  You either need to reject this
>>>> in c-family/c-common.c:check_builtin_function_arguments with an
>>>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>>>> I suppose).
>>>>
>>>> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>>>
>>>> Oh, I see you are not exposing this to users, so ignore the comments
>>>> about non-constant align.
>
>>>> Can you move this
>>>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>>>> that this is used for VLAs?
>
> Done.
>
>>>>
>>>> +                                      build_int_cstu (size_type_node,
>>>> +                                                      DECL_ALIGN (parm)));
>>>>
>>>> size_int (DECL_ALIGN (parm)), similar in other places.
>>>>
>
> Done.
>
>>>> Let's see if there are comments from others on this - which I like.
>>>>
>>>
>>> Thanks,
>>> - Tom
>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>
>>>
>
> Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as
> discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html.
>
> Bootstrapped and regtested (including ada) on x86_64.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2011-10-05  Tom de Vries  <tom@codesourcery.com>
>
>        PR middle-end/50527
>        * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>        BUILT_IN_ALLOCA_WITH_ALIGN.  Mark that BUILT_IN_ALLOCA_WITH_ALIGN can
>        throw.
>        * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>        arglist.  Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>        (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-ssa-ccp.c (evaluate_stmt): Set align for
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        (fold_builtin_alloca_for_var): Rename to ...
>        (fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
>        BUILT_IN_ALLOCA_WITH_ALIGN argument.
>        (ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
>        fold_builtin_alloca_with_align.
>        (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>        DEF_BUILTIN_STUB.
>        * ipa-pure-const.c (special_builtin_state): Handle
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
>        (call_may_clobber_ref_p_1): Same.
>        * function.c (gimplify_parameters): Lower vla to
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * gimplify.c (gimplify_vla_decl): Same.
>        * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-mudflap.c (mf_xform_statements): Same.
>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>        (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>        * varasm.c (incorporeal_function_p): Same.
>        * tree-object-size.c (alloc_object_size): Same.
>        * gimple.c (gimple_build_call_from_tree): Same.
>
>        * gcc.dg/pr50527.c: New test.
>