This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.


On 10/04/2011 03:03 PM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I got a patch for PR50527.
>>>>>>>
>>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>> the alloca.
>>>>>>>
>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>
>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>
>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>
>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>> possibly be the better choice.
>>>>>>
>>>>>> Any other thoughts?
>>>>>
>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>> large vlas.
>>>>>
>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>
>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>
>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>> not fold during ccp3.
>>>>
>>>> Ugh, somehow I like this the least ;)
>>>>
>>>> How about lowering VLAs to
>>>>
>>>>   p = __builtin_alloca (...);
>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>
>>>> and not assume anything for alloca itself if it feeds a
>>>> __builtin_assume_aligned?
>>>>
>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>
>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>
>>>> that's less awkward to use?
>>>>
>>>> Sorry for not having a clear plan here ;)
>>>>
>>>
>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>> indeed harder to use.
>>>
>>> Another possibility is to add a 'tree vla_decl' field to struct
>>> gimple_statement_call, which is probably the easiest to implement.
>>>
>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>> decided to try this one. Attached patch implements my first stab at this  (now
>>> testing on x86_64).
>>>
>>> Is this an acceptable approach?
>>>
>>
>> bootstrapped and reg-tested (including ada) on x86_64.
>>
>> Ok for trunk?
> 
> The idea is ok I think.  But
> 
>      case BUILT_IN_ALLOCA:
> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>        /* If the allocation stems from the declaration of a variable-sized
>          object, it cannot accumulate.  */
>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>        if (target)
>         return target;
> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
> +       {
> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
> +                                              built_in_decls[BUILT_IN_ALLOCA],
> +                                              1, CALL_EXPR_ARG (exp, 0));
> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
> +         exp = new_call;
> +       }
> 
> Ick.  Why can't the rest of the compiler not handle
> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
> 

We can set the assembler name in the local_define_builtin call. But that will
still create a call alloca (12, 4). How do we deal with the second argument?

> I don't see why you still need the special late CCP pass.
> 

For alloca_with_align, the align will minimally be the 2nd argument. This is
independent of folding, and we can propagate this information in every ccp.

If the alloca_with_align is not folded and will not be folded anymore (something
we know at the earliest after the propagation phase of the last ccp),
the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
normal allloca. We try to exploit this by improving the ptr_info->align.

Well, that was the idea. But now I wonder, isn't it better to do this in
expand_builtin_alloca:
...
   /* Compute the argument.  */
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));

+  align =
+    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
+     BUILT_IN_ALLOCA_WITH_ALIGN)
+    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
+    : BIGGEST_ALIGNMENT;
+
+
   /* Allocate the desired space.  */
-  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
-					 cannot_accumulate);
+  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
   result = convert_memory_address (ptr_mode, result);

   return result;

...

This way, we lower the alignment requirement for vlas in general, not only when
folding (unless we expand to an actual alloca call).

If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
until expand, so we can't exploit that in the middle-end, so we lose the need
for ccp_last.

> -static tree
> -fold_builtin_alloca_for_var (gimple stmt)
> +static bool
> +builtin_alloca_with_align_p (gimple stmt)
>  {
> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
> -  tree lhs, arg, block, var, elem_type, array_type;
> -  unsigned int align;
> +  tree fndecl;
> +  if (!is_gimple_call (stmt))
> +    return false;
> +
> +  fndecl = gimple_call_fndecl (stmt);
> +
> +  return (fndecl != NULL_TREE
> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
> +}
> 
> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
> 
> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> 
> Now, users might call __builtin_alloca_with_align (8, align), thus use
> a non-constant alignment argument.  You either need to reject this
> in c-family/c-common.c:check_builtin_function_arguments with an
> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
> I suppose).
> 
> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
> 
> Oh, I see you are not exposing this to users, so ignore the comments
> about non-constant align.  Can you move this
> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
> that this is used for VLAs?
> 
> +                                      build_int_cstu (size_type_node,
> +                                                      DECL_ALIGN (parm)));
> 
> size_int (DECL_ALIGN (parm)), similar in other places.
> 
> Let's see if there are comments from others on this - which I like.
> 

Thanks,
- Tom

> Thanks,
> Richard.
> 
>> Thanks,
>> - Tom
>>


>> 2011-10-03  Tom de Vries  <tom@codesourcery.com>
>>
>>        PR middle-end/50527
>>        * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>>        arglist.
>>        (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.  Expand
>>        BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
>>        (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-pass.h (pass_ccp_last): Declare.
>>        * passes.c (init_optimization_passes): Replace 3rd pass_ccp with
>>        pass_ccp_last.
>>        * gimplify.c (gimplify_vla_decl): Lower vla to
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-ssa-ccp.c (cpp_last): Define.
>>        (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
>>        (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>>        (builtin_alloca_with_align_p): New function.
>>        (fold_builtin_alloca_with_align_p): New function, factored out of ...
>>        (fold_builtin_alloca_for_var): Renamed to ...
>>        (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
>>        Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
>>        (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
>>        builtin_alloca_with_align_p.
>>        (do_ssa_ccp_last): New function.
>>        (pass_ccp_last): Define.
>>        (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>>        DEF_BUILTIN_STUB.
>>        * ipa-pure-const.c (special_builtin_state): Handle
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1
>>        (call_may_clobber_ref_p_1): Same.
>>        function.c (gimplify_parameters): Lower vla to
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>        tree-mudflap.c (mf_xform_statements): Same.
>>        tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>>        (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>>        varasm.c (incorporeal_function_p): Same.
>>        tree-object-size.c (alloc_object_size): Same.
>>        gimple.c (gimple_build_call_from_tree): Same.
>>
>>        * gcc.dg/pr50527.c: New test.
>>
>>
>>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]