[PATCH] Convert character arrays to string csts

Martin Liška mliska@suse.cz
Tue Aug 8 11:47:00 GMT 2017


On 11/09/2016 11:22 AM, Richard Biener wrote:
> On Fri, Nov 4, 2016 at 2:33 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 11/03/2016 02:00 PM, Jan Hubicka wrote:
>>>> On 11/03/2016 01:12 PM, Martin Liška wrote:
>>>>> +  tree init = DECL_INITIAL (decl);
>>>>> +  if (init
>>>>> +      && TREE_READONLY (decl)
>>>>> +      && can_convert_ctor_to_string_cst (init))
>>>>> +    DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
>>>>
>>>> I'd merge these two new functions since they're only ever called
>>>> together. We'd then have something like
>>>>
>>>> if (init && TREE_READONLY (decl))
>>>>   init = convert_ctor_to_string_cst (init);
>>>> if (init)
>>>>   DECL_INITIAL (decl) = init;
>>
>> Done.
>>
>>>>
>>>> I'll defer to Jan on whether finalize_decl seems like a good place
>>>> to do this.
>>>
>>> I think finalize_decl may be bit too early because frontends may expects the
>>> ctors to be in a way they produced them.  We only want to convert those arrays
>>> seen by middle-end so I would move the logic to varpool_node::analyze
>>>
>>> Otherwise the patch seems fine to me.
>>>
>>> Honza
>>>>
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
>>>>> index 283bd1c..b2d1fd5 100644
>>>>> --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
>>>>> @@ -4,12 +4,15 @@
>>>>> char *buffer1;
>>>>> char *buffer2;
>>>>>
>>>>> +const char global[] = {'a', 'b', 'c', 'd', '\0'};
>>>>> +
>>>>> #define SIZE 1000
>>>>>
>>>>> int
>>>>> main (void)
>>>>> {
>>>>>   const char* const foo1 = "hello world";
>>>>> +  const char local[] = "abcd";
>>>>>
>>>>>   buffer1 = __builtin_malloc (SIZE);
>>>>>   __builtin_strcpy (buffer1, foo1);
>>>>> @@ -45,6 +48,10 @@ main (void)
>>>>>     __builtin_abort ();
>>>>>   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
>>>>>     __builtin_abort ();
>>>>> +  if (__builtin_memchr (global, null, 5) == 0)
>>>>> +    __builtin_abort ();
>>>>> +  if (__builtin_memchr (local, null, 5) == 0)
>>>>> +    __builtin_abort ();
>>>>
>>>> How is that a meaningful test? This seems to work even with an
>>>> unpatched gcc. I'd have expected something that shows a benefit for
>>>> doing this conversion, and maybe also a test that shows it isn't
>>>> done in cases where it's not allowed.
>>
>> It's meaningful as it scans that there's no __builtin_memchr in optimized dump.
>> I'm adding new tests that does the opposite test.
>>
>>>>
>>>>> tree
>>>>> -build_string_literal (int len, const char *str)
>>>>> +build_string_literal (int len, const char *str, bool build_addr_expr)
>>>>
>>>> New arguments should be documented in the function comment.
>>
>> Yep, improved.
>>
>>>>
>>>>> +/* Return TRUE when CTOR can be converted to a string constant.  */
>>>>
>>>> "if", not "when".
>>
>> Done.
>>
>>>>
>>>>> +  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
>>>>> +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
>>>>> +    {
>>>>> +      if (key == NULL_TREE
>>>>> +     || TREE_CODE (key) != INTEGER_CST
>>>>> +     || !tree_fits_uhwi_p (value)
>>>>> +     || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
>>>>> +   return false;
>>>>
>>>> Shouldn't all elements have the same type, or do you really have to
>>>> call useless_type_conversion in a loop?
>>>>
>>>>> +      /* Allow zero character just at the end of a string.  */
>>>>> +      if (integer_zerop (value))
>>>>> +   return idx == elements - 1;
>>>>
>>>> Don't you also have to explicitly check it's there?
>>>>
>>>>
>>>> Bernd
>>
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> I'm curious about the
> 
> @@ -1495,7 +1495,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
>         {
>           if (!TREE_STATIC (decl))
>             {
> -             DECL_INITIAL (decl) = NULL_TREE;
> +             if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST)
> +               DECL_INITIAL (decl) = NULL_TREE;
>               init = build2 (INIT_EXPR, void_type_node, decl, init);
>               gimplify_and_add (init, seq_p);
>               ggc_free (init);
> 
> change.  Why keep DECL_INITIAL if you build a INIT_EXPR anyway?
> 
> @@ -4438,6 +4439,19 @@ gimplify_init_constructor (tree *expr_p,
> gimple_seq *pre_p, gimple_seq *post_p,
>             break;
>           }
> 
> +       /* Replace a ctor with a string constant with possible.  */
> +       if (TREE_READONLY (object)
> +           && VAR_P (object))
> +         {
> +           tree string_ctor = convert_ctor_to_string_cst (ctor);
> +           if (string_ctor)
> +             {
> +               TREE_OPERAND (*expr_p, 1) = string_ctor;
> +               DECL_INITIAL (object) = string_ctor;
> +               break;
> +             }
> +         }
> +
>         /* Fetch information about the constructor to direct later processing.
>            We might want to make static versions of it in various cases, and
>            can only do so if it known to be a valid constant initializer.  */
> 
> hmm, so both these hunks will end up keeping a DECL_INITIAL
> for non-static local consts?  Usually we end up with
> 
> main ()
> {
>   const char local[5];
> 
>   <bb 2>:
>   local = "abcd";
> 
> here.  So you keep DECL_INITIAL for folding?
> 
> I believe we should create CONST_DECLs for the above (and make
> CONST_DECLs first-class
> symtab citizens), thus avoid runtime stack initialization for the
> above and instead emit
> "abcd" to the constant pool?

Hi Richi.

I've just returned to this in order to resolve long pending unfinished stuff.
I have couple of questions:

a) what should create a CONST_DECL, a FE or gimplifier when it identifies that
it can be converted to CONST_DECL?

b) Why do we need to put such local variables to symtab?

c) Do we have a target that uses CONST_DECL for such (or similar) purpose?

Thanks,
Martin

> 
> +  /* Skip constructors with a hole.  */
> +  if (CONSTRUCTOR_NELTS (ctor) != ctor_length)
> +    return false;
> 
> not sure if that's maybe too clever ;)
> 
> +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
> +    {
> +      if (key == NULL_TREE
> +         || !tree_fits_uhwi_p (key)
> +         || !tree_fits_uhwi_p (value))
> +       return false;
> 
> I think key == NULL is very well valid (you are not using it...).  I'd
> instead do
> 
>      if (key && compare_tree_int (key, idx) != 0)
>        return false;
> 
> for the hole check.  value should always fit uhwi given the earlier
> element type check.
> 
> +tree
> +convert_ctor_to_string_cst (tree ctor)
> +{
> +  if (!can_convert_ctor_to_string_cst (ctor))
> +    return NULL_TREE;
> +
> +  unsigned HOST_WIDE_INT idx;
> +  tree value;
> +  vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (ctor);
> +  char *str = XNEWVEC (char, elts->length ());
> +
> +  FOR_EACH_CONSTRUCTOR_VALUE (elts, idx, value)
> +    str[idx] = (char)tree_to_uhwi (value);
> +
> +  tree init = build_string_literal (elts->length (),
> +                                   ggc_alloc_string (str, elts->length ()),
> +                                   false);
> +  free (str);
> 
> why alloc str on the heap, copy it to gc and the copy it again?
> That is, you can pass 'str' to build_string_literal directly I think.
> 
> In fact as you are refactoring build_string_literal please
> refactor build_string into a build_string_raw taking just 'len'
> (thus leaving out the actual string initialization apart from
> 'appending' '\0') and then avoid the heap allocation.
> 
> Refactor build_string_literal to take a tree STRING_CST
> and build_string_literal_addr to build it's address.
> 
> Thanks,
> Richard.
> 
> 
> 
> 
>> Martin
>>



More information about the Gcc-patches mailing list