[PATCH] fold strlen() of aggregate members (PR 77357)

Martin Sebor msebor@gmail.com
Sun Jul 8 02:56:00 GMT 2018


On 07/06/2018 09:52 AM, Richard Biener wrote:
> On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>> GCC folds accesses to members of constant aggregates except
>> for character arrays/strings.  For example, the strlen() call
>> below is not folded:
>>
>>    const char a[][4] = { "1", "12" };
>>
>>    int f (void) { retturn strlen (a[1]); }
>>
>> The attached change set enhances the string_constant() function
>> to make it possible to extract string constants from aggregate
>> initializers (CONSTRUCTORS).
>>
>> The initial solution was much simpler but as is often the case,
>> MEM_REF made it fail to fold things like:
>>
>>    int f (void) { retturn strlen (a[1] + 1); }
>>
>> Handling those made the project a bit more interesting and
>> the final solution somewhat more involved.
>>
>> To handle offsets into aggregate string members the patch also
>> extends the fold_ctor_reference() function to extract entire
>> string array initializers even if the offset points past
>> the beginning of the string and even though the size and
>> exact type of the reference are not known (there isn't enough
>> information in a MEM_REF to determine that).
>>
>> Tested along with the patch for PR 86415 on x86_64-linux.
>
> +      if (TREE_CODE (init) == CONSTRUCTOR)
> +       {
> +         tree type;
> +         if (TREE_CODE (arg) == ARRAY_REF
> +             || TREE_CODE (arg) == MEM_REF)
> +           type = TREE_TYPE (arg);
> +         else if (TREE_CODE (arg) == COMPONENT_REF)
> +           {
> +             tree field = TREE_OPERAND (arg, 1);
> +             type = TREE_TYPE (field);
> +           }
> +         else
> +           return NULL_TREE;
>
> what's wrong with just
>
>     type = TREE_TYPE (field);

In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.

> ?
>
> +         base_off *= BITS_PER_UNIT;
>
> poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
> for poly you'd then use poly_offset?

Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)

>
> You extend fold_ctor_reference to treat size == 0 specially but then
> bother to compute a size here - that looks unneeded?

Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.

>
> While the offset of the reference determines the first field in the
> CONSTRUCTOR, how do you know the access doesn't touch
> adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
> so consider
>
>   char x[2][4] = { "abcd", "abcd" };
>
> and MEM[&x] with a char[8] type?  memcpy "inlining" will create
> such MEMs for example.

The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.

>
> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
>        tree byte_offset = DECL_FIELD_OFFSET (cfield);
>        tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
>        tree field_size = DECL_SIZE (cfield);
> -      offset_int bitoffset;
> -      offset_int bitoffset_end, access_end;
> +
> +      if (!field_size && TREE_CODE (cval) == STRING_CST)
> +       {
> +         /* Determine the size of the flexible array member from
> +            the size of the string initializer provided for it.  */
> +         unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> +         tree eltype = TREE_TYPE (TREE_TYPE (cval));
> +         len *= tree_to_uhwi (TYPE_SIZE (eltype));
> +         field_size = build_int_cst (size_type_node, len);
> +       }
>
> Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
> say, for
>
> struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

I can't think of a use for it.  Do you have something in mind?

>
> ?  And why not use simply
>
>   field_size = TYPE_SIZE (TREE_TYPE (cval));
>
> like you do in c_strlen?

Yes, that's simpler, thanks.

>
> Otherwise looks reasonable.

Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)

Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-77357-0.diff
Type: text/x-patch
Size: 39890 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180708/b66b9c1e/attachment.bin>


More information about the Gcc-patches mailing list