This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fold strlen() of aggregate members (PR 77357)


On 09/07/18 22:44, Martin Sebor wrote:
> On 07/09/2018 06:40 AM, Richard Biener wrote:
>> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> On 07/06/2018 09:52 AM, Richard Biener wrote:
>>>> On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:
>>>>>
>>>>> GCC folds accesses to members of constant aggregates except
>>>>> for character arrays/strings.  For example, the strlen() call
>>>>> below is not folded:
>>>>>
>>>>>    const char a[][4] = { "1", "12" };
>>>>>
>>>>>    int f (void) { retturn strlen (a[1]); }
>>>>>
>>>>> The attached change set enhances the string_constant() function
>>>>> to make it possible to extract string constants from aggregate
>>>>> initializers (CONSTRUCTORS).
>>>>>
>>>>> The initial solution was much simpler but as is often the case,
>>>>> MEM_REF made it fail to fold things like:
>>>>>
>>>>>    int f (void) { retturn strlen (a[1] + 1); }
>>>>>
>>>>> Handling those made the project a bit more interesting and
>>>>> the final solution somewhat more involved.
>>>>>
>>>>> To handle offsets into aggregate string members the patch also
>>>>> extends the fold_ctor_reference() function to extract entire
>>>>> string array initializers even if the offset points past
>>>>> the beginning of the string and even though the size and
>>>>> exact type of the reference are not known (there isn't enough
>>>>> information in a MEM_REF to determine that).
>>>>>
>>>>> Tested along with the patch for PR 86415 on x86_64-linux.
>>>>
>>>> +      if (TREE_CODE (init) == CONSTRUCTOR)
>>>> +       {
>>>> +         tree type;
>>>> +         if (TREE_CODE (arg) == ARRAY_REF
>>>> +             || TREE_CODE (arg) == MEM_REF)
>>>> +           type = TREE_TYPE (arg);
>>>> +         else if (TREE_CODE (arg) == COMPONENT_REF)
>>>> +           {
>>>> +             tree field = TREE_OPERAND (arg, 1);
>>>> +             type = TREE_TYPE (field);
>>>> +           }
>>>> +         else
>>>> +           return NULL_TREE;
>>>>
>>>> what's wrong with just
>>>>
>>>>     type = TREE_TYPE (field);
>>>
>>> In response to your comment below abut size I simplified things
>>> further so determining the type a priori is no longer necessary.
>>>
>>>> ?
>>>>
>>>> +         base_off *= BITS_PER_UNIT;
>>>>
>>>> poly_uint64 isn't enough for "bits", with wide-int you'd use
>>>> offset_int,
>>>> for poly you'd then use poly_offset?
>>>
>>> Okay, I tried to avoid the overflow.  (Converting between all
>>> these flavors of wide int types is a monumental PITA.)
>>>
>>>>
>>>> You extend fold_ctor_reference to treat size == 0 specially but then
>>>> bother to compute a size here - that looks unneeded?
>>>
>>> Yes, well spotted, thanks!  I simplified the code so this isn't
>>> necessary, and neither is the type.
>>>
>>>>
>>>> While the offset of the reference determines the first field in the
>>>> CONSTRUCTOR, how do you know the access doesn't touch
>>>> adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
>>>> so consider
>>>>
>>>>   char x[2][4] = { "abcd", "abcd" };
>>>>
>>>> and MEM[&x] with a char[8] type?  memcpy "inlining" will create
>>>> such MEMs for example.
>>>
>>> The code is only used to find string constants in initializer
>>> expressions where I don't think the size of the access comes
>>> into play.  If a memcpy() call results in a MEM_REF[char[8],
>>> &x, 8] that's fine.  It's a valid reference and we can still
>>> get the underlying character sequence (which is represented
>>> as two STRING_CSTs with the two string literals).  I might
>>> be missing the point of your question.
>>
>> Maybe irrelevant for strlen folding depending on what you do
>> for missing '\0' termination.
>>
>>>>
>>>> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
>>>> ctor,
>>>>        tree byte_offset = DECL_FIELD_OFFSET (cfield);
>>>>        tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
>>>>        tree field_size = DECL_SIZE (cfield);
>>>> -      offset_int bitoffset;
>>>> -      offset_int bitoffset_end, access_end;
>>>> +
>>>> +      if (!field_size && TREE_CODE (cval) == STRING_CST)
>>>> +       {
>>>> +         /* Determine the size of the flexible array member from
>>>> +            the size of the string initializer provided for it.  */
>>>> +         unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
>>>> +         tree eltype = TREE_TYPE (TREE_TYPE (cval));
>>>> +         len *= tree_to_uhwi (TYPE_SIZE (eltype));
>>>> +         field_size = build_int_cst (size_type_node, len);
>>>> +       }
>>>>
>>>> Why does this only apply to STRING_CST initializers and not
>>>> CONSTRUCTORS,
>>>> say, for
>>>>
>>>> struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>>>
>>> I can't think of a use for it.  Do you have something in mind?
>>
>> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
>> which is useful in other parts of the compiler.  So I don't see why
>> it shouldn't work for general flex-arrays.
>>
>>>>
>>>> ?  And why not use simply
>>>>
>>>>   field_size = TYPE_SIZE (TREE_TYPE (cval));
>>>>
>>>> like you do in c_strlen?
>>>
>>> Yes, that's simpler, thanks.
>>>
>>>>
>>>> Otherwise looks reasonable.
>>>
>>> Attached is an updated patch.  I also enhanced the handling
>>> of non-constant indices.  They were already handled before
>>> to a smaller extent.  (There may be other opportunities
>>> here.)
>>
>> Please don't do functional changes to a patch in review, without
>> exactly pointing out the change.  It makes review inefficent for me.
>>
>> Looks like it might be the NULL type argument handling?
> 
> Sorry.  The change I was referring to is the addition and handling
> of the varidx variable to string_constant.  It was necessary for
> parity with the existing optimization for non-constant array
> indices.  It makes it possible to fold strlen calls like:
> 
>   const char a[][3] = { "", "1", "12" };
> 
>   void f (int i)
>   {
>     if (__builtin_strlen (&a[2][i]) > 2)
>       __builtin_abort ();
>   }
> 
> Prior to the patch GCC could that for one-dimensional arrays
> so to my mind it would have been an oversight for a patch to
> handle multi-dimensional arrays not to do the same thing for
> those.
> 
>> +
>> +      if (!field_size && TREE_CODE (cval) == STRING_CST)
>> +       {
>> +         /* Determine the size of the flexible array member from
>> +            the size of the string initializer provided for it.  */
>> +         /* unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); */
>> +         /* tree eltype = TREE_TYPE (TREE_TYPE (cval)); */
>> +         /* len *= tree_to_uhwi (TYPE_SIZE (eltype)); */
>> +         /* field_size = build_int_cst (size_type_node, len); */
>> +         field_size = TYPE_SIZE (TREE_TYPE (cval));
>>
>> why all the commented code?
>>
>> OK with the comments removed and TREE_CODE (cval) == CONSTRUCTOR
>> handled at that point.
> 
> Done in r262522.
> 
> Thanks.
> Martin
Hi Martin,

I believe this caused the following regresion on aarch64 and arm:
FAIL: gcc.c-torture/execute/builtins/strlen-3.c execution, -Og -g

I haven't quite looked into why though, Ill have a closer look.

Cheers,
Andre


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]