[PATCH] fold strlen() of aggregate members (PR 77357)
Martin Sebor
msebor@gmail.com
Sun Jul 8 02:56:00 GMT 2018
On 07/06/2018 09:52 AM, Richard Biener wrote:
> On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>> GCC folds accesses to members of constant aggregates except
>> for character arrays/strings. For example, the strlen() call
>> below is not folded:
>>
>> const char a[][4] = { "1", "12" };
>>
>> int f (void) { retturn strlen (a[1]); }
>>
>> The attached change set enhances the string_constant() function
>> to make it possible to extract string constants from aggregate
>> initializers (CONSTRUCTORS).
>>
>> The initial solution was much simpler but as is often the case,
>> MEM_REF made it fail to fold things like:
>>
>> int f (void) { retturn strlen (a[1] + 1); }
>>
>> Handling those made the project a bit more interesting and
>> the final solution somewhat more involved.
>>
>> To handle offsets into aggregate string members the patch also
>> extends the fold_ctor_reference() function to extract entire
>> string array initializers even if the offset points past
>> the beginning of the string and even though the size and
>> exact type of the reference are not known (there isn't enough
>> information in a MEM_REF to determine that).
>>
>> Tested along with the patch for PR 86415 on x86_64-linux.
>
> + if (TREE_CODE (init) == CONSTRUCTOR)
> + {
> + tree type;
> + if (TREE_CODE (arg) == ARRAY_REF
> + || TREE_CODE (arg) == MEM_REF)
> + type = TREE_TYPE (arg);
> + else if (TREE_CODE (arg) == COMPONENT_REF)
> + {
> + tree field = TREE_OPERAND (arg, 1);
> + type = TREE_TYPE (field);
> + }
> + else
> + return NULL_TREE;
>
> what's wrong with just
>
> type = TREE_TYPE (field);
In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.
> ?
>
> + base_off *= BITS_PER_UNIT;
>
> poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
> for poly you'd then use poly_offset?
Okay, I tried to avoid the overflow. (Converting between all
these flavors of wide int types is a monumental PITA.)
>
> You extend fold_ctor_reference to treat size == 0 specially but then
> bother to compute a size here - that looks unneeded?
Yes, well spotted, thanks! I simplified the code so this isn't
necessary, and neither is the type.
>
> While the offset of the reference determines the first field in the
> CONSTRUCTOR, how do you know the access doesn't touch
> adjacent ones? STRING_CSTs do not have to be '\0' terminated,
> so consider
>
> char x[2][4] = { "abcd", "abcd" };
>
> and MEM[&x] with a char[8] type? memcpy "inlining" will create
> such MEMs for example.
The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play. If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine. It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals). I might
be missing the point of your question.
>
> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
> tree byte_offset = DECL_FIELD_OFFSET (cfield);
> tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
> tree field_size = DECL_SIZE (cfield);
> - offset_int bitoffset;
> - offset_int bitoffset_end, access_end;
> +
> + if (!field_size && TREE_CODE (cval) == STRING_CST)
> + {
> + /* Determine the size of the flexible array member from
> + the size of the string initializer provided for it. */
> + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> + tree eltype = TREE_TYPE (TREE_TYPE (cval));
> + len *= tree_to_uhwi (TYPE_SIZE (eltype));
> + field_size = build_int_cst (size_type_node, len);
> + }
>
> Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
> say, for
>
> struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
I can't think of a use for it. Do you have something in mind?
>
> ? And why not use simply
>
> field_size = TYPE_SIZE (TREE_TYPE (cval));
>
> like you do in c_strlen?
Yes, that's simpler, thanks.
>
> Otherwise looks reasonable.
Attached is an updated patch. I also enhanced the handling
of non-constant indices. They were already handled before
to a smaller extent. (There may be other opportunities
here.)
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-77357-0.diff
Type: text/x-patch
Size: 39890 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180708/b66b9c1e/attachment.bin>
More information about the Gcc-patches
mailing list