This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote:
>
> On 07/06/2018 09:52 AM, Richard Biener wrote:
> > On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:
> >>
> >> GCC folds accesses to members of constant aggregates except
> >> for character arrays/strings. For example, the strlen() call
> >> below is not folded:
> >>
> >> const char a[][4] = { "1", "12" };
> >>
> >> int f (void) { retturn strlen (a[1]); }
> >>
> >> The attached change set enhances the string_constant() function
> >> to make it possible to extract string constants from aggregate
> >> initializers (CONSTRUCTORS).
> >>
> >> The initial solution was much simpler but as is often the case,
> >> MEM_REF made it fail to fold things like:
> >>
> >> int f (void) { retturn strlen (a[1] + 1); }
> >>
> >> Handling those made the project a bit more interesting and
> >> the final solution somewhat more involved.
> >>
> >> To handle offsets into aggregate string members the patch also
> >> extends the fold_ctor_reference() function to extract entire
> >> string array initializers even if the offset points past
> >> the beginning of the string and even though the size and
> >> exact type of the reference are not known (there isn't enough
> >> information in a MEM_REF to determine that).
> >>
> >> Tested along with the patch for PR 86415 on x86_64-linux.
> >
> > + if (TREE_CODE (init) == CONSTRUCTOR)
> > + {
> > + tree type;
> > + if (TREE_CODE (arg) == ARRAY_REF
> > + || TREE_CODE (arg) == MEM_REF)
> > + type = TREE_TYPE (arg);
> > + else if (TREE_CODE (arg) == COMPONENT_REF)
> > + {
> > + tree field = TREE_OPERAND (arg, 1);
> > + type = TREE_TYPE (field);
> > + }
> > + else
> > + return NULL_TREE;
> >
> > what's wrong with just
> >
> > type = TREE_TYPE (field);
>
> In response to your comment below abut size I simplified things
> further so determining the type a priori is no longer necessary.
>
> > ?
> >
> > + base_off *= BITS_PER_UNIT;
> >
> > poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
> > for poly you'd then use poly_offset?
>
> Okay, I tried to avoid the overflow. (Converting between all
> these flavors of wide int types is a monumental PITA.)
>
> >
> > You extend fold_ctor_reference to treat size == 0 specially but then
> > bother to compute a size here - that looks unneeded?
>
> Yes, well spotted, thanks! I simplified the code so this isn't
> necessary, and neither is the type.
>
> >
> > While the offset of the reference determines the first field in the
> > CONSTRUCTOR, how do you know the access doesn't touch
> > adjacent ones? STRING_CSTs do not have to be '\0' terminated,
> > so consider
> >
> > char x[2][4] = { "abcd", "abcd" };
> >
> > and MEM[&x] with a char[8] type? memcpy "inlining" will create
> > such MEMs for example.
>
> The code is only used to find string constants in initializer
> expressions where I don't think the size of the access comes
> into play. If a memcpy() call results in a MEM_REF[char[8],
> &x, 8] that's fine. It's a valid reference and we can still
> get the underlying character sequence (which is represented
> as two STRING_CSTs with the two string literals). I might
> be missing the point of your question.
Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.
> >
> > @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
> > tree byte_offset = DECL_FIELD_OFFSET (cfield);
> > tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
> > tree field_size = DECL_SIZE (cfield);
> > - offset_int bitoffset;
> > - offset_int bitoffset_end, access_end;
> > +
> > + if (!field_size && TREE_CODE (cval) == STRING_CST)
> > + {
> > + /* Determine the size of the flexible array member from
> > + the size of the string initializer provided for it. */
> > + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> > + tree eltype = TREE_TYPE (TREE_TYPE (cval));
> > + len *= tree_to_uhwi (TYPE_SIZE (eltype));
> > + field_size = build_int_cst (size_type_node, len);
> > + }
> >
> > Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
> > say, for
> >
> > struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>
> I can't think of a use for it. Do you have something in mind?
Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler. So I don't see why
it shouldn't work for general flex-arrays.
> >
> > ? And why not use simply
> >
> > field_size = TYPE_SIZE (TREE_TYPE (cval));
> >
> > like you do in c_strlen?
>
> Yes, that's simpler, thanks.
>
> >
> > Otherwise looks reasonable.
>
> Attached is an updated patch. I also enhanced the handling
> of non-constant indices. They were already handled before
> to a smaller extent. (There may be other opportunities
> here.)
Please don't do functional changes to a patch in review, without
exactly pointing out the change. It makes review inefficent for me.
Looks like it might be the NULL type argument handling?
+
+ if (!field_size && TREE_CODE (cval) == STRING_CST)
+ {
+ /* Determine the size of the flexible array member from
+ the size of the string initializer provided for it. */
+ /* unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); */
+ /* tree eltype = TREE_TYPE (TREE_TYPE (cval)); */
+ /* len *= tree_to_uhwi (TYPE_SIZE (eltype)); */
+ /* field_size = build_int_cst (size_type_node, len); */
+ field_size = TYPE_SIZE (TREE_TYPE (cval));
why all the commented code?
OK with the comments removed and TREE_CODE (cval) == CONSTRUCTOR
handled at that point.
Thanks,
Richard.
>
> Martin