This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fold strlen() of aggregate members (PR 77357)


On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote:
>
> On 07/06/2018 09:52 AM, Richard Biener wrote:
> > On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:
> >>
> >> GCC folds accesses to members of constant aggregates except
> >> for character arrays/strings.  For example, the strlen() call
> >> below is not folded:
> >>
> >>    const char a[][4] = { "1", "12" };
> >>
> >>    int f (void) { retturn strlen (a[1]); }
> >>
> >> The attached change set enhances the string_constant() function
> >> to make it possible to extract string constants from aggregate
> >> initializers (CONSTRUCTORS).
> >>
> >> The initial solution was much simpler but as is often the case,
> >> MEM_REF made it fail to fold things like:
> >>
> >>    int f (void) { retturn strlen (a[1] + 1); }
> >>
> >> Handling those made the project a bit more interesting and
> >> the final solution somewhat more involved.
> >>
> >> To handle offsets into aggregate string members the patch also
> >> extends the fold_ctor_reference() function to extract entire
> >> string array initializers even if the offset points past
> >> the beginning of the string and even though the size and
> >> exact type of the reference are not known (there isn't enough
> >> information in a MEM_REF to determine that).
> >>
> >> Tested along with the patch for PR 86415 on x86_64-linux.
> >
> > +      if (TREE_CODE (init) == CONSTRUCTOR)
> > +       {
> > +         tree type;
> > +         if (TREE_CODE (arg) == ARRAY_REF
> > +             || TREE_CODE (arg) == MEM_REF)
> > +           type = TREE_TYPE (arg);
> > +         else if (TREE_CODE (arg) == COMPONENT_REF)
> > +           {
> > +             tree field = TREE_OPERAND (arg, 1);
> > +             type = TREE_TYPE (field);
> > +           }
> > +         else
> > +           return NULL_TREE;
> >
> > what's wrong with just
> >
> >     type = TREE_TYPE (field);
>
> In response to your comment below abut size I simplified things
> further so determining the type a priori is no longer necessary.
>
> > ?
> >
> > +         base_off *= BITS_PER_UNIT;
> >
> > poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
> > for poly you'd then use poly_offset?
>
> Okay, I tried to avoid the overflow.  (Converting between all
> these flavors of wide int types is a monumental PITA.)
>
> >
> > You extend fold_ctor_reference to treat size == 0 specially but then
> > bother to compute a size here - that looks unneeded?
>
> Yes, well spotted, thanks!  I simplified the code so this isn't
> necessary, and neither is the type.
>
> >
> > While the offset of the reference determines the first field in the
> > CONSTRUCTOR, how do you know the access doesn't touch
> > adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
> > so consider
> >
> >   char x[2][4] = { "abcd", "abcd" };
> >
> > and MEM[&x] with a char[8] type?  memcpy "inlining" will create
> > such MEMs for example.
>
> The code is only used to find string constants in initializer
> expressions where I don't think the size of the access comes
> into play.  If a memcpy() call results in a MEM_REF[char[8],
> &x, 8] that's fine.  It's a valid reference and we can still
> get the underlying character sequence (which is represented
> as two STRING_CSTs with the two string literals).  I might
> be missing the point of your question.

Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.

> >
> > @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
> >        tree byte_offset = DECL_FIELD_OFFSET (cfield);
> >        tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
> >        tree field_size = DECL_SIZE (cfield);
> > -      offset_int bitoffset;
> > -      offset_int bitoffset_end, access_end;
> > +
> > +      if (!field_size && TREE_CODE (cval) == STRING_CST)
> > +       {
> > +         /* Determine the size of the flexible array member from
> > +            the size of the string initializer provided for it.  */
> > +         unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> > +         tree eltype = TREE_TYPE (TREE_TYPE (cval));
> > +         len *= tree_to_uhwi (TYPE_SIZE (eltype));
> > +         field_size = build_int_cst (size_type_node, len);
> > +       }
> >
> > Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
> > say, for
> >
> > struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>
> I can't think of a use for it.  Do you have something in mind?

Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.

> >
> > ?  And why not use simply
> >
> >   field_size = TYPE_SIZE (TREE_TYPE (cval));
> >
> > like you do in c_strlen?
>
> Yes, that's simpler, thanks.
>
> >
> > Otherwise looks reasonable.
>
> Attached is an updated patch.  I also enhanced the handling
> of non-constant indices.  They were already handled before
> to a smaller extent.  (There may be other opportunities
> here.)

Please don't do functional changes to a patch in review, without
exactly pointing out the change.  It makes review inefficent for me.

Looks like it might be the NULL type argument handling?

+
+      if (!field_size && TREE_CODE (cval) == STRING_CST)
+       {
+         /* Determine the size of the flexible array member from
+            the size of the string initializer provided for it.  */
+         /* unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); */
+         /* tree eltype = TREE_TYPE (TREE_TYPE (cval)); */
+         /* len *= tree_to_uhwi (TYPE_SIZE (eltype)); */
+         /* field_size = build_int_cst (size_type_node, len); */
+         field_size = TYPE_SIZE (TREE_TYPE (cval));

why all the commented code?

OK with the comments removed and TREE_CODE (cval) == CONSTRUCTOR
handled at that point.

Thanks,
Richard.

>
> Martin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]