This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fold strlen() of aggregate members (PR 77357)


On 07/09/2018 06:40 AM, Richard Biener wrote:
On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote:

On 07/06/2018 09:52 AM, Richard Biener wrote:
On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:

GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { retturn strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { retturn strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.

+      if (TREE_CODE (init) == CONSTRUCTOR)
+       {
+         tree type;
+         if (TREE_CODE (arg) == ARRAY_REF
+             || TREE_CODE (arg) == MEM_REF)
+           type = TREE_TYPE (arg);
+         else if (TREE_CODE (arg) == COMPONENT_REF)
+           {
+             tree field = TREE_OPERAND (arg, 1);
+             type = TREE_TYPE (field);
+           }
+         else
+           return NULL_TREE;

what's wrong with just

    type = TREE_TYPE (field);

In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.

?

+         base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
for poly you'd then use poly_offset?

Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)


You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?

Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.


While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.

The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.

Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.


@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
       tree byte_offset = DECL_FIELD_OFFSET (cfield);
       tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
       tree field_size = DECL_SIZE (cfield);
-      offset_int bitoffset;
-      offset_int bitoffset_end, access_end;
+
+      if (!field_size && TREE_CODE (cval) == STRING_CST)
+       {
+         /* Determine the size of the flexible array member from
+            the size of the string initializer provided for it.  */
+         unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+         tree eltype = TREE_TYPE (TREE_TYPE (cval));
+         len *= tree_to_uhwi (TYPE_SIZE (eltype));
+         field_size = build_int_cst (size_type_node, len);
+       }

Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

I can't think of a use for it.  Do you have something in mind?

Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.


?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?

Yes, that's simpler, thanks.


Otherwise looks reasonable.

Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)

Please don't do functional changes to a patch in review, without
exactly pointing out the change.  It makes review inefficent for me.

Looks like it might be the NULL type argument handling?

Sorry.  The change I was referring to is the addition and handling
of the varidx variable to string_constant.  It was necessary for
parity with the existing optimization for non-constant array
indices.  It makes it possible to fold strlen calls like:

  const char a[][3] = { "", "1", "12" };

  void f (int i)
  {
    if (__builtin_strlen (&a[2][i]) > 2)
      __builtin_abort ();
  }

Prior to the patch GCC could that for one-dimensional arrays
so to my mind it would have been an oversight for a patch to
handle multi-dimensional arrays not to do the same thing for
those.

+
+      if (!field_size && TREE_CODE (cval) == STRING_CST)
+       {
+         /* Determine the size of the flexible array member from
+            the size of the string initializer provided for it.  */
+         /* unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); */
+         /* tree eltype = TREE_TYPE (TREE_TYPE (cval)); */
+         /* len *= tree_to_uhwi (TYPE_SIZE (eltype)); */
+         /* field_size = build_int_cst (size_type_node, len); */
+         field_size = TYPE_SIZE (TREE_TYPE (cval));

why all the commented code?

OK with the comments removed and TREE_CODE (cval) == CONSTRUCTOR
handled at that point.

Done in r262522.

Thanks.
Martin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]