This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

On 07/11/2018 07:50 AM, Andre Vieira (lists) wrote:
On 11/07/18 11:00, Andre Vieira (lists) wrote:
On 09/07/18 22:44, Martin Sebor wrote:
On 07/09/2018 06:40 AM, Richard Biener wrote:
On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor <> wrote:

On 07/06/2018 09:52 AM, Richard Biener wrote:
On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <> wrote:

GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { retturn strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { retturn strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.

+      if (TREE_CODE (init) == CONSTRUCTOR)
+       {
+         tree type;
+         if (TREE_CODE (arg) == ARRAY_REF
+             || TREE_CODE (arg) == MEM_REF)
+           type = TREE_TYPE (arg);
+         else if (TREE_CODE (arg) == COMPONENT_REF)
+           {
+             tree field = TREE_OPERAND (arg, 1);
+             type = TREE_TYPE (field);
+           }
+         else
+           return NULL_TREE;

what's wrong with just

    type = TREE_TYPE (field);

In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.


+         base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use
for poly you'd then use poly_offset?

Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)

You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?

Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.

While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.

The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.

Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.

@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
       tree byte_offset = DECL_FIELD_OFFSET (cfield);
       tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
       tree field_size = DECL_SIZE (cfield);
-      offset_int bitoffset;
-      offset_int bitoffset_end, access_end;
+      if (!field_size && TREE_CODE (cval) == STRING_CST)
+       {
+         /* Determine the size of the flexible array member from
+            the size of the string initializer provided for it.  */
+         unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+         tree eltype = TREE_TYPE (TREE_TYPE (cval));
+         len *= tree_to_uhwi (TYPE_SIZE (eltype));
+         field_size = build_int_cst (size_type_node, len);
+       }

Why does this only apply to STRING_CST initializers and not
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

I can't think of a use for it.  Do you have something in mind?

Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.

?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?

Yes, that's simpler, thanks.

Otherwise looks reasonable.

Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities

Please don't do functional changes to a patch in review, without
exactly pointing out the change.  It makes review inefficent for me.

Looks like it might be the NULL type argument handling?

Sorry.  The change I was referring to is the addition and handling
of the varidx variable to string_constant.  It was necessary for
parity with the existing optimization for non-constant array
indices.  It makes it possible to fold strlen calls like:

  const char a[][3] = { "", "1", "12" };

  void f (int i)
    if (__builtin_strlen (&a[2][i]) > 2)
      __builtin_abort ();

Prior to the patch GCC could that for one-dimensional arrays
so to my mind it would have been an oversight for a patch to
handle multi-dimensional arrays not to do the same thing for

+      if (!field_size && TREE_CODE (cval) == STRING_CST)
+       {
+         /* Determine the size of the flexible array member from
+            the size of the string initializer provided for it.  */
+         /* unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); */
+         /* tree eltype = TREE_TYPE (TREE_TYPE (cval)); */
+         /* len *= tree_to_uhwi (TYPE_SIZE (eltype)); */
+         /* field_size = build_int_cst (size_type_node, len); */
+         field_size = TYPE_SIZE (TREE_TYPE (cval));

why all the commented code?

OK with the comments removed and TREE_CODE (cval) == CONSTRUCTOR
handled at that point.

Done in r262522.

Hi Martin,

I believe this caused the following regresion on aarch64 and arm:
FAIL: gcc.c-torture/execute/builtins/strlen-3.c execution, -Og -g

I haven't quite looked into why though, Ill have a closer look.



After having a look it seems the strlen call:
 'strlen (larger + (x++ & 7))'
is not optmized away for with -Og, which means it will call the strlen
provided and because you removed the "inside_main" setting around the
strlen will abort as __OPTIMIZE__ is set with -Og.  Maybe its worth
skipping this test for -Og?

Yes.  It wasn't necessary to tighten up the test (in fact it
was a mistake).  I reverted the change yesterday in response
to reports like yours (pr86461).  I need to find a different
way to exercise the part of the change.

Sorry about that!


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]