[PATCH] fold strlen() of aggregate members (PR 77357)

Richard Biener richard.guenther@gmail.com
Fri Jul 6 15:52:00 GMT 2018


On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <msebor@gmail.com> wrote:
>
> GCC folds accesses to members of constant aggregates except
> for character arrays/strings.  For example, the strlen() call
> below is not folded:
>
>    const char a[][4] = { "1", "12" };
>
>    int f (void) { return strlen (a[1]); }
>
> The attached change set enhances the string_constant() function
> to make it possible to extract string constants from aggregate
> initializers (CONSTRUCTORS).
>
> The initial solution was much simpler but as is often the case,
> MEM_REF made it fail to fold things like:
>
>    int f (void) { return strlen (a[1] + 1); }
>
> Handling those made the project a bit more interesting and
> the final solution somewhat more involved.
>
> To handle offsets into aggregate string members the patch also
> extends the fold_ctor_reference() function to extract entire
> string array initializers even if the offset points past
> the beginning of the string and even though the size and
> exact type of the reference are not known (there isn't enough
> information in a MEM_REF to determine that).
>
> Tested along with the patch for PR 86415 on x86_64-linux.
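
For reference, the two cases from the description can be written out
as a small stand-alone C sketch.  The function names len_of_row and
len_past_start are made up for illustration; whether either call is
actually folded to a constant depends on the compiler version and the
optimization level.

```c
#include <string.h>

/* Constant aggregate from the patch description: two rows of char[4].  */
static const char a[][4] = { "1", "12" };

/* Reference to a whole row of the aggregate.  */
size_t len_of_row (void)
{
  return strlen (a[1]);
}

/* Reference at an offset into a row (the MEM_REF case).  */
size_t len_past_start (void)
{
  return strlen (a[1] + 1);
}
```

Both results are known at compile time from the initializer alone,
which is what makes them candidates for folding.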

+      if (TREE_CODE (init) == CONSTRUCTOR)
+       {
+         tree type;
+         if (TREE_CODE (arg) == ARRAY_REF
+             || TREE_CODE (arg) == MEM_REF)
+           type = TREE_TYPE (arg);
+         else if (TREE_CODE (arg) == COMPONENT_REF)
+           {
+             tree field = TREE_OPERAND (arg, 1);
+             type = TREE_TYPE (field);
+           }
+         else
+           return NULL_TREE;

what's wrong with just

    type = TREE_TYPE (field);

?

+         base_off *= BITS_PER_UNIT;

poly_uint64 isn't wide enough to hold an offset in bits; with wide-int
you'd use offset_int, so for poly you'd then use poly_offset_int?
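
To illustrate the concern outside of GCC, here is a minimal sketch in
plain C of what happens when a 64-bit byte offset is scaled to bits.
BITS_PER_BYTE and both function names are stand-ins chosen for this
example, not GCC identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for BITS_PER_UNIT on common targets (an assumption).  */
#define BITS_PER_BYTE 8

/* Return true if BYTE_OFF scaled to bits still fits in 64 bits.  */
bool bit_offset_fits_u64 (uint64_t byte_off)
{
  return byte_off <= UINT64_MAX / BITS_PER_BYTE;
}

/* The unchecked scaling; wraps modulo 2^64 for large BYTE_OFF.  */
uint64_t to_bits_unchecked (uint64_t byte_off)
{
  return byte_off * BITS_PER_BYTE;
}
```

Any byte offset of 2^61 or more loses its high bits when multiplied,
which is why a wider type is wanted for bit quantities.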

You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?

While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.
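
A minimal sketch of that situation in plain C follows.  The helper
name copy_whole_array is made up for this example, and whether the
memcpy really becomes a single char[8] MEM depends on the optimizer.

```c
#include <string.h>

/* Both rows fill all four bytes, so neither has a terminating NUL.  */
static const char x[2][4] = { "abcd", "abcd" };

/* Copy the whole object as one 8-byte access, as an inlined memcpy
   might, and NUL-terminate the copy so it can be read as a string.  */
void copy_whole_array (char out[9])
{
  memcpy (out, x, sizeof x);
  out[sizeof x] = '\0';
}
```

The copy holds "abcdabcd": the access starts at the first element of
the CONSTRUCTOR but also covers the second one.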

@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
       tree byte_offset = DECL_FIELD_OFFSET (cfield);
       tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
       tree field_size = DECL_SIZE (cfield);
-      offset_int bitoffset;
-      offset_int bitoffset_end, access_end;
+
+      if (!field_size && TREE_CODE (cval) == STRING_CST)
+       {
+         /* Determine the size of the flexible array member from
+            the size of the string initializer provided for it.  */
+         unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+         tree eltype = TREE_TYPE (TREE_TYPE (cval));
+         len *= tree_to_uhwi (TYPE_SIZE (eltype));
+         field_size = build_int_cst (size_type_node, len);
+       }

Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

?  And why not simply use

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?
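
As a plain C sketch of why the field itself carries no size: the
struct type below does not include any elements of the flexible array
member, so the byte count has to come from the number of elements in
the initializer, whatever form that initializer takes.  The helper
names fam_bytes and object_bytes are hypothetical.

```c
#include <stddef.h>

/* The struct from the example above; `a` is a flexible array member.  */
struct S { int i; int a[]; };

/* Bytes occupied by N elements of the flexible array member.  The
   type alone cannot supply N; it comes from the initializer.  */
size_t fam_bytes (size_t n)
{
  return n * sizeof (int);
}

/* Total bytes of a struct S object holding N trailing elements.  */
size_t object_bytes (size_t n)
{
  return offsetof (struct S, a) + fam_bytes (n);
}
```

For the five-element initializer above, object_bytes (5) is larger
than sizeof (struct S), since sizeof does not account for the
trailing elements.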

Otherwise looks reasonable.

Thanks,
Richard.

> Martin


