Bug 101419 - [11 Regression] collapsing memset() calls can break __builtin_object_size()
Summary: [11 Regression] collapsing memset() calls can break __builtin_object_size()
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P3 normal
Target Milestone: 11.5
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2021-07-11 14:45 UTC by Kees Cook
Modified: 2023-07-07 10:40 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-07-12 00:00:00


Attachments
memset collapsing breaks __builtin_object_size() (288 bytes, text/plain)
2021-07-11 14:45 UTC, Kees Cook
Details
gcc12-pr101419.patch (1.30 KB, patch)
2021-07-12 11:42 UTC, Jakub Jelinek
Details | Diff
gcc12-pr101419.patch (3.13 KB, patch)
2021-07-12 15:59 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kees Cook 2021-07-11 14:45:26 UTC
Created attachment 51131 [details]
memset collapsing breaks __builtin_object_size()

I've found a strange misoptimization around memset() and __builtin_object_size(). If there are two memset() calls that can be collapsed (due to being fully overlapping, I assume), use of __builtin_object_size() may return the wrong result.

The example code shows that __builtin_object_size(&int_value, 1) returns 1 instead of 4:

> $ gcc  -Wall -Wextra -fno-strict-aliasing -fwrapv -O2 -c -o wat.o wat.c 
> In function ‘do_wipe’,
>     inlined from ‘loops’ at wat.c:24:3:
> wat.c:15:3: warning: call to ‘__detected_overflow’ declared with attribute warning: detected overflow [-Wattribute-warning]
>    15 |   __detected_overflow(__builtin_object_size(&info->lg, 1), sizeof(info->lg));
>       |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, info->lg is int, but the call to __builtin_object_size() resolves to the size of info->sm (char). This can be seen directly in the resulting output:

> $ objdump -rd wat.o
> ...
> 0000000000000000 <loops>:
>    0:   53                      push   %rbx
>    1:   be 04 00 00 00          mov    $0x4,%esi
>    6:   48 89 fb                mov    %rdi,%rbx
>    9:   c6 07 00                movb   $0x0,(%rdi)
>    c:   bf 01 00 00 00          mov    $0x1,%edi
>   11:   e8 00 00 00 00          call   16 <loops+0x16>
>                         12: R_X86_64_PLT32      __detected_overflow-0x4
>   16:   c7 03 00 00 00 00       movl   $0x0,(%rbx)
>   1c:   5b                      pop    %rbx
>   1d:   c3                      ret    

The first argument to __detected_overflow() is "1", instead of 4.

Any changes to this example code makes the bug disappear (removal of loops, removal of empty asm, or reordering of memset() calls).

Using Compiler Explorer, this bug appears to have been introduced between GCC 8.5 and 9.1: https://godbolt.org/z/oGq5K9fE4
Comment 1 Richard Biener 2021-07-12 07:59:43 UTC
I think it behaves as expected, __builtin_object_size computes a maximum or minimum value, not an exact value.
Comment 2 Jakub Jelinek 2021-07-12 08:33:10 UTC
Started with r9-2635-g78ea9abc2018243af7f7ada6135144ac90c6ad27
I wonder if objsz pass when insert_min_max_p shouldn't in addition to adding MIN_EXPR or MAX_EXPR around the result of __bos also drop the least significant bit from the __bos second argument.
Comment 3 Richard Biener 2021-07-12 09:03:58 UTC
(In reply to Jakub Jelinek from comment #2)
> Started with r9-2635-g78ea9abc2018243af7f7ada6135144ac90c6ad27
> I wonder if objsz pass when insert_min_max_p shouldn't in addition to adding
> MIN_EXPR or MAX_EXPR around the result of __bos also drop the least
> significant bit from the __bos second argument.

But then isn't this wrong as well?  Btw, I never get what the second bit means,
it's documented as

"The second bit determines if maximum or minimum of remaining bytes
is computed."

but does it compute the maximum size when the bit is set or when it is not set?
For the least significant bit the situation is better:

"If the least significant
bit is clear, objects are whole variables, if it is set, a closest
surrounding subobject is considered the object a pointer points to."

Unfortunately the examples also only use 0 and 1.
Comment 4 Jakub Jelinek 2021-07-12 09:58:09 UTC
--- gcc/tree-object-size.c.jj	2021-01-04 10:25:39.911221618 +0100
+++ gcc/tree-object-size.c	2021-07-12 11:28:51.328120222 +0200
@@ -1393,6 +1393,11 @@ pass_object_sizes::execute (function *fu
 			{
 			  tree tem = make_ssa_name (type);
 			  gimple_call_set_lhs (call, tem);
+			  if (object_size_type == 1)
+			    {
+			      ost = build_zero_cst (TREE_TYPE (ost));
+			      gimple_call_set_arg (call, 1, ost);
+			    }
 			  enum tree_code code
 			    = object_size_type == 1 ? MIN_EXPR : MAX_EXPR;
 			  tree cst = build_int_cstu (type, bytes);
works, but unfortunately only for __builtin_object_size (, 1).
When I did that also for 3 (to turn it into 2), it breaks several gcc.dg/builtin-object-size*.c tests.
0 is defined as an upper bound of the object size, subobjects not taken into account
1 is defined similarly, but subobjects are taken into account
2 is defined as a lower bound of the object size, subobjects not taken into account
3 is 2 with subobjects
-D_FORTIFY_SOURCE={1,2} typically only uses 0 and 1 modes, 2 and 3 are rare, and are useful e.g. when somebody wants to find out if the object size is exact or not (for exact case the upper and lower bounds are equal).
Comment 5 Jakub Jelinek 2021-07-12 10:37:02 UTC
Note, for the above patch I was worried about something like:
struct S { int a; char b[36]; int c; };

static inline __SIZE_TYPE__
foo (char *p)
{
  return __builtin_object_size (p, 1);
}

__SIZE_TYPE__
bar (void)
{
  struct S s;
  return foo (&s.b[0]);
}

with -O2 -fno-early-inlining, but in that case we actually don't perform that at all - the MIN_EXPR is added only if __bos can be computed during the objsz1 pass to something other than the "unknown" value.  If it is, then there shouldn't be pointer passed from caller as one of the possible values, so maybe it is safe.
Of course this shows that the objsz1 pass is just mitigation pass, because when the function with __builtin_object_size calls isn't early inlined or some caller of that and ultimately objsz1 pass isn't able to compute a known value, then any SCCVN pointer replacement with different subobject sizes that could feed a subobject __bos (1 or 3) could change the user visible value.

E.g.
struct S { char b[36]; int c; };
void baz (char *);

static inline __SIZE_TYPE__
foo (char *p)
{
  return __builtin_object_size (p, 1);
}

__SIZE_TYPE__
bar (void)
{
  struct S *p = __builtin_malloc (2 * sizeof (struct S)) + sizeof (struct S);
  baz ((char *) p);
  return foo (&p->b[0]);
}
doesn't reproduce it, we still pass
  p_5 = _1 + 40;
  baz (p_5);
  _2 = &MEM[(struct S *)_1 + 40B].b[0];
  _8 = __builtin_object_size (_2, 1);
and don't figure out that _2 == p_5.
But modified #c0 testcase triggers it at -O2:
typedef __SIZE_TYPE__ size_t;
void baz (int, int) __attribute__((__warning__("detected overflow")));

union U {
  int i;
  char c;
};

static void
qux (void *p, size_t q)
{
  if (__builtin_object_size (p, 1) < q)
    baz (__builtin_object_size (p, 1), q);
  __builtin_memset (p, 0, q);
}

static void
foo (union U *u)
{
  qux (&u->c, sizeof (u->c));
  qux (&u->i, sizeof (u->i));
}

void
bar (union U *u)
{
  int i, j;
  for (i = 0; i < 1; i++)
    {
      foo (u);

      for (j = 0; j < 2; j++)
        asm volatile ("");
    }
}
and this patch doesn't fix it.
Comment 6 Richard Biener 2021-07-12 10:50:45 UTC
Nothing can "fix" __builtin_object_size here (on sub-objects) without changing how we represent and CSE addresses, esp. if you consider inlining where we
want to interpret __builtin_object_size (p, ..) as having passed not literal
'p' but the value 'p' has at this point.  Thus any pointer CSE that happens
on the value of 'p' before inlining happens will "break" our expectation on it.

So suppose for a moment we'd have ADDR_WITH_SIZE_EXPR <obj, size-cst> which
we could lower to just ADDR_EXPR after the final object-size pass.  Then
during all early opts we couldn't CSE addresses with different sizes or
simplify equality conditionals on them.  And we'd have to do that everywhere
as for example an LTO link might expose a caller/callee with __builtin_object_size.

Maybe we could somehow lower ADDR_WITH_SIZE_EXPR that do not "escape" (but
we'd need to compute that).

That said, my point is that sth like __builtin_object_size is quite fundamentally broken [for an optimizing compiler].
Comment 7 Jakub Jelinek 2021-07-12 10:59:52 UTC
It is the cunrolli pass where things go wrong (it VNs the &u->c for both __builtin_object_size calls while previously only the first one was &u->c and the second was &u->i).
Now, objsz2 obviously needs to be done after IPA opts (most importantly inlining), but what other opt passes we want in between IPA and objsz2 is a question:
  PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
      NEXT_PASS (pass_remove_cgraph_callee_edges);
      /* Initial scalar cleanups before alias computation.
         They ensure memory accesses are not indirect wherever possible.  */
      NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
      NEXT_PASS (pass_ccp, true /* nonzero_p */);
      NEXT_PASS (pass_post_ipa_warn);
      /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_complete_unrolli);
      NEXT_PASS (pass_backprop);
      NEXT_PASS (pass_phiprop);
      NEXT_PASS (pass_forwprop);
      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
I bet the rewriting no longer addressed locals into SSA form if possible done for ccp could be sometimes relevant, because when an address is stored into a previously address taken local and read back, then __bos could see it if in SSA form and not otherwise.  And in theory the cunrolli could reveal some cases, though not sure.
So, either do the TODO_update_address_taken sooner (e.g. at end of pass_remove_cgraph_callee_edges - does ccp itself bring any address taken removal opportunities?) and then do objsz2, or schedule another objsz pass with insert_minmax right at the start of pass_all_optimizations?
Out of ideas and it is still just mitigation.

Another way would be change SCCVN to handle it conservatively for __bos (x, 1) but that could break the rarely used __bos (x, 3) - but it can break randomly for both now.  E.g. in the
  _8 = &u_5(D)->c;
  use (_8);
...
  _9 = &u_5(D)->i;
  use2 (_9);
case when objsz2 wasn't done yet (check with property) and perhaps if any __bos (x, 1) are seen (cfun flag propagated conservatively during inlining), compute
__bos (ptr, 1) for both and rewrite the setter to the larger one, i.e. not optimize into:
  _8 = &u_5(D)->c;
  use (_8);
...
  use2 (_8);
but
  _8 = &u_5(D)->i;
  use (_8);
...
  use2 (_8);
Comment 8 Jakub Jelinek 2021-07-12 11:21:24 UTC
--- gcc/tree-pass.h.jj	2021-01-27 10:10:00.525903635 +0100
+++ gcc/tree-pass.h	2021-07-12 13:10:59.621933276 +0200
@@ -208,6 +208,7 @@ protected:
 #define PROP_gimple_lcf		(1 << 1)	/* lowered control flow */
 #define PROP_gimple_leh		(1 << 2)	/* lowered eh */
 #define PROP_cfg		(1 << 3)
+#define PROP_objsz		(1 << 4)	/* object sizes computed */
 #define PROP_ssa		(1 << 5)
 #define PROP_no_crit_edges      (1 << 6)
 #define PROP_rtl		(1 << 7)
--- gcc/tree-object-size.c.jj	2021-01-04 10:25:39.911221618 +0100
+++ gcc/tree-object-size.c	2021-07-12 13:19:50.021568629 +0200
@@ -1450,6 +1450,8 @@ pass_object_sizes::execute (function *fu
     }
 
   fini_object_sizes ();
+  if (!insert_min_max_p)
+    fun->curr_properties |= PROP_objsz;
   return 0;
 }
 
--- gcc/tree-ssa-sccvn.c.jj	2021-06-09 10:20:08.988342285 +0200
+++ gcc/tree-ssa-sccvn.c	2021-07-12 13:14:33.482962387 +0200
@@ -925,12 +925,10 @@ copy_reference_ops_from_ref (tree ref, v
 			 + (wi::to_offset (bit_offset) >> LOG2_BITS_PER_UNIT));
 		    /* Probibit value-numbering zero offset components
 		       of addresses the same before the pass folding
-		       __builtin_object_size had a chance to run
-		       (checking cfun->after_inlining does the
-		       trick here).  */
+		       __builtin_object_size had a chance to run.  */
 		    if (TREE_CODE (orig) != ADDR_EXPR
 			|| maybe_ne (off, 0)
-			|| cfun->after_inlining)
+			|| (cfun->curr_properties & PROP_objsz))
 		      off.to_shwi (&temp.off);
 		  }
 	      }

seems to work for both testcases.
Comment 9 Jakub Jelinek 2021-07-12 11:26:12 UTC
The above patch will slightly pessimize optimizations during the
      NEXT_PASS (pass_remove_cgraph_callee_edges);
      /* Initial scalar cleanups before alias computation.
         They ensure memory accesses are not indirect wherever possible.  */
      NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
      NEXT_PASS (pass_ccp, true /* nonzero_p */);
      NEXT_PASS (pass_post_ipa_warn);
      /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_complete_unrolli);
      NEXT_PASS (pass_backprop);
      NEXT_PASS (pass_phiprop);
      NEXT_PASS (pass_forwprop);
passes right after IPA, where we currently only pessimize it before and during IPA.
Comment 10 Jakub Jelinek 2021-07-12 11:42:58 UTC
Created attachment 51139 [details]
gcc12-pr101419.patch

Full untested patch.
Comment 11 Richard Biener 2021-07-12 11:51:17 UTC
I think none of

      NEXT_PASS (pass_complete_unrolli);
      NEXT_PASS (pass_backprop);
      NEXT_PASS (pass_phiprop);
      NEXT_PASS (pass_forwprop);

are useful for objsize so IMHO we should move pass_object_sizes earlier.  CCP
and it's into SSA are probably useful though.  Like the following (untested):

diff --git a/gcc/passes.def b/gcc/passes.def
index 945d2bc797c..fd38e1f2d8e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -194,14 +194,14 @@ along with GCC; see the file COPYING3.  If not see
         They ensure memory accesses are not indirect wherever possible.  */
       NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
-      NEXT_PASS (pass_post_ipa_warn);
       /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
+      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
+      NEXT_PASS (pass_post_ipa_warn);
       NEXT_PASS (pass_complete_unrolli);
       NEXT_PASS (pass_backprop);
       NEXT_PASS (pass_phiprop);
       NEXT_PASS (pass_forwprop);
-      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
       /* pass_build_alias is a dummy pass that ensures that we
         execute TODO_rebuild_alias at this point.  */
       NEXT_PASS (pass_build_alias);
Comment 12 Richard Biener 2021-07-12 11:52:37 UTC
(In reply to Jakub Jelinek from comment #10)
> Created attachment 51139 [details]
> gcc12-pr101419.patch
> 
> Full untested patch.

I guess that's certainly what was intended (and after the patch more explicit).  Whether it's good to not CSE any addresses in unrolled loop bodies is another question (but see my proposed pass order change which would fix this).
Comment 13 Richard Biener 2021-07-12 11:56:41 UTC
(In reply to Richard Biener from comment #12)
> (In reply to Jakub Jelinek from comment #10)
> > Created attachment 51139 [details]
> > gcc12-pr101419.patch
> > 
> > Full untested patch.
> 
> I guess that's certainly what was intended (and after the patch more
> explicit).  Whether it's good to not CSE any addresses in unrolled loop
> bodies is another question (but see my proposed pass order change which
> would fix this).

Note it might "break" GIMPLE FE testcases with startwith after the
objsz pass - break in the sense that later FRE will behave differently.

Note usually we still run all property providers but the way you set
the property outside of pass->properties_provided breaks this.  Thus
maybe split objsz into two separate passes rather than using the
flag so you can use properties_provided.
Comment 14 Jakub Jelinek 2021-07-12 12:04:25 UTC
I agree about most of the passes you are moving, but I have an (albeit artificial) testcase that proves that cunrolli does affect objsz:
__SIZE_TYPE__ a[10];

void
foo (void)
{
  char *p = __builtin_malloc (64);
  for (int i = 0; i < 4; i++)
    {
      a[i] = __builtin_object_size (p, 0);
      p += 6;
    }
}

When objsz is done before cunrolli, a[0] to a[3] will all be 64, while
when it is done after cunrolli, it will be 64, 58, 52, 46.

So, what about a mix of your and my patch, add
--- gcc/passes.def
+++ gcc/passes.def
@@ -194,14 +194,14 @@ along with GCC; see the file COPYING3.  If not see
         They ensure memory accesses are not indirect wherever possible.  */
       NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
-      NEXT_PASS (pass_post_ipa_warn);
       /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
+      NEXT_PASS (pass_post_ipa_warn);
       NEXT_PASS (pass_complete_unrolli);
+      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
       NEXT_PASS (pass_backprop);
       NEXT_PASS (pass_phiprop);
       NEXT_PASS (pass_forwprop);
-      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
       /* pass_build_alias is a dummy pass that ensures that we
         execute TODO_rebuild_alias at this point.  */
       NEXT_PASS (pass_build_alias);
to my patch?
Comment 15 Jakub Jelinek 2021-07-12 12:07:31 UTC
(In reply to Richard Biener from comment #13)
> Note usually we still run all property providers but the way you set
> the property outside of pass->properties_provided breaks this.  Thus
> maybe split objsz into two separate passes rather than using the
> flag so you can use properties_provided.

I wanted to avoid having two separate passes, but if you prefer it, it can be done.  Will be a user visible change in the dumps and for -fdisable-tree-* etc. (though we do such changes all the time).
Comment 16 rguenther@suse.de 2021-07-12 13:20:43 UTC
On Mon, 12 Jul 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101419
> 
> --- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> I agree about most of the passes you are moving, but I have an (albeit
> artificial) testcase that proves that cunrolli does affect objsz:
> __SIZE_TYPE__ a[10];
> 
> void
> foo (void)
> {
>   char *p = __builtin_malloc (64);
>   for (int i = 0; i < 4; i++)
>     {
>       a[i] = __builtin_object_size (p, 0);
>       p += 6;
>     }
> }
> 
> When objsz is done before cunrolli, a[0] to a[3] will all be 64, while
> when it is done after cunrolli, it will be 64, 58, 52, 46.
> 
> So, what about a mix of your and my patch, add
> --- gcc/passes.def
> +++ gcc/passes.def
> @@ -194,14 +194,14 @@ along with GCC; see the file COPYING3.  If not see
>          They ensure memory accesses are not indirect wherever possible.  */
>        NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
>        NEXT_PASS (pass_ccp, true /* nonzero_p */);
> -      NEXT_PASS (pass_post_ipa_warn);
>        /* After CCP we rewrite no longer addressed locals into SSA
>          form if possible.  */
> +      NEXT_PASS (pass_post_ipa_warn);
>        NEXT_PASS (pass_complete_unrolli);
> +      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
>        NEXT_PASS (pass_backprop);
>        NEXT_PASS (pass_phiprop);
>        NEXT_PASS (pass_forwprop);
> -      NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
>        /* pass_build_alias is a dummy pass that ensures that we
>          execute TODO_rebuild_alias at this point.  */
>        NEXT_PASS (pass_build_alias);
> to my patch?

Well, my point was to avoid pessimizing the VN done from cunrolli ;)
Of course any duplication / threading can improve __bos precision,
but then any transform also risks breaking it.  Your example
is IMHO too artificial as good argument.
Comment 17 rguenther@suse.de 2021-07-12 13:24:18 UTC
On Mon, 12 Jul 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101419
> 
> --- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #13)
> > Note usually we still run all property providers but the way you set
> > the property outside of pass->properties_provided breaks this.  Thus
> > maybe split objsz into two separate passes rather than using the
> > flag so you can use properties_provided.
> 
> I wanted to avoid having two separate passes, but if you prefer it, it can be
> done.  Will be a user visible change in the dumps and for -fdisable-tree-* etc.
> (though we do such changes all the time).

Yes, I think it's needed for GIMPLE FE testcase correctness.

As for dumpfile renaming, yeah - that's unfortunate.  I'm always hoping
somebody bites the bullet and implements

  NEXT_PASS (pass_late_object_size, "objsz2")

aka specifying the dump suffix explicitely.
Comment 18 Jakub Jelinek 2021-07-12 13:32:16 UTC
(In reply to rguenther@suse.de from comment #16)
> Well, my point was to avoid pessimizing the VN done from cunrolli ;)
> Of course any duplication / threading can improve __bos precision,
> but then any transform also risks breaking it.  Your example
> is IMHO too artificial as good argument.

Is that VN so important for cunrolli itself (I mean, what harm will be there if it is only VN simplified during FRE after it)?  Does it affect anything but the number of statements that perhaps is used to determine whether to unroll completely or not?
My testcase was artificial, sure, but I was worrying about short loops (say 2 iterations) doing some strcpy/memcpy etc. where having more precise object size would improve security.

(In reply to rguenther@suse.de from comment #17)
> Yes, I think it's needed for GIMPLE FE testcase correctness.

Ok, will change it then to earlyobjsz and objsz then.
Comment 19 rguenther@suse.de 2021-07-12 14:39:00 UTC
On Mon, 12 Jul 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101419
> 
> --- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #16)
> > Well, my point was to avoid pessimizing the VN done from cunrolli ;)
> > Of course any duplication / threading can improve __bos precision,
> > but then any transform also risks breaking it.  Your example
> > is IMHO too artificial as good argument.
> 
> Is that VN so important for cunrolli itself (I mean, what harm will be there if
> it is only VN simplified during FRE after it)?  Does it affect anything but the
> number of statements that perhaps is used to determine whether to unroll
> completely or not?

Sure, if you have address comparisons inside the loop body that
resolve by unrolling (by making the index into an array part constant),
then this changes the size estimate used for further unrolling outer
loops.

> My testcase was artificial, sure, but I was worrying about short loops (say 2
> iterations) doing some strcpy/memcpy etc. where having more precise object size
> would improve security.

Well, we can always find examples where security is improved and we can
find examples where optimization is pessimized.  We can't unfortunately
both have the cake and also eat it.
Comment 20 Jakub Jelinek 2021-07-12 15:59:05 UTC
Created attachment 51144 [details]
gcc12-pr101419.patch

Updated patch.
Comment 21 CVS Commits 2021-07-13 09:05:03 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:dddb6ffdc5c25264dd75ad82dad8e48a0718d2d9

commit r12-2270-gdddb6ffdc5c25264dd75ad82dad8e48a0718d2d9
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jul 13 11:04:22 2021 +0200

    passes: Fix up subobject __bos [PR101419]
    
    The following testcase is miscompiled, because VN during cunrolli changes
    __bos argument from address of a larger field to address of a smaller field
    and so __builtin_object_size (, 1) then folds into smaller value than the
    actually available size.
    copy_reference_ops_from_ref has a hack for this, but it was using
    cfun->after_inlining as a check whether the hack can be ignored, and
    cunrolli is after_inlining.
    
    This patch uses a property to make it exact (set at the end of objsz
    pass that doesn't do insert_min_max_p) and additionally based on discussions
    in the PR moves the objsz pass earlier after IPA.
    
    2021-07-13  Jakub Jelinek  <jakub@redhat.com>
                Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/101419
            * tree-pass.h (PROP_objsz): Define.
            (make_pass_early_object_sizes): Declare.
            * passes.def (pass_all_early_optimizations): Rename pass_object_sizes
            there to pass_early_object_sizes, drop parameter.
            (pass_all_optimizations): Move pass_object_sizes right after pass_ccp,
            drop parameter, move pass_post_ipa_warn right after that.
            * tree-object-size.c (pass_object_sizes::execute): Rename to...
            (object_sizes_execute): ... this.  Add insert_min_max_p argument.
            (pass_data_object_sizes): Move after object_sizes_execute.
            (pass_object_sizes): Likewise.  In execute method call
            object_sizes_execute, drop set_pass_param method and insert_min_max_p
            non-static data member and its initializer in the ctor.
            (pass_data_early_object_sizes, pass_early_object_sizes,
            make_pass_early_object_sizes): New.
            * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use
            (cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining.
    
            * gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details
            instead of -fdump-tree-objsz1-details in dg-options and adjust names
            of dump file in scan-tree-dump.
            * gcc.dg/pr101419.c: New test.
Comment 22 Jakub Jelinek 2021-07-13 13:05:30 UTC
Fixed on the trunk.  Unsure about backports.
Comment 23 Richard Biener 2022-05-27 09:45:52 UTC
GCC 9 branch is being closed
Comment 24 Jakub Jelinek 2022-06-28 10:45:40 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 25 Richard Biener 2023-07-07 10:40:24 UTC
GCC 10 branch is being closed.