[PATCH] Convert strlen pass from evrp to ranger.

Aldy Hernandez aldyh@redhat.com
Sat Oct 9 15:04:14 GMT 2021


On Fri, Oct 8, 2021 at 6:52 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 10/8/21 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
> > The following patch converts the strlen pass from evrp to ranger,
> > leaving DOM as the last remaining user.
>
> Thanks for doing this.  I know I said I'd work on it but I'm still
> bogged down in my stage 1 work that's not going so great :(  I just
> have a few minor comments/questions on the strlen change (inline)
> but I am a bit concerned about the test failure.
>
> >
> > No additional cleanups have been done.  For example, the strlen pass
> > still has uses of VR_ANTI_RANGE, and the sprintf still passes around
> > pairs of integers instead of using a proper range.  Fixing this
> > could further improve these passes.
> >
> > As a further enhancement, if the relevant maintainers deem useful,
> > the domwalk could be removed from strlen.  That is, unless the pass
> > needs it for something else.
> >
> > With ranger we are now able to remove the range calculation from
> > before_dom_children entirely.  Just working with the ranger on-demand
> > catches all the strlen and sprintf testcases with the exception of
> > builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
> > code.  I have XFAILed the test and documented what the problem is.
>
> builtin-sprintf-warn-22.c is a regression test for a false positive
> in Glibc.  If it fails we'll have to deal with the Glibc failure
> again, which I would rather avoid.  Have you checked to see if
> Glibc is affected by the change?
>
> >
> > It looks like the same problem in the sprintf test triggers a false
> > positive in gimple-ssa-warn-access.cc so I have added
> > -Wno-format-overflow until it can be fixed.
> >
> > I can expand on the false positive if necessary, but the gist is that
> > this:
> >
> >      _17 = strlen (_132);
> >      _18 = strlen (_136);
> >      _19 = _18 + _17;
> >      if (_19 > 75)
> >        goto <bb 59>; [0.00%]
> >      else
> >        goto <bb 61>; [100.00%]
> >
> > ...dominates the sprintf in BB61.  This means that ranger can figure
> > out that the _17 and _18 are [0, 75].  On the other hand, evrp
> > returned a range of [0, 9223372036854775805] which presumably the
> > sprintf code was ignoring as a false positive here:
>
> This is a feature designed to avoid false positives when the sprintf
> pass doesn't know anything about the strings (i.e., their lengths
> are unconstrained by either the sizes of the arrays they're stored
> in or any expressions like asserts involving their lengths).
>
> It sounds like the strlen/ranger improvement partially propagates
> constraints from subsequent expressions into the strlen results
> but it doesn't go far enough for them to actually fully satisfy
> the constraint, which is what in turn triggers the warning.
>
> I.e., in the test:
>
> void g (char *s1, char *s2)
> {
>    char b[1025];
>    size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
>    if (n + d + 1 >= 1025)
>      return;
>
>    sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
>
> the range of n and d is [0, INF] and so the sprintf call doesn't
> trigger a warning.  With your change, because their range is
> [0, 1023] each (and there's no way to express that their sum
> is less than 1025), the warning triggers because it considers
> the worst case scenario (the upper bounds of both).

I agree with Andrew.  If this doesn't work with more than 1 string we
shouldn't even attempt it, as it's bound to be riddled with false
positives, which you'll get more of with enhanced ranges at this
point.

> > @@ -269,7 +270,7 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off,
> >       return -1;
> >
> >     value_range vr;
> > -  if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt))
> > +  if (!rvals->range_of_expr (vr, si->nonzero_chars, stmt))
>
> We need stmt rather than si->stmt because the latter may be different
> or null when the former will always refer to the current statement,
> correct?

Yes.

> I think there still are quite a few calls to get_addr_stridx() and
> get_addr() that don't pass in rvals.  I started doing this back in
> the GCC 11 cycle but didn't finish the job.  Eventually, rvals
> should be passed to all their callers (or they made class members
> with a ranger member).  I mention this in case it has an effect on
> the range propagation (since the pass also sets ranges).

Definitely.  You'll get even better ranges, though perhaps more false
positives as discussed above :-/.

Aldy



More information about the Gcc-patches mailing list