This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative

On Tue, 24 Jul 2018, Bernd Edlinger wrote:

> On 07/24/18 23:46, Jeff Law wrote:
> > On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
> >> Hi!
> >>
> >> This patch makes strlen range computations more conservative.
> >>
> >> Firstly if there is a visible type cast from type A to B before passing
> >> then value to strlen, don't expect the type layout of B to restrict the
> >> possible return value range of strlen.
> > Why do you think this is the right thing to do?  ie, is there language
> > in the standards that makes you think the code as it stands today is
> > incorrect from a conformance standpoint?  Is there a significant body of
> > code that is affected in an adverse way by the current code?  If so,
> > what code?
> > 
> > 
> I think if you have an object, of an effective type A say char[100], then
> you can cast the address of A to B, say typedef char (*B)[2] for instance
> and then to const char *, say for use in strlen.  I may be wrong, but I think
> that we should at least try not to pick up char[2] from B, but instead
> use A for strlen ranges, or leave this range open.  Currently the range
> info for strlen is [0..1] in this case, even if we see the type cast
> in the generic tree.

You raise a valid point - namely that the middle-end allows
any object (including storage with a declared type) to change
its dynamic type (even of a piece of it).  So unless you can
prove that the dynamic type of the thing you are looking at
matches your idea of that type you may not derive any string
lengths (or ranges) from it.

BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?


struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
    abort ();

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do not
(I'm not sure whether Martins changes in this area would derive
such conclusions in principle).

NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constrains that memchr does not - sth I do not agree
with from a QOI perspective)


> One other example I have found in one of the test cases:
> char c;
> if (strlen(&c) != 0) abort();
> this is now completely elided, but why?  Is there a code base where
> that is used?  I doubt, but why do we care to eliminate something
> stupid like that?  If we would emit a warning for that I'm fine with it,
> But if we silently remove code like that I don't think that it
> will improve anything.  So I ask, where is the code base which
> gets an improvement from that optimization?
> > 
> >>
> >> Furthermore use the outermost enclosing array instead of the
> >> innermost one, because too aggressive optimization will likely
> >> convert harmless errors into security-relevant errors, because
> >> as the existing test cases demonstrate, this optimization is actively
> >> attacking string length checks in user code, while and not giving
> >> any warnings.
> > Same questions here.
> > 
> > I'll also note that Martin is *very* aware of the desire to avoid
> > introducing security relevent errors.  In fact his main focus is to help
> > identify coding errors that have a security impact.  So please don't
> > characterize his work as "actively attacking string length checks in
> > user code".
> > 
> I do fully respect Martin's valuable contributions over the years,
> and I did not intend to say anything about the quality of his work,
> for GCC, it is just breathtaking!
> What I meant is just, what this particular optimization can do.
> > Ultimately we want highly accurate string lengths to help improve the
> > quality of the warnings we generate for potentially dangerous code.
> > These changes seem to take us in the opposite direction.
> > 
> No, I don't think so, we have full control on the direction, when
> I do what Richi requested on his response, we will have one function
> where the string length estimation is based upon, instead of several
> open coded tree walks.
> > So ISTM that you really need a stronger justification using the
> > standards compliance and/or real world code that is made less safe by
> > keeping string lengths as accurate as possible.
> > 
> > 
> This work concentrates mostly on avoiding to interfere with code that
> actually deserves warnings, but which is not being warned about.
> >>
> >>
> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> >> Is it OK for trunk?
> > I'd like to ask we hold on this until I return from PTO (Aug 1) so that
> > we can discuss the best thing to do here for each class of change.
> > 
> Okay.
> > I think you, Martin, Richi and myself should hash through the technical
> > issues raised by the patch.  Obviously others can chime in, but I think
> > the 4 of us probably need to drive the discussion.
> > 
> Yes, sure.  I will try to help when I can.
> Currently I thought Martin is working on the string constant folding,
> (therefore I thought this range patch would not collide with his patch)
> and there are plenty of change requests, plus I think he has some more
> patches on hold.  I would like to see the review comments resolved,
> and maybe also get to see the follow up patches, maybe as a patch
> series, so we can get a clearer picture?
> Thanks
> Bernd.
> > Thanks,
> > Jeff
> > 

Richard Biener <>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]