[PATCH] Make strlen range computations more conservative

Thu Jul 26 08:55:00 GMT 2018

On Wed, 25 Jul 2018, Martin Sebor wrote:

> > BUT - for the string_constant and c_strlen functions we are,
> > in all cases where we return something interesting, able to look
> > at an initializer which then determines that type.  Hopefully.
> > I think the strlen() folding code, when it sets SSA ranges,
> > now looks at types ...?
> > 
> > Consider
> > 
> > struct X { int i; char c[4]; int j;};
> > struct Y { char c[16]; };
> > 
> > void foo (struct X *p, struct Y *q)
> > {
> >   memcpy (p, q, sizeof (struct Y));
> >   if (strlen ((char *)(struct Y *)p + 4) < 7)
> >     abort ();
> > }
> > 
> > here the GIMPLE IL looks like
> > 
> >   const char * _1;
> > 
> >   <bb 2> [local count: 1073741825]:
> >   _5 = MEM[(char * {ref-all})q_4(D)];
> >   MEM[(char * {ref-all})p_6(D)] = _5;
> >   _1 = p_6(D) + 4;
> >   _2 = __builtin_strlen (_1);
> > 
> > and I guess Martin would argue that since p is of type struct X *,
> > p + 4 gets you to c[4] and thus the strlen of that cannot be larger
> > than 3.  But of course the middle-end doesn't work like that,
> > and luckily we do not try to draw such conclusions, or we
> > are somehow lucky that for the testcase as written above we do not
> > (I'm not sure whether Martin's changes in this area would derive
> > such conclusions in principle).
> 
> Only if the strlen argument were p->c.
> 
> > NOTE - we do not know the dynamic type here since we do not know
> > the dynamic type of the memory pointed-to by q!  We can only
> > derive that at q+4 there must be some object that we can
> > validly call strlen on (where Martin again thinks strlen
> > imposes constraints that memchr does not - something I do not agree
> > with from a QOI perspective).
> 
> The dynamic type is a murky area.

It's well-specified in the middle-end.  A store changes the
dynamic type of the stored-to object.  If that type is
compatible with the surrounding object's dynamic type, that one
is not affected; if not, the surrounding object's dynamic
type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.
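As a rough C-level analogue of that rule (an assumption: this leans on ISO C's effective-type rules for allocated storage, C11 6.5p6, not on the middle-end implementation), the sketch below reuses one malloc'd block for two types.  The helper name reuse_storage is hypothetical.  It uses allocated memory because ISO C pins a declared object's effective type to its declaration; for allocated storage each store sets the type, so the later float store makes the float read valid.

```c
#include <stdlib.h>

/* Hypothetical helper; a rough ISO C analogue of "a store changes the
   dynamic type", not GCC internals.  */
float reuse_storage (void)
{
  size_t size = sizeof (int) > sizeof (float) ? sizeof (int) : sizeof (float);
  void *mem = malloc (size);
  if (mem == NULL)
    return 0.0f;

  *(int *) mem = 42;          /* the storage now holds an int */
  *(float *) mem = 1.5f;      /* this store makes it hold a float */
  float f = *(float *) mem;   /* reading it back as float is valid */

  free (mem);
  return f;
}
```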

> As you said, above we don't
> know whether *p is an allocated object or not.  Strictly speaking,
> we would need to treat it as such.  It would basically mean
> throwing out all type information and treating objects simply
> as blobs of bytes.  But that's not what GCC or other compilers do
> either.

It is what GCC does unless it sees a store to the memory.  Basically,
pointers carry no type information; only (visible!) stores
(and loads to some extent) provide information about the dynamic types
of objects (allocated or declared - GCC doesn't make a difference there).
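To make this concrete for the strlen test case quoted earlier, here is a sketch that copies the struct X and struct Y definitions from it.  The names foo_len and strlen_at_p_plus_4 are hypothetical, and foo_len returns the length instead of calling abort.  The memory behind p is allocated and its only visible store is the memcpy from a struct Y, so the pointer type of p says nothing about the string at p + 4: with the input below the result is 7, and a bound of 3 inferred from the member c[4] of struct X would be wrong.

```c
#include <stdlib.h>
#include <string.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* Same body as foo in the quoted test case, but returns the length.  */
size_t foo_len (struct X *p, const struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  return strlen ((char *)(struct Y *)p + 4);
}

/* Hypothetical driver: p points to allocated storage that only ever
   receives the bytes of a struct Y.  */
size_t strlen_at_p_plus_4 (void)
{
  struct Y y = { "abcdefghijk" };            /* 11 chars + NUL fit in c[16] */
  struct X *p = malloc (sizeof (struct Y));  /* large enough for the copy */
  if (p == NULL)
    return 0;
  size_t n = foo_len (p, &y);                /* strlen of "efghijk" */
  free (p);
  return n;
}
```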

> For instance, in the modified foo below, GCC eliminates
> the test because it assumes that *p and *q don't overlap.  It
> does that because they are members of structs of unrelated types
> access to which cannot alias.  I.e., not just the type of
> the access matters (here int and char) but so does the type of
> the enclosing object.  If it were otherwise and only the type
> of the access mattered then eliminating the test below wouldn't
> be valid (objects can have their stored value accessed by either
> an lvalue of a compatible type or char).
> 
>   void foo (struct X *p, struct Y *q)
>   {
>     int j = p->j;
>     q->c[__builtin_offsetof (struct X, j)] = 0;
>     if (j != p->j)
>       __builtin_abort ();
>   }

Here GCC sees both a load and a store from which it derives the
information.  And yes, it looks at the full access
structure, which contains a dereference of p and of q.
Because of that, the store to q->c[] (which for GCC implies
a store to *q!) changes the dynamic type.
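For contrast, a sketch in which the store goes through a plain char * instead of through q->c[] of a struct Y.  The names store_changes_j and alias_demo are hypothetical, and struct X is copied from the quoted example.  A character-type store may alias any object, so the comparison has to survive optimization; when q actually points at *p, clearing one byte of p->j changes its value on either byte order and the function returns 1.

```c
#include <stddef.h>

struct X { int i; char c[4]; int j; };

/* Hypothetical variant of the modified foo: the store is a plain char
   access, so it may alias p->j and p->j must be reloaded.  */
int store_changes_j (struct X *p, char *q)
{
  int j = p->j;
  q[offsetof (struct X, j)] = 0;
  return j != p->j;
}

/* Hypothetical driver: q aliases p, so the store clears one byte of j.  */
int alias_demo (void)
{
  struct X x = { 0, "abc", 0x01010101 };
  return store_changes_j (&x, (char *) &x);
}
```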

> Clarifying (and adjusting if necessary) this area is among
> the goals of the C object model proposal and the ongoing study
> group.  We have been talking about some of these cases there
> and trying to come up with ways to let code do what it needs
> to do without compromising existing language rules, which was
> the consensus position within WG14 when the study group was
> formed: i.e., to clarify or reaffirm existing rules and, in
> cases of ambiguity (or where the standard is unintentionally
> overly permissive), favor tighter rules over looser ones.

There is also the C++ object model and the Ada object model and ...

GCC already has an object model in its middle-end and that is
not going to change.  And obviously it was modeled after the
requirements from the languages the middle-end supports.  The
latest change was made necessary by C++ (placement new and
storage re-use specifically).

Richard.