This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Make strlen range computations more conservative


On 07/31/2018 09:48 AM, Jakub Jelinek wrote:
On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
it calls abort.  This is the default on popular distributions,

Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard
requires and imposes extra requirements.  So we shouldn't determine what is
or isn't considered valid from what this mode accepts or rejects.

I'm not sure what the additional requirements are but the ones
I am referring to are the enforcing of struct member boundaries.
This is in line with the standard requirements of not accessing
[sub]objects via pointers derived from other [sub]objects.

In the middle-end the distinction between what was originally a reference
to subobjects and what was a reference to objects is quickly lost
(whether through SCCVN or other optimizations).
We've run into this many times with the __builtin_object_size already.
So, if e.g.
struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
...
strlen ((char *) &s) is well defined but
strlen (s.a) is not in C; in the middle-end you might not be able to figure
out which one is which.
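
A minimal sketch of the well-defined half of that example, for readers who want to see the layout concretely. The helper name whole_object_length is hypothetical and not from the thread; the sketch assumes the usual layout in which b follows a with no padding. Because "abc" fills a[3] exactly, a has no terminating NUL, and the string that starts at the address of the whole object runs on into b.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the struct quoted above: "abc" fills a[3] completely,
   so a carries no terminating NUL; b holds "defg" plus its NUL. */
struct S { char a[3]; char b[5]; };

/* Hypothetical helper: length of the string starting at the address of
   the whole object.  The pointer is derived from the object itself, not
   from the member a, which is the case described above as well defined. */
size_t whole_object_length(const struct S *p)
{
    assert(p != NULL);
    return strlen((const char *)p);
}
```

Called on an object initialized as { "abc", "defg" }, this scans through both members and stops at the NUL stored in b. The sketch deliberately leaves out strlen (s.a), since that is the call whose validity is in dispute.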

Yes, I'm aware of the middle-end transformation to MEM_REF
-- it's one of the reasons why detecting invalid accesses
by the middle-end warnings, including -Warray-bounds,
-Wformat-overflow, -Wstringop-overflow, and even -Wrestrict,
is less than perfect.

But is strlen(s.a) also meant to be well-defined in the middle
end (with the semantics of computing the length of "abcdefg")?
And if so, what makes it well defined?

Certainly not every "strlen" has these semantics.  For example,
this open-coded one doesn't:

  int len = 0;
  for (int i = 0; s.a[i]; ++i)
    ++len;

It computes 2 (with no warning for the out-of-bounds access).
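
For contrast, here is a sketch of an open-coded length that never indexes past the member, so its result does not depend on how the out-of-bounds read is treated. This is not the loop above: the helper name bounded_len_a and the explicit bound are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct S { char a[3]; char b[5]; };

/* Hypothetical helper: count the characters in p->a, stopping at the
   first NUL or at the end of the array, whichever comes first.  The
   index is tested against sizeof p->a before each read, so no element
   outside the member is ever accessed. */
size_t bounded_len_a(const struct S *p)
{
    assert(p != NULL);
    size_t len = 0;
    while (len < sizeof p->a && p->a[len] != '\0')
        ++len;
    return len;
}
```

With the initializer { "abc", "defg" } the member holds three non-NUL characters, so the loop stops at the array bound rather than at a terminator.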

So if the standard doesn't guarantee it and different kinds
of accesses behave differently, how do we explain what "works"
and what doesn't without relying on GCC implementation details?

If we can't then the only language we have in common with users
is the standard.  (This, by the way, is what the C memory model
group is trying to address -- the language or feature that's
missing from the standard that says when, if ever, these things
might be valid.)

Martin

