This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] Make strlen range computations more conservative


On 08/04/18 23:56, Martin Sebor wrote:
> On 08/03/2018 01:00 AM, Jeff Law wrote:
>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>>> On 07/24/18 23:46, Jeff Law wrote:
>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>>> Hi!
>>>>>
>>>>> This patch makes strlen range computations more conservative.
>>>>>
>>>>> First, if there is a visible type cast from type A to B before passing
>>>>> the value to strlen, don't expect the type layout of B to restrict the
>>>>> possible return value range of strlen.
>>>> Why do you think this is the right thing to do?  ie, is there language
>>>> in the standards that makes you think the code as it stands today is
>>>> incorrect from a conformance standpoint?  Is there a significant body of
>>>> code that is affected in an adverse way by the current code?  If so,
>>>> what code?
>>>>
>>>>
>>>
>>> I think if you have an object of effective type A, say char[100], then
>>> you can cast the address of A to B, say typedef char (*B)[2] for
>>> instance, and then to const char *, say for use in strlen.  I may be
>>> wrong, but I think we should at least try not to pick up char[2] from
>>> B, and instead use A for the strlen range, or leave the range open.
>>> Currently the range info for strlen is [0..1] in this case, even
>>> though we see the type cast in the GENERIC tree.
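>>>
>>> Concretely, something like this (just a sketch, the names are made up):
>>>
>>>    char a[100] = "hello";        /* effective type A: char[100] */
>>>    typedef char (*B)[2];
>>>    B b = (B) &a;                 /* visible cast from A to B */
>>>    size_t n = strlen ((const char *) b);
>>>
>>> Here I would want the range of n to come from A (or be left open),
>>> not [0..1] from char[2].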
>> ISTM that you're essentially saying that the cast to const char *
>> destroys any type information we can exploit here.  But if that's the
>> case, then I don't think we can even derive a range of [0,99].  What's
>> to say that "A" didn't result from a similar cast of some object that
>> was char[200], where the cast happened outside the scope of what we
>> could see during the strlen range computation?
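>>
>> For instance (a made-up sketch; use_str and the casts are purely
>> illustrative):
>>
>>    extern void use_str (const char *);
>>
>>    char big[200] = "1234567";
>>    char (*a100)[100] = (char (*)[100]) big;  /* a cast we never see */
>>    use_str ((const char *) a100);            /* the strlen is in here */
>>
>> If a visible cast is enough to invalidate the char[2] bound, an
>> unseen cast like this one would equally invalidate the char[100]
>> bound.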
>>
>> If that is what you're arguing, then I think there's a re-evaluation
>> that needs to happen WRT strlen range computation.
>>
>> And just to be clear, I do see this as a significant correctness question.
>>
>> Martin, thoughts?
> 
> The argument is that given:
> 
>    struct S { char a[4], b; };
> 
>    char a[8] = "1234567";
> 
> this is valid and should pass:
> 
>    __attribute__ ((noipa))
>    void f (struct S *p)
>    {
>      assert (7 == strlen (p->a));
>    }
> 
>    int main (void)
>    {
>      f ((struct S*)a);
>    }
> 
> (This is the basic premise behind pr86259.)
> 
> This argument is wrong and the code is invalid.  For the access
> to p->a to be valid, p must point to an object of type struct S (it
> doesn't), and the p->a array must hold a nul-terminated string
> (it also doesn't).
> 
> This should not be surprising because the following equivalent
> code behaves the same way:
> 
>    __attribute__ ((noipa))
>    void f (struct S *p)
>    {
>      int n = 0;
>      for (int i = 0; p->a[i]; ++i)
>        ++n;
>      if (3 != n)
>        __builtin_abort ();
>    }
> 
> and also because for write accesses, GCC (helpfully) enforces
> the restriction with _FORTIFY_SOURCE=2:
> 
>    __attribute__ ((noipa))
>    void f (struct S *p)
>    {
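>      // "1234" plus its terminating nul is 5 bytes, one more than
>      // the 4-byte subobject p->a can hold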
>      strcpy (p->a, "1234");   // aborts
>    }
> 
> I care less about the optimization than I do about the basic
> premise that it's essential to respect subobject boundaries(*).
> It would make little sense to undo the strlen optimization
> without also undoing the optimization for the direct array
> access case.  Undoing either would raise the question about
> the validity of the _FORTIFY_SOURCE=2 behavior.  That would
> be a huge step backwards in terms of code security.  If we
> did some of these but not others, it would make the behavior
> inconsistent and surprising, all to accommodate one instance
> of invalid code.
> 
> If we had a valid test case where the strlen optimization
> leads to invalid code, or even if there were a significant
> number of bug reports showing that it breaks an invalid
> but common idiom, I would certainly feel compelled to
> make it right somehow.  But there has been just one bug
> report with clearly invalid code that should be easily
> corrected.
> 


I see this from a software engineering POV.

If we have code like this:

#include <assert.h>
#include <string.h>

void test (const char *x)
{
   assert (strlen (x) < 10);
}

One would usually expect the program to abort (or at least to abort
with near-100% likelihood) if x is not a valid string of length less
than 10.

But if LTO and other optimizations show that this code is invoked with
an invalid, non-zero-terminated string, the assertion is suddenly gone.
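
For example, an illustrative sketch that combines my test () above with
Martin's struct (whether the assert really gets folded away depends on
inlining/LTO; that part is my assumption):

#include <assert.h>
#include <string.h>

struct S { char a[4], b; };

char buf[8] = "1234567";

void check (const char *x)
{
   /* if strlen's result range is taken from char[4], this folds to true */
   assert (strlen (x) < 4);
}

int main (void)
{
   check (((struct S *) buf)->a);   /* run-time length is actually 7 */
   return 0;
}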

Martin, why do you insist that GCC has to do this, and that it must
be a good idea to do so, just based on the language definition?

Why do we need assertions at all, when all programs have to be completely
correct before we may compile them?


> Martin
> 
> [*] I also care deeply about all the warnings that depend
> on it to avoid false positives; that's pretty much all of those
> I have implemented in the middle-end: -Wformat-{overflow,
> truncation}, -Wstringop-{overflow,truncation}, and likely
> even -Wrestrict.

I do as well, but a false positive will not cause real damage.
As I said before, with optimistic range info for the warnings we can
reach both goals at the same time.


Bernd.

