This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative


On 08/05/2018 12:08 AM, Bernd Edlinger wrote:
> On 08/04/18 23:56, Martin Sebor wrote:
>> On 08/03/2018 01:00 AM, Jeff Law wrote:
>>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>>>> On 07/24/18 23:46, Jeff Law wrote:
>>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>>>> Hi!
>>>>>>
>>>>>> This patch makes strlen range computations more conservative.
>>>>>>
>>>>>> Firstly if there is a visible type cast from type A to B before passing
>>>>>> then value to strlen, don't expect the type layout of B to restrict the
>>>>>> possible return value range of strlen.
>>>>> Why do you think this is the right thing to do?  ie, is there language
>>>>> in the standards that makes you think the code as it stands today is
>>>>> incorrect from a conformance standpoint?  Is there a significant body of
>>>>> code that is affected in an adverse way by the current code?  If so,
>>>>> what code?
>>>>>
>>>>>
>>>>
>>>> I think if you have an object, of an effective type A say char[100], then
>>>> you can cast the address of A to B, say typedef char (*B)[2] for instance
>>>> and then to const char *, say for use in strlen.  I may be wrong, but I think
>>>> that we should at least try not to pick up char[2] from B, but instead
>>>> use A for strlen ranges, or leave this range open.  Currently the range
>>>> info for strlen is [0..1] in this case, even if we see the type cast
>>>> in the generic tree.
>>> ISTM that you're essentially saying that the cast to const char *
>>> destroys any type information we can exploit here.  But if that's the
>>> case, then I don't think we can even derive a range of [0,99].  What's
>>> to say that "A" didn't result from a similar cast of some object that
>>> was char[200] that happened out of the scope of what we could see during
>>> the strlen range computation?
>>>
>>> If that is what you're arguing, then I think there's a re-evaluation
>>> that needs to happen WRT strlen range computation/
>>>
>>> And just to be clear, I do see this as a significant correctness question.
>>>
>>> Martin, thoughts?
>>
>> The argument is that given:
>>
>>    struct S { char a[4], b; };
>>
>>    char a[8] = "1234567";
>>
>> this is valid and should pass:
>>
>>    __attribute__ ((noipa))
>>    void f (struct S *p)
>>    {
>>      assert (7 == strlen (p->a));
>>    }
>>
>>    int main (void)
>>    {
>>      f ((struct S*)a);
>>    }
>>
>> (This is the basic premise behind pr86259.)
>>
>> This argument is wrong and the code is invalid.  For the access
>> to p->a to be valid p must point to an object of struct S (it
>> doesn't) and the p->a array must hold a nul-terminated string
>> (it also doesn't).
>>
>> This should not be surprising because the following equivalent
>> code behaves the same way:
>>
>>    __attribute__ ((noipa))
>>    void f (struct S *p)
>>    {
>>      int n = 0;
>>      for (int i = 0; p->a[i]; ++i)
>>        ++n;
>>      if (3 != n)
>>        __builtin_abort ();
>>    }
>>
>> and also because for write accesses, GCC (helpfully) enforces
>> the restriction with _FORTIFY_SOURCE=2:
>>
>>    __attribute__ ((noipa))
>>    void f (struct S *p)
>>    {
>>      strcpy (p->a, "1234");   // aborts
>>    }
>>
>> I care less about the optimization than I do about the basic
>> premise that it's essential to respect subobject boundaries(*).
>> It would make little sense to undo the strlen optimization
>> without also undoing the optimization for the direct array
>> access case.  Undoing either would raise the question about
>> the validity of the _FORRTIFY_SOURCE=2 behavior.  That would
>> be a huge step backwards in terms of code security.  If we
>> did some of these but not others, it would make the behavior
>> inconsistent and surprising, all to accommodate one instance
>> of invalid code.
>>
>> If we had a valid test case where the strlen optimization
>> leads to invalid code, or even if there were a significant
>> number of bug reports showing that it breaks an invalid
>> but common idiom, I would certainly feel compelled to
>> make it right somehow.  But there has been just one bug
>> report with clearly invalid code that should be easily
>> corrected.
>>
> 
> 
> I see this from a software engineering POV.
> 
> If we have code like this:
> 
> void test (const char *x)
> {
>    assert (strlen (x) < 10);
> }
> 
> One would usually expect the program to abort (or at least abort with
> a near 100% likelihood) if x is not a valid string with length less
> than 10.
I would not expect the program to abort if this function were called
with an invalid (ie unterminated) string.  In that scenario I know
enough to not expect anything because my program is broken, plain and
simple.

> 
> But if lto and other optimizations show that this code is invoked with
> an invalid, non-zero terminated string the assertion is suddenly gone.
And the program is invalid as it exhibits undefined behavior.  One
undefined behavior is exhibited I have no expectations.  I *like* when
we do something like trap as soon as undefined behavior is discovered,
but I do not expect it.

> 
> Martin, why do you insist that GCC has to do this, and that it must
> be a good idea to do so, just based on the language definition?
We do this all the time in all kinds of situations.

The language definition provides the contract that the programmer and
compiler must adhere to.  When the contract is broken we can not be
responsible for the resulting code as the input source is simply broken.

> 
> Why do we need assertions at all, when all programs have to be completely
> correct before we may compile them?
That's what compilers do.  When they see code that is pointless it gets
removed.  As engineers we sometimes say "while this is undefined
behavior we're not going to exploit the undefined-ness".  That is based
on our experience as developers.  And sometimes we may not even agree
where the line between optimize this vs do not because it's going to
surprise a developer.

Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]