This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] handle non-constant offsets in -Wstringop-overflow (PR 77608)


On 11/30/2017 01:30 PM, Martin Sebor wrote:
> On 11/22/2017 05:03 PM, Jeff Law wrote:
>> On 11/21/2017 12:07 PM, Martin Sebor wrote:
>>> On 11/21/2017 09:55 AM, Jeff Law wrote:
>>>> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>>>>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>>>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>>>>>> The attached patch enhances -Wstringop-overflow to detect more
>>>>>>> instances of buffer overflow at compile time by handling non-
>>>>>>> constant offsets into the destination object that are known to
>>>>>>> be in some range.  The solution could be improved by handling
>>>>>>> even more cases (e.g., anti-ranges or offsets relative to
>>>>>>> pointers beyond the beginning of an object) but it's a start.
>>>>>>>
>>>>>>> In addition to bootsrapping/regtesting GCC, also tested with
>>>>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>>>>>> regressions.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> The top of GDB fails to compile at the moment so the validation
>>>>>>> there was incomplete.
>>>>>>>
>>>>>>> gcc-77608.diff
>>>>>>>
>>>>>>>
>>>>>>> PR middle-end/77608 - missing protection on trivially detectable
>>>>>>> runtime buffer overflow
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>>     PR middle-end/77608
>>>>>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>>     PR middle-end/77608
>>>>>>>     * gcc.dg/Wstringop-overflow.c: New test.
>>>>>> The recursive call into compute_objsize passing in the ostype avoids
>>>>>> having to think about the whole object vs nearest containing object
>>>>>> issues.  Right?
>>>>>>
>>>>>> What's left to worry about is maximum or minimum remaining bytes
>>>>>> in the
>>>>>> object.  At least that's my understanding of how ostype works here.
>>>>>>
>>>>>> So we get the amount remaining, ignoring the variable offset, from
>>>>>> the
>>>>>> recursive call (SIZE).  The space left after we account for the
>>>>>> variable
>>>>>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>>>>>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>>>>>> SIZE-MAX which I think you get wrong (and you have to account for the
>>>>>> possibility that MAX or MIN is greater than SIZE and thus there's
>>>>>> nothing left).
>>>>>
>>>>> Subtracting the upper bound of the offset from the size instead
>>>>> of the lower bound when the caller is asking for the minimum
>>>>> object size would make the result essentially meaningless in
>>>>> common cases where the offset is smaller than size_t, as in:
>>>>>
>>>>>   char a[7];
>>>>>
>>>>>   void f (const char *s, unsigned i)
>>>>>   {
>>>>>     __builtin_strcpy (a + i, s);
>>>>>   }
>>>>>
>>>>> Here, i's range is [0, UINT_MAX].
>>>>>
>>>>> IMO, it's only useful to use the lower bound here, otherwise
>>>>> the result would only rarely be non-zero.
>>>> But when we're asking for the minimum left, aren't we essentially
>>>> asking
>>>> for "how much space am I absolutely sure I can write"?  And if that is
>>>> the question, then the only conservatively correct answer is to
>>>> subtract
>>>> the high bound.
>>>
>>> I suppose you could look at it that way but IME with this work
>>> (now, and also last year when I submitted a patch actually
>>> changing the built-in), using the upper bound is just not that
>>> useful because it's too often way too big.  There's no way to
>>> distinguish an out-of-range upper bound that's the result of
>>> an inadequate attempt to constrain a value from an out-of-range
>>> upper bound that is sufficiently constrained but in a way GCC
>>> doesn't see.
>> Understood.
>>
>> So while it's reasonable to not warn in those cases where we just have
>> crap range information (that's always going to be the case for some code
>> regardless of how good my work or Andrew/Aldy's work is), we have to be
>> very careful and make sure that nobody acts on this information for
>> optimization purposes because what we're returning is not conservatively
>> correct.
>>
>>
>>>
>>> There are no clients of this API that would be affected by
>>> the decision one way or the other (unless the user specifies
>>> a -Wstringop-overflow= argument greater than the default 2)
>>> so I don't think what we do now matters much, if at all.
>> Right, but what's to stop someone without knowledge of the
>> implementation and its quirk of not returning the conservatively safe
>> result from using the results in other ways.
> 
> Presumably they would find out by testing their code.  But this
> is a hypothetical scenario.  I added the function for warnings.
> I wasn't expecting it to be used for optimization, no such uses
> have emerged, and I don't have the impression that anyone is
> contemplating adding them (certainly not in stage 3).  If you
> think the function could be useful for optimization then we
> should certainly consider changing it as we gain experience
> with it under those conditions.
Merely passing tests does not mean the code is correct, we both have the
war stories and scars to prove it. :-)  Hell, I prove it to myself
nearly daily :(

Furthermore, just because nobody is using the function today in an
optimization context does not mean it will always be the case.  Worse
yet, someone could potentially use a caller of compute_objsize without
knowing about the limitations.

In fact, if I look at how we handle expand_builtin_mempcpy we have:

 /* Avoid expanding mempcpy into memcpy when the call is determined
     to overflow the buffer.  This also prevents the same overflow
     from being diagnosed again when expanding memcpy.  */
  if (!check_memop_sizes (exp, dest, src, len))
    return NULL_RTX;

While that's not strictly meant to be an optimization, it is in effect
changing the code we generate based on the return value of
check_memop_sizes, which comes from compute_objsize.  Thankfully, the
worst that happens here is we'll fail to turn the mempcpy into a memcpy
if compute_objsize/check_memop_sizes returns a value that is not
strictly correct.  But I think it highlights how easy it is to end up
having code generation changing based on the results of compute_objsize.


>> What would the impact be of simply not supporting those queries,
>> essentially returning "I don't know" rather than returning something
>> that isn't conservatively correct?
> 
> Except for false negatives with -Wstringop-overflow=3 and =4
> (i.e., Object Size type 2 and 3) I don't think there would be
> any impact.  As I said, the function isn't used for optimization
> and I don't think the option is used in these modes either, so
> in my mind it matters very little.  I don't even think there
> are any tests for it in these modes, either.
SO based on my research around the mempcpy stuff above, my thinking is
refined a bit.

Inherently the result of compute_objsize is inexact when presented with
a non-constant offset (and it may be in other contexts as well).  That
introduces a level of risk in that callers may well end up using the
result of compute_objsize in ways that are unsafe.

THe best way I know of to deal with that is to make the result a
multi-state so that we can distinguish between the exact results and
inexact results.  ie, the fact that the function returns an inexact
result gets baked into its API.  It's a lot harder to mis-use.

I could possibly live with a pair of comments though.  In
compute_objsize we'd want a comment clarifying that the result is
inexact and discourage using the result to make code generation
decisions of any kind (ie, more general than just discouraging using the
return value for optimization purposes).

In the mempcpy code we'd want a comment indicating why it's safe to use
the result the way we do.  Essentially if compute_objsize returns true,
then we transform the result into a memcpy + return value adjustment,
otherwise we just call mempcpy.  In both cases we copy the same data and
the final return value is the same.

> 
> For now, I've added comments to clarify the return value and
> the purpose of the function.  Let me know if that's good enough
> or if you want me to make any other changes.
SO would you rather go with baking the inexact nature of the return
value into the API or the pair of comments noted above?

jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]