This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative


On 08/20/18 12:12, Richard Biener wrote:
> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/10/2018 10:56 AM, Martin Sebor wrote:
>>> On 08/08/2018 11:36 PM, Jeff Law wrote:
>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>>>>
>>>>> The warning bits are definitely not okay by me.  The purpose
>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is
>>>>> to detect buffer overflows.  When a warning doesn't have access
>>>>> to string length information for dynamically created strings
>>>>> (like the strlen pass does) it uses array sizes as a proxy.
>>>>> This is useful both to detect possible buffer overflows and
>>>>> to prevent false positives for overflows that cannot happen
>>>>> in correctly written programs.
>>>> So how much of this falling-back to array sizes as a proxy would become
>>>> unnecessary if sprintf had access to the strlen pass as an analysis
>>>> module?
>>>>
>>>> As you know we've been kicking that around and from my investigations
>>>> that doesn't really look hard to do.  Encapsulate the data structures in
>>>> a class, break up the statement handling into analysis and optimization
>>>> and we should be good to go.  I'm hoping to start prototyping this week.
>>>>
>>>> If we think that has a reasonable chance to eliminate the array-size
>>>> fallback, then that seems like the most promising path forward.
>>>
>>> We discussed this idea this morning so let me respond here and
>>> reiterate the answer.  Using the strlen data will help detect
>>> buffer overflow where the array size isn't available but it
>>> cannot replace the array size heuristic. Here's a simple
>>> example:
>>>
>>>    struct S { char a[8]; };
>>>
>>>    char d[8];
>>>    void f (struct S *s, int i)
>>>    {
>>>      sprintf (d, "%s-%i",  s[i].a, i);
>>>    }
>>>
>>> We don't know the length of s->a but we do know that it can
>>> be up to 7 bytes long (assuming it's nul-terminated of course)
>>> so we know the sprintf call can overflow.  Conversely, if
>>> the size of the destination is increased to 20  the sprintf
>>> call cannot overflow so the diagnostic can be avoided.
>>>
>>> Removing the array size heuristic would force us to either give
>>> up on diagnosing the first case or issue false positives for
>>> the second case.  I think the second alternative would make
>>> the warning too noisy to be useful.
>>>
>>> The strlen pass will help detect buffer overflows in cases
>>> where the array size isn't known (e.g., with dynamically
>>> allocated buffers) but where the length of the string store
>>> in the array is known.  It will also help avoid false positives
>>> in cases where the string stored in an array of known size is
>>> known to be too short to cause an overflow.  For instance here:
>>>
>>>    struct S { char a[8]; };
>>>
>>>    char d[8];
>>>    void f (struct S *s, int i)
>>>    {
>>>      if (strlen (s->a) < 4 && i >= 0 && i < 100)
>>>        sprintf (d, "%s-%i",  s->a, i);
>>>    }
>> ACK.  Thanks for explaining things here too.  I can't speak for others,
>> but seeing examples along with the explanation is easier for me to absorb.
>>
>> For Bernd and others -- after kicking things around a bit with Martin,
>> what we're recommending is that compute_string_length we compute the
>> bounds using both GIMPLE and C semantics and return both.
> 
> But you can't do this because GIMPLE did transforms that are not valid in
> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret
> it as GIMPLE.  What you'd do is return GIMPLE semantics length
> and "foobar" semantics length which doesn't match the original source.
> 

If I understood that suggestion right, it means, we
live with some false positive or missing warnings due to those transformations.
That means, get_range_strlen with the 2-parameter overload is used
for warnings only.  And it returns most of the time a correct range info,
that is good enough for warnings.

The 4-parameter overload when called with strict=true, returns only range
info that are based on hard facts, this range info does not use those unsafe
type infos, but it can be safely used as input to the VRP machinery.


Bernd.

>>
>> Anything which influences code generation or optimization must use the
>> GIMPLE semantics.  Warnings may use the C semantics in an effort to
>> improve preciseness.
>>
>> Martin has some other stuff to flush out of his queue, then he'll be
>> focused on the changes to compute_string_length.
>> Jeff

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]