This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative


On August 8, 2018 5:51:16 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>On 08/07/2018 11:46 AM, Richard Biener wrote:
>> On August 7, 2018 6:31:36 PM GMT+02:00, Martin Sebor
><msebor@gmail.com> wrote:
>>> On 08/07/2018 09:33 AM, Bernd Edlinger wrote:
>>>> On 08/07/18 17:02, Martin Sebor wrote:
>>>>> On 08/06/2018 11:45 PM, Richard Biener wrote:
>>>>>> On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor
>>> <msebor@gmail.com> wrote:
>>>>>>> On 08/06/2018 11:40 AM, Jeff Law wrote:
>>>>>>>> On 08/06/2018 11:15 AM, Martin Sebor wrote:
>>>>>>>>>>> These examples do not aim to be valid C, they just point out
>>>>>>> limitations
>>>>>>>>>>> of the middle-end design, and a good deal of the problems
>are
>>> due
>>>>>>>>>>> to trying to do things that are not safe within the
>boundaries
>>>>>>> given
>>>>>>>>>>> by the middle-end design.
>>>>>>>>>> I really think this is important -- and as such I think we
>need
>>> to
>>>>>>> move
>>>>>>>>>> away from trying to describe scenarios in C because doing so
>>> keeps
>>>>>>>>>> bringing us back to the "C doesn't allow XYZ" kinds of
>>> arguments
>>>>>>> when
>>>>>>>>>> what we're really discussing are GIMPLE semantic issues.
>>>>>>>>>>
>>>>>>>>>> So examples should be GIMPLE.  You might start with (possibly
>>>>>>> invalid) C
>>>>>>>>>> code to generate the GIMPLE, but the actual discussion needs
>to
>>> be
>>>>>>>>>> looking at GIMPLE.  We might include the C code in case
>someone
>>>>>>> wants to
>>>>>>>>>> look at things in a debugger, but bringing the focus to
>GIMPLE
>>> is
>>>>>>> really
>>>>>>>>>> important here.
>>>>>>>>>
>>>>>>>>> I don't understand the goal of this exercise.  Unless the
>GIMPLE
>>>>>>>>> code is the result of a valid test case (in some language GCC
>>>>>>>>> supports), what does it matter what it looks like?  The basis
>of
>>>>>>>>> every single transformation done by a compiler is that the
>>> source
>>>>>>>>> code is correct.  If it isn't then all bets are off.  I'm no
>>> GIMPLE
>>>>>>>>> expert but even I can come up with any number of GIMPLE
>>> expressions
>>>>>>>>> that have undefined behavior.  What would that prove?
>>>>>>>> The GIMPLE IL is less restrictive than the original source
>>> language.
>>>>>>>> The process of translation into GIMPLE and optimization can
>>> create
>>>>>>>> situations in the GIMPLE IL that can't be validly represented
>in
>>> the
>>>>>>>> original source language.  Subobject crossing being one such
>>> case,
>>>>>>> there
>>>>>>>> are certainly others.  We have to handle these scenarios
>>> correctly.
>>>>>>>
>>>>>>> Sure, but a valid C test case still needs to exist to show that
>>>>>>> such a transformation is possible.  Until someone comes up with
>>>>>>> one it's all speculation.
>>>>>>
>>>>>> Jakub showed you one wrt CSE of addresses.
>>>>>
>>>>> Sorry, there have been so many examples I've lost track.  Can
>>>>> you please copy it here or point to it in the archive?
>>>>>
>>>>
>>>> This is based on Jakubs example here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html
>>>>
>>>> $ cat y.cc
>>>> typedef __typeof__ (sizeof 0) size_t;
>>>> void *operator new (size_t, void *p) { return p; }
>>>> void *operator new[] (size_t, void *p) { return p; }
>>>> struct S { int x; unsigned char a[1]; char b[64]; };
>>>> void baz (char *);
>>>>
>>>> size_t
>>>> foo (S *p)
>>>> {
>>>>    char *q = new ((char*)p->a) char [16];
>>>>    baz (q);
>>>>    size_t x = __builtin_strlen (q);
>>>>    if (x==0)
>>>>      __builtin_abort();
>>>>    return x;
>>>> }
>>>>
>>>> $ gcc -O3 -S y.ccup
>>>> $ cat y.s
>>>> .LFB2:
>>>> 	.cfi_startproc
>>>> 	subq	$8, %rsp
>>>> 	.cfi_def_cfa_offset 16
>>>> 	addq	$4, %rdi
>>>> 	call	_Z3bazPc
>>>> 	call	abort
>>>> 	.cfi_endproc
>>>>
>>>>
>>>>
>>>> I think this is not a correct optimization.
>>>
>>> I see.  This narrows it down to the placement new expression
>>> exposing the type of the original object rather than that of
>>> the newly constructed object.  We end up with strlen (_2)
>>> where _2 = &p_1(D)->a.
>>>
>>> The aggressive loop optimization trigger in this case because
>>> the access has been transformed to MEM[(char *)p_5(D) + 4B]
>>> early on which obviates the structure of the accessed type.
>>>
>>> This is the case that I think is worth solving -- ideally not
>>> by removing the optimization but by preserving the conversion
>>> to the type of the newly constructed object.
>>
>> Pointer types carry no information in GIMPLE.
>
>So what do you suggest as a solution?
>
>The strlen optimization can be decoupled from warnings and
>disabled, and the aggressive loop optimization can be disabled
>altogether.  But the same issue affects all string functions
>with _FORTIFY_SOURCE=2.  The modified example below aborts at
>runime (and gets diagnosed by -Wstringop-overflow).
>
>GCC certainly needs to generate valid object code for valid
>source code.  But keeping track of object type information
>is also important for correctness and security, as has been
>done by _FORTIFY_SOURCE and by many middle-end warnings.
>When accurate, it also benefits optimization.
>
>What can we do to make it more reliable?  Would annotating
>placement new solve the problem?  If not, what would?
>
>Martin
>
>#include <new>
>#include <string.h>
>
>struct S { int x; unsigned char a[1]; char b[64]; };
>
>void foo (S *p)
>{
>    char *q = new ((char*)p->a) char [16];
>
>    strcpy (q, "12345678");   // abort here

If fortify aborts here we have to fix it. 

We cannot annotate placement new since that is not necessary. All that is necessary according to the standard is reuse of storage. 

Object type information is stored at memory accesses in GIMPLE. A possibility to get more of that would be properly typed clobbers at object creation time (placement new). But certainly object reuse without those need to be treated conservatively. 

Note we arrived at the current state of things after a few painful transitions through non-working schemes, including having pointer types with 'semantics' and 'annotating' placement new.

Richard. 

>}
>
>int main ()
>{
>   foo (new S);
>}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]