This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative


On 08/07/2018 11:46 AM, Richard Biener wrote:
On August 7, 2018 6:31:36 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
On 08/07/2018 09:33 AM, Bernd Edlinger wrote:
On 08/07/18 17:02, Martin Sebor wrote:
On 08/06/2018 11:45 PM, Richard Biener wrote:
On August 7, 2018 5:38:59 AM GMT+02:00, Martin Sebor
<msebor@gmail.com> wrote:
On 08/06/2018 11:40 AM, Jeff Law wrote:
On 08/06/2018 11:15 AM, Martin Sebor wrote:
These examples do not aim to be valid C, they just point out
limitations
of the middle-end design, and a good deal of the problems are
due
to trying to do things that are not safe within the boundaries
given
by the middle-end design.
I really think this is important -- and as such I think we need
to
move
away from trying to describe scenarios in C because doing so
keeps
bringing us back to the "C doesn't allow XYZ" kinds of
arguments
when
what we're really discussing are GIMPLE semantic issues.

So examples should be GIMPLE.  You might start with (possibly
invalid) C
code to generate the GIMPLE, but the actual discussion needs to
be
looking at GIMPLE.  We might include the C code in case someone
wants to
look at things in a debugger, but bringing the focus to GIMPLE
is
really
important here.

I don't understand the goal of this exercise.  Unless the GIMPLE
code is the result of a valid test case (in some language GCC
supports), what does it matter what it looks like?  The basis of
every single transformation done by a compiler is that the
source
code is correct.  If it isn't then all bets are off.  I'm no
GIMPLE
expert but even I can come up with any number of GIMPLE
expressions
that have undefined behavior.  What would that prove?
The GIMPLE IL is less restrictive than the original source
language.
The process of translation into GIMPLE and optimization can
create
situations in the GIMPLE IL that can't be validly represented in
the
original source language.  Subobject crossing being one such
case,
there
are certainly others.  We have to handle these scenarios
correctly.

Sure, but a valid C test case still needs to exist to show that
such a transformation is possible.  Until someone comes up with
one it's all speculation.

Jakub showed you one wrt CSE of addresses.

Sorry, there have been so many examples I've lost track.  Can
you please copy it here or point to it in the archive?


This is based on Jakubs example here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00260.html

$ cat y.cc
typedef __typeof__ (sizeof 0) size_t;
void *operator new (size_t, void *p) { return p; }
void *operator new[] (size_t, void *p) { return p; }
struct S { int x; unsigned char a[1]; char b[64]; };
void baz (char *);

size_t
foo (S *p)
{
   char *q = new ((char*)p->a) char [16];
   baz (q);
   size_t x = __builtin_strlen (q);
   if (x==0)
     __builtin_abort();
   return x;
}

$ gcc -O3 -S y.ccup
$ cat y.s
.LFB2:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	addq	$4, %rdi
	call	_Z3bazPc
	call	abort
	.cfi_endproc



I think this is not a correct optimization.

I see.  This narrows it down to the placement new expression
exposing the type of the original object rather than that of
the newly constructed object.  We end up with strlen (_2)
where _2 = &p_1(D)->a.

The aggressive loop optimization trigger in this case because
the access has been transformed to MEM[(char *)p_5(D) + 4B]
early on which obviates the structure of the accessed type.

This is the case that I think is worth solving -- ideally not
by removing the optimization but by preserving the conversion
to the type of the newly constructed object.

Pointer types carry no information in GIMPLE.

So what do you suggest as a solution?

The strlen optimization can be decoupled from warnings and
disabled, and the aggressive loop optimization can be disabled
altogether.  But the same issue affects all string functions
with _FORTIFY_SOURCE=2.  The modified example below aborts at
runime (and gets diagnosed by -Wstringop-overflow).

GCC certainly needs to generate valid object code for valid
source code.  But keeping track of object type information
is also important for correctness and security, as has been
done by _FORTIFY_SOURCE and by many middle-end warnings.
When accurate, it also benefits optimization.

What can we do to make it more reliable?  Would annotating
placement new solve the problem?  If not, what would?

Martin

#include <new>
#include <string.h>

struct S { int x; unsigned char a[1]; char b[64]; };

void foo (S *p)
{
   char *q = new ((char*)p->a) char [16];

   strcpy (q, "12345678");   // abort here
}

int main ()
{
  foo (new S);
}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]