This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix C++ strict-aliasing issues with memcpy folding


On Sun, 24 Jan 2010, Mark Mitchell wrote:

> Richard Guenther wrote:
> 
> > The evidence is 3.8/4 - 'A program may end the lifetime of any object
> > by reusing the storage which the object occupies ...'.  
> 
> I'm very suspicious of using the usual language-lawyering techniques in
> this context.  I'm not at all confident that the standard was drafted to
> take these issues into account.  I'm not even confident that the
> standard is self-consistent if you read it in this way.  I think these
> issues are qualitatively different from the "what overloaded function
> does this expression call?" kinds of cases where you can algorithmically
> follow the standard, even if the result may not strike most people as
> intuitive.
> 
> In that case, I don't think that "following the standard" is a useful
> thing to try to do.  "Fix the standard" might be useful, of course.
> But, as an implementor, I think we should do something sensible.  I'm
> very bothered by the idea that:
> 
>   int i;
>   int *p = &i;
>   // something
>   *p = 3;
> 
> might be invalid.  That seems very wrong.

Ok.

> > I can't seem to find any hint in the standard that static character
> > types are different in any way regarding dynamic types or
> > object lifetime.
> 
> I'm not saying that such a hint exists.  But, char is a special type in
> C/C++ for these kinds of purposes, since a char * can be used to access
> any type.  char * is special in that it's the type you can use to write
> a portable version of memcpy.  It's the closest thing we have to a
> generic "byte" type.  So, "static char heap[1024 * 1024]" followed by
> using placement new to create objects in that heap makes sense to me.
> But, if the type of heap is "array of struct X" or "array of double" I'm
> much less comfortable with this.

Ok.  Note that C does not allow arbitrary lvalues accessing an array
of chars.  memcpy does the opposite, which is explicitly allowed,
access any kind of object via lvalues of character type.

> > class Foo {
> >   _T1 m1;
> >   char storage[32];
> > } x, y;
> > 
> > _T2 *p = new (&x.storage) _T2;
> > *p = {};
> > 
> > thus, _partially_ re-use x.  I can't see how you can end the
> > lifetime of x.storage by re-using it without affecting the
> > lifetime of x itself.
> 
> > The middle-end sees the dynamic type of x.storage as being _T2.
> > The access via the lvalye x has type Foo.  Now what's the
> > dynamic type of x? 
> 
> Here's where I think char-is-special can save us.  I'd be much more
> bothered by this idiom if storage did not have type char.  But, since it
> does, I don't see this as particularly worse than using placement new on
> an array of char that is not part of a struct.  And, from an
> implementation point of view, we already need complexity around char;
> this seems like more such complexity, but of the same vein.

The patch below implements what is necessary to allow arrays of
character type to have any dynamic type (I excluded single chars
and the implementation isn't efficient yet).  It fixes the issue
as well.  The issue at hand is that the scheduler re-orders

    D.9757_19 = (struct do_truncate_float_t * &)&D.9760.D.8809._M_functor._M_pod_data[0];
(1) *D.9757_19 = D.9754_20;
(2) __old_functor = D.9760.D.8809._M_functor;

(1) and (2) because (1) accesses storage with effective type T1
(not a character type and not subject to the char exception) while
(2) accesses the storage via type T2 (subject to the char exception
because _M_functor contains an array of char).  With the fix below
D.9760 gets assigned alias-set zero, so the alias-set of T1 is a
subset of that of D.9760.

> Allowing this with arbitrary types just seems horrible.  If we think
> that we need to do that, I'd question whether TBAA is worth keeping.  If
> we think that C/C++ are not really a type-safe languages in this sense,
> then maybe we should give up on this optimization?  (And, yes, I do
> still remember who implemented it in the first place...)

It's certainly useful.  

I think the proper way is to honor trivial must-aliases first, like
the tree oracle does.  If you look at the RTL for (1) and (2):

(insn 19 18 74 2 
/abuild/rguenther/trunk-g/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/tr1/functional:1590 
(set (mem/f:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -56 [0xffffffffffffffc8])) [17 *D.9757_19+0 S4 
A32])
        (reg/f:SI 0 ax [orig:62 D.9753 ] [62])) 47 {*movsi_1} 
(expr_list:REG_DEAD (reg/f:SI 0 ax [orig:62 D.9753 ] [62])
        (nil)))

(insn 74 19 75 2 
/abuild/rguenther/trunk-g/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/tr1/functional:1907 
(set (reg:SI 4 si [orig:72 __old_functor ] [72])
        (mem/s/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -56 [0xffffffffffffffc8])) [0 
D.9760.D.8809._M_functor+0 S4 A256])) 47 {*movsi_1} (nil))

you can definitely see the must-alias from just looking at the
address expression and the mode.  Now the RTL oracle first does
TBAA disambiguation before it even considers trivial must-aliases
(in the offset-based disambiguation paths).

I can certainly try to change that (that would for example make
the following union hack unnecessary:

  /* Permit type-punning when accessing a union, provided the access
     is directly through the union.  For example, this code does not
     permit taking the address of a union member and then storing
     through it.  Even the type-punning allowed here is a GCC
     extension, albeit a common and useful one; the C standard says
     that such accesses have implementation-defined behavior.  */
  for (u = t;
       TREE_CODE (u) == COMPONENT_REF || TREE_CODE (u) == ARRAY_REF;
       u = TREE_OPERAND (u, 0))
    if (TREE_CODE (u) == COMPONENT_REF
        && TREE_CODE (TREE_TYPE (TREE_OPERAND (u, 0))) == UNION_TYPE)
      return 0;

because all accesses that this causes to conflict are trivial
must-aliases.

Anyway, the C++ patch allowing arbitrary-dynamically-typed char
arrays in structs is below.

I'm trying to sort-out the must-alias thing in the RTL oracle now.

Thanks,
Richard.

Index: gcc/cp/cp-objcp-common.c
===================================================================
*** gcc/cp/cp-objcp-common.c	(revision 156202)
--- gcc/cp/cp-objcp-common.c	(working copy)
*************** along with GCC; see the file COPYING3.
*** 33,38 ****
--- 33,62 ----
  #include "cxx-pretty-print.h"
  #include "cp-objcp-common.h"
  
+ /* Returns true if the aggregate type TYPE contains an array of
+    chars.  */
+ 
+ static bool
+ contains_array_of_chars (tree type)
+ {
+   tree f;
+ 
+   for (f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
+     {
+       if (TREE_CODE (f) != FIELD_DECL)
+ 	continue;
+ 
+       if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f))
+ 	  && contains_array_of_chars (TREE_TYPE (f)))
+ 	return true;
+       else if (TREE_CODE (TREE_TYPE (f)) == ARRAY_TYPE
+ 	       && TREE_TYPE (TREE_TYPE (f)) == char_type_node)
+ 	return true;
+     }
+ 
+   return false;
+ }
+ 
  /* Special routine to get the alias set for C++.  */
  
  alias_set_type
*************** cxx_get_alias_set (tree t)
*** 49,54 ****
--- 73,84 ----
  	  && TYPE_PTRMEMFUNC_P (TREE_TYPE (t))))
      return 0;
  
+   /* If t is a record or union type with a character array component assign
+      alias-set zero to it.  */
+   if (RECORD_OR_UNION_TYPE_P (t)
+       && contains_array_of_chars (t))
+     return 0;
+ 
    return c_common_get_alias_set (t);
  }
  


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]