Bug 20607 - [3.4 Regression] -fstrict-aliasing causes incorrect scheduling
Summary: [3.4 Regression] -fstrict-aliasing causes incorrect scheduling
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 3.4.3
: P2 normal
Target Milestone: 4.0.0
Assignee: Mark Mitchell
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2005-03-23 19:03 UTC by Ganesh Sittampalam
Modified: 2005-05-10 03:17 UTC (History)
2 users (show)

See Also:
Host: sparc-sun-solaris2.9
Target: sparc-sun-solaris2.9
Build: sparc-sun-solaris2.9
Known to work: 2.95 4.0.0 4.1.0
Known to fail: 3.3.2 3.4.3 3.4.4
Last reconfirmed: 2005-03-29 19:19:23


Attachments
Source code for bug (442 bytes, text/plain)
2005-03-23 19:04 UTC, Ganesh Sittampalam
Details
gcc -v -save-temps output for bug (360 bytes, text/plain)
2005-03-23 19:05 UTC, Ganesh Sittampalam
Details
Proposed patch. (673 bytes, patch)
2005-05-10 03:12 UTC, Mark Mitchell
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ganesh Sittampalam 2005-03-23 19:03:54 UTC
On Sparc Solaris, the attached code leads to the following assembly sequence
being produced at the beginning of a function. Stack slot %fp-32 is read before
it is written. Turning off either -fstrict-aliasing or -fsched-insns causes the
problem to go away.

I'm relying on the GCC extension that allows type punning via unions, and
assuming a big endian layout of all the datatypes. As specified by the
-fstrict-aliasing documentation, I'm copying the value out of the union rather
than taking an address of a union member.

   0:   9d e3 bf 80     save  %sp, -128, %sp
   4:   d6 07 bf e4     ld  [ %fp + -28 ], %o3
   8:   c2 07 bf e0     ld  [ %fp + -32 ], %g1
   c:   c2 27 bf e8     st  %g1, [ %fp + -24 ]
  10:   19 00 00 20     sethi  %hi(0x8000), %o4
  14:   1b 3f ff ff     sethi  %hi(0xfffffc00), %o5
  18:   9a 13 63 ff     or  %o5, 0x3ff, %o5     ! ffffffff <main+0xffffffa3>
  1c:   d8 3f bf e0     std  %o4, [ %fp + -32 ]

g++-3_4_3 -v -save-temps -O1 -c -o u64solrs.o u64solrs.c -fschedule-insns
-fstrict-aliasing

Reading specs from
/arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/lib/gcc/sparc-sun-solaris2.9/3.4.3/specs
Configured with: ./configure --prefix=/arm/eda/tools/gnu/gcc/3_4_3
--exec-prefix=/arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc
--program-suffix=-3_4_3 --enable-languages=c,c++ -v --with-dwarf2
--enable-version-specific-runtime-libs --with-gnu-as
--with-as=/arm/eda/tools/gnu/binutils/2_15/solaris_2_9-sparc/bin/as-2_15
--with-gnu-ld
--with-ld=/arm/eda/tools/gnu/binutils/2_15/solaris_2_9-sparc/bin/ld-2_15
Thread model: posix
gcc version 3.4.3
 /arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/libexec/gcc/sparc-sun-solaris2.9/3.4.3/cc1plus
-E -quiet -v u64solrs.c -mcpu=v7 -fschedule-insns -fstrict-aliasing -O1 -o
u64solrs.ii
ignoring nonexistent directory
"/arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/lib/gcc/sparc-sun-solaris2.9/3.4.3/../../../../sparc-sun-solaris2.9/include"
#include "..." search starts here:
#include <...> search starts here:
 /arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/lib/gcc/sparc-sun-solaris2.9/3.4.3/include/c++
 /arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/lib/gcc/sparc-sun-solaris2.9/3.4.3/include/c++/sparc-sun-solaris2.9
 /arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/lib/gcc/sparc-sun-solaris2.9/3.4.3/include/c++/backward
 /usr/local/include
 /arm/eda/tools/gnu/gcc/3_4_3/include
 /arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/lib/gcc/sparc-sun-solaris2.9/3.4.3/include
 /usr/include
End of search list.
 /arm/eda/tools/gnu/gcc/3_4_3/solaris_2_9-sparc/libexec/gcc/sparc-sun-solaris2.9/3.4.3/cc1plus
-fpreprocessed u64solrs.ii -quiet -dumpbase u64solrs.c -mcpu=v7 -auxbase-strip
u64solrs.o -O1 -version -fschedule-insns -fstrict-aliasing -o u64solrs.s
GNU C++ version 3.4.3 (sparc-sun-solaris2.9)
        compiled by GNU C version 3.4.3.
GGC heuristics: --param ggc-min-expand=65 --param ggc-min-heapsize=65536
 /arm/eda/tools/gnu/binutils/2_15/solaris_2_9-sparc/bin/as-2_15 -V -Qy -s
-xarch=v8 -o u64solrs.o u64solrs.s
GNU assembler version 2.15 (sparc-sun-solaris2.9) using BFD version 2.15
Comment 1 Ganesh Sittampalam 2005-03-23 19:04:40 UTC
Created attachment 8436 [details]
Source code for bug
Comment 2 Ganesh Sittampalam 2005-03-23 19:05:36 UTC
Created attachment 8437 [details]
gcc -v -save-temps output for bug
Comment 3 Eric Botcazou 2005-03-29 19:19:22 UTC
Confirmed in 3.4.x, probably another incarnation of the infamous stack slot
sharing problem in 3.x when type-based aliasing is enabled.

As as side note, -fschedule-insns makes little sense without -mcpu=xxx on SPARC.
Comment 4 Eric Botcazou 2005-03-29 19:43:17 UTC
Investigating.
Comment 5 Eric Botcazou 2005-03-30 10:53:18 UTC
Hum... type-punning simply doesn't work in this case with the C++ compiler, but
does work with the C compiler.  The problem is that:

	union u {
	    x_uint64_t first;
	    x_uint32_t second;
	};
	union u conv;
	conv.first = arg1;
	result = conv.second;

is translated into:

  *(&conv.first) = *(&arg1);
  *(&result) = *(&conv.second);

and type-punning cannot work in this case (that's the counter-example provided
in the manual).  I guess the problem comes the fields being of aggregate types.

Mark, is that fixable or should we document the limitation?
Comment 6 Mark Mitchell 2005-03-31 00:54:13 UTC
Subject: Re:  [3.4 Regression] -fstrict-aliasing causes incorrect
 scheduling

ebotcazou at gcc dot gnu dot org wrote:
> ------- Additional Comments From ebotcazou at gcc dot gnu dot org  2005-03-30 10:53 -------
> Hum... type-punning simply doesn't work in this case with the C++ compiler, but
> does work with the C compiler.  The problem is that:
> 
> 	union u {
> 	    x_uint64_t first;
> 	    x_uint32_t second;
> 	};
> 	union u conv;
> 	conv.first = arg1;
> 	result = conv.second;
> 
> is translated into:
> 
>   *(&conv.first) = *(&arg1);
>   *(&result) = *(&conv.second);
> 
> and type-punning cannot work in this case (that's the counter-example provided
> in the manual).  I guess the problem comes the fields being of aggregate types.
> 
> Mark, is that fixable or should we document the limitation?

I think it's fixable.  I'm not sure exactly why the C++ front-end is 
inserting the additional operations, but they're certainly not essential 
from a C++ semantics point of view.  Please reassign the bug to me; I'll 
try to take a look.

Thanks,

Comment 7 Eric Botcazou 2005-03-31 06:30:36 UTC
Subject: Re:  [3.4 Regression] -fstrict-aliasing causes incorrect scheduling

> I think it's fixable.  I'm not sure exactly why the C++ front-end is
> inserting the additional operations, but they're certainly not essential
> from a C++ semantics point of view.  Please reassign the bug to me; I'll
> try to take a look.

Thanks, will do.  For the records, here's the tree for the first assignment 
with the C compiler:

 <modify_expr 0x40186390
    type <record_type 0x40220074 x_uint64_t type_0 DI
        size <integer_cst 0x401797c0 constant 64>
        unit size <integer_cst 0x401799e0 constant 8>
        align 64 symtab 0 alias set -1
        fields <field_decl 0x4021ef68 a type <array_type 0x4021eef4>
            DI file ../pr20607.c line 17 size <integer_cst 0x401797c0 64> 
unit size <integer_cst 0x401799e0 8>
            align 64 offset_align 64
            offset <integer_cst 0x40179ac0 constant 0>
            bit offset <integer_cst 0x40179b80 constant 0> context 
<record_type 0x4021ed24 __x_uint64_t> arguments <integer_cst 0x40179ac0 0>> 
context <translation_unit_decl 0x4017c15c>>
    side-effects
    arg 0 <component_ref 0x40186378 type <record_type 0x40220074 x_uint64_t>

        arg 0 <var_decl 0x402205e4 conv type <union_type 0x40220414 u>
            used BLK file ../pr20607.c line 31 size <integer_cst 0x401797c0 
64> unit size <integer_cst 0x401799e0 8>
            align 64 context <function_decl 0x4022015c dotests_1>
            (mem/s:BLK (plus:SI (reg/f:SI 103 virtual-stack-vars)
        (const_int -16 [0xfffffffffffffff0])) [8 conv+0 S8 A64])>
        arg 1 <field_decl 0x402204fc first type <record_type 0x40220074 
x_uint64_t>
            DI file ../pr20607.c line 28 size <integer_cst 0x401797c0 64> 
unit size <integer_cst 0x401799e0 8>
            align 64 offset_align 64 offset <integer_cst 0x40179ac0 0> bit 
offset <integer_cst 0x40179b80 0> context <union_type 0x40220414 u> 
arguments <integer_cst 0x40179ac0 0> chain <field_decl 0x40220570 second>>>
    arg 1 <var_decl 0x4022032c arg1 type <record_type 0x40220074 x_uint64_t>
        used DI file ../pr20607.c line 23 size <integer_cst 0x401797c0 64> 
unit size <integer_cst 0x401799e0 8>
        align 64 context <function_decl 0x4022015c dotests_1>
        (reg/v:DI 107 [ arg1 ])
        chain <var_decl 0x402203a0 result type <record_type 0x4021ecb0 
x_uint32_t>
            used BLK file ../pr20607.c line 24 size <integer_cst 0x401797c0 
64> unit size <integer_cst 0x401799e0 8>
            align 64 context <function_decl 0x4022015c dotests_1>
            (mem/s:BLK (plus:SI (reg/f:SI 103 virtual-stack-vars)
        (const_int -8 [0xfffffffffffffff8])) [2 result+0 S8 A64])>>>

and with the C++ compiler:

 <modify_expr 0x40188510
    type <record_type 0x40255000 __x_uint64_t type_1 type_5 DI
        size <integer_cst 0x401797c0 constant 64>
        unit size <integer_cst 0x401799e0 constant 8>
        align 64 symtab 0 alias set 2
        fields <field_decl 0x402553a0 a type <array_type 0x4025532c>
            used nonlocal decl_3 DI file ../pr20607.C line 17 size 
<integer_cst 0x401797c0 64> unit size <integer_cst 0x401799e0 8>
            align 64 offset_align 64
            offset <integer_cst 0x40179ac0 constant 0>
            bit offset <integer_cst 0x40179b80 constant 0> context 
<record_type 0x40255000 __x_uint64_t> arguments <integer_cst 0x40179ac0 0> 
chain <type_decl 0x402551d0 __x_uint64_t>>
        X() X(constX&) this=(X&) n_parents 0 use_template=0 interface-unknown
        member-functions <tree_vec 0x40253f40
            elt 0 <overload 0x402510f0>
            elt 2 <function_decl 0x40255cb0 operator=>
            elt 3 <overload 0x40251130>
            elt 4 <overload 0x40251110>>
        pointer_to_this <pointer_type 0x402550e8> reference_to_this 
<reference_type 0x40255a6c> chain <type_decl 0x4025515c __x_uint64_t>>
    side-effects
    arg 0 <indirect_ref 0x402586cc type <record_type 0x40255000 __x_uint64_t>

        arg 0 <nop_expr 0x402586b8 type <pointer_type 0x402550e8>

            arg 0 <nop_expr 0x402586a4 type <pointer_type 0x402550e8>

                arg 0 <non_lvalue_expr 0x40258690 type <pointer_type 
0x402599f8>

                    arg 0 <nop_expr 0x4025867c type <pointer_type 0x402599f8>

                        arg 0 <addr_expr 0x40258514 type <pointer_type 
0x4025789c>
                            arg 0 <var_decl 0x40259984 conv>>>>>>>
    arg 1 <indirect_ref 0x402586e0
        type <record_type 0x402556cc __x_uint64_t readonly type_1 type_5 DI 
size <integer_cst 0x401797c0 64> unit size <integer_cst 0x401799e0 8>
            align 64 symtab 0 alias set -1 fields <field_decl 0x402553a0 a>
            X() X(constX&) this=(X&) n_parents 0 use_template=0 
interface-unknown member-functions <tree_vec 0x40253f40>
            pointer_to_this <pointer_type 0x40259a6c> reference_to_this 
<reference_type 0x40255740>>
        readonly
        arg 0 <nop_expr 0x40258640 type <reference_type 0x40255740>

            arg 0 <nop_expr 0x4025862c type <pointer_type 0x40259a6c>

                arg 0 <addr_expr 0x40258618 type <pointer_type 0x402599f8>
                    arg 0 <var_decl 0x40257658 arg1>>>>>>


Type punning needs a COMPONENT_REF to work (see c_common_get_alias_set).

Comment 8 Mark Mitchell 2005-05-10 02:49:20 UTC
The reason that this happens when the type assigned is a class types is that
those types have overloaded operator=.  There's always an implicit operator=,
even if one is not declared explicitly, with a "this" argument of type "T*". 
So, we take the address of "conv.first" in preparation for passing it to the
function.  Now, in build_unary_op, we do this bit:

            tree field = TREE_OPERAND (arg, 1);
            tree rval = build_unary_op (ADDR_EXPR, TREE_OPERAND (arg, 0), 0);
            tree binfo = lookup_base (TREE_TYPE (TREE_TYPE (rval)),
                                      decl_type_context (field),
                                      ba_check, NULL);

            rval = build_base_path (PLUS_EXPR, rval, binfo, 1);
=>          rval = build_nop (argtype, rval);
            addr = fold (build (PLUS_EXPR, argtype, rval,
                                cp_convert (argtype, byte_position (field))));

which destroys the COMPONENT_REF.

This is rather silly.  In more current compilers, we have actual FIELDs for the
base classes, and we should just be chasing through them.  Certainly, when the
class containing the field and the class of the object are the same, there's no
need for any of this.
Comment 9 Mark Mitchell 2005-05-10 03:12:12 UTC
Created attachment 8849 [details]
Proposed patch.

Eric --

If you would, could you try this patch and see if it fixes the problem?  It
avoids the COMPONENT_REF collapse.  I'm running tests on
x86_64-unknown-linux-gnu now.

Thanks,

-- Mark
Comment 10 Mark Mitchell 2005-05-10 03:16:19 UTC
Eric --

Never mind; test results aren't looking good.  It looks like those comments
about offsetof were there for a reason.  I'm still going to try applying the
patch to mainline -- but I don't think there's going to be a good solution for
3.4.x.

-- Mark
Comment 11 Mark Mitchell 2005-05-10 03:17:12 UTC
Fixed in 4.0; will not be fixed in 3.4.x.