Bug 20408 - Unnecessary code generated for empty structs
Summary: Unnecessary code generated for empty structs
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.1.0
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 21796
  Show dependency treegraph
 
Reported: 2005-03-10 15:48 UTC by Chris Jefferson
Modified: 2012-02-04 02:44 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-02-03 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Jefferson 2005-03-10 15:48:15 UTC
Given an empty struct (ie struct X { };), even at high optimisation levels g++
will insist on always allocating and zeroing some memory for empty structs.

This actually effects C++ code, including libstdc++-v3, as empty structs are
often used as a means of passing around functions.

EXAMPLE
------------------
struct X {};

void foo(X);

void call_foo()
{ foo(X()); }
------------------

generates (from -O3, t70.final_cleanup is:)
-------------------------------------------------
;; Function void call_foo() (_Z8call_foov)

void call_foo() ()
{
  struct X D.1597;

<bb 0>:
  D.1597 = 0;
  foo (D.1597) [tail call];
  return;

}
---------------------------------------------
Comment 1 Andrew Pinski 2005-03-10 15:56:21 UTC
Note this has no effect on the generated code.

But anyways:
  struct X D.1574;
  struct X D.1590;

  D.1574 = {};
  D.1590 = 0;

That is not right 0 is an INTEREGER_CST which cannot be an aggregate, that is just wrong.

If we had more type checking we would be an ICE.
Comment 2 Chris Jefferson 2005-03-10 16:00:18 UTC
ignore my random changing of severity..
Comment 3 Chris Jefferson 2005-03-10 16:05:39 UTC
When you say "has no effect in final code", are you talking about the problem
you noticed, or the problem as a whole?

I find for each extra X I add to the type of foo I get a line much like:

movb %al, 28(%esp)

appearing in the assembler output, which I assume isn't necessary..
Comment 4 Andrew Pinski 2005-03-10 16:10:10 UTC
Werid on PPC, we get the most optimal code of
__Z8call_foov:
LFB2:
        b __Z3foo1X
Comment 5 Paolo Carlini 2005-03-10 16:19:10 UTC
Andrew, is Daniel Berlin struct aliasing work likely to help, here?
Comment 6 Andrew Pinski 2005-03-10 16:21:49 UTC
(In reply to comment #5)
> Andrew, is Daniel Berlin struct aliasing work likely to help, here?

No, fixing the front-end so it no longer produces a = 0 will most likely fix this as that is just wrong.
Comment 7 Andrew Pinski 2005-03-10 17:06:02 UTC
From call.c:
  /* Don't pass empty class objects by value.  This is useful
     for tags in STL, which are used to control overload resolution.
     We don't need to handle other cases of copying empty classes.  */
  if (! decl || ! DECL_BUILT_IN (decl))
    for (tmp = parms; tmp; tmp = TREE_CHAIN (tmp))
      if (is_empty_class (TREE_TYPE (TREE_VALUE (tmp)))
	  && ! TREE_ADDRESSABLE (TREE_TYPE (TREE_VALUE (tmp))))
	{
	  tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (TREE_VALUE (tmp)));
	  TREE_VALUE (tmp) = build2 (COMPOUND_EXPR, TREE_TYPE (t),
				     TREE_VALUE (tmp), t);
	}
Comment 8 Andrew Pinski 2005-03-10 17:07:18 UTC
From cp-gimplifier.c:
    case EMPTY_CLASS_EXPR:
      /* We create an INTEGER_CST with RECORD_TYPE and value zero.  */
      *expr_p = build_int_cst (TREE_TYPE (*expr_p), 0);

This is wrong.
Comment 9 Paolo Carlini 2005-03-10 17:13:48 UTC
Eh, I was looking at the very same code... Can't we deal with EMPTY_CLASS_EXPR
similarly to USING_STMT? ;) ;)
Comment 10 Andrew Pinski 2005-03-10 17:15:43 UTC
(In reply to comment #9)
> Eh, I was looking at the very same code... Can't we deal with EMPTY_CLASS_EXPR
> similarly to USING_STMT? ;) ;)

I think we should produce an empty CONSTRUCTOR (which I am testing right now).
Comment 11 Andrew Pinski 2005-03-10 17:30:52 UTC
(In reply to comment #10)
> I think we should produce an empty CONSTRUCTOR (which I am testing right now).
That did not fix it, we still get code generated for the "empty" struct:
;; D.1594 = {}
(nil)

;; foo (D.1594) [tail call]
(insn 10 8 11 (set (mem:QI (reg/f:SI 56 virtual-outgoing-args) [0 S1 A32])
        (reg:QI 58 [ D.1594 ])) -1 (nil)
    (nil))

(call_insn 11 10 0 (call (mem:QI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x41e9ea6c 
foo>) [0 S1 A8])
        (const_int 4 [0x4])) -1 (nil)
    (nil)
    (nil))


But on PPC we get:
;; D.1588 = {}
(nil)

;; foo (D.1588) [tail call]
(insn 10 8 11 0 (set (reg:QI 3 r3)
        (reg:QI 118 [ D.1588 ])) -1 (nil)
    (nil))

(call_insn/j 11 10 12 0 (parallel [
            (call (mem:SI (symbol_ref:SI ("_Z3foo1X") [flags 0x41] <function_decl 0x41d9f984 foo>) [0 S4 
A8])
                (const_int 32 [0x20]))
            (use (const_int 0 [0x0]))
            (use (reg:SI 119))
            (return)
        ]) -1 (nil)
    (nil)
    (expr_list:REG_DEP_TRUE (use (reg:QI 3 r3))
        (nil)))


See the difference is that we pass on ppc via a register but on x86 we pass via the stack.  I don't know a 
way to fix this with a front-end change.
Comment 12 Andrew Pinski 2005-03-10 17:35:16 UTC
The change to cp-gimplifier.c should still happen as it makes not create a INTEGER_CST for an 
aggregate.
Comment 13 Richard Biener 2005-09-04 16:25:55 UTC
For

struct Foo {};
void foo(const Foo&);
void bar(Foo);

void fooc(void)
{
        foo(Foo());
}
void barc(void)
{
        bar(Foo());
}

we get different initializers for the Foo& and the Foo case:

void fooc() ()
{
  struct Foo D.1594;

<bb 0>:
  D.1594 = {};
  foo (&D.1594);
  return;

}

void barc() ()
{
  struct Foo D.1613;

<bb 0>:
  D.1613 = 0;
  bar (D.1613) [tail call];
  return;

}

The former looks correct and does not produce initialization code
for the temporary.  The latter produces an unneccessary (uninitialized)
initialization of the pass-by-value stack slot on x86.
Comment 14 Richard Biener 2005-09-04 16:37:28 UTC
pinskia posted a patch for the =0 "bug"

http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01054.html
Comment 15 Richard Biener 2005-09-04 17:12:03 UTC
I will look at this.  Is this by any chance a regression?
Comment 16 Richard Biener 2005-09-05 10:36:31 UTC
Pinskias patch fixes one part of the problem.  For x86 there remains the
issue that we are passing X on the stack and generate

_Z8call_foov:
.LFB2:
        subl    $26, %esp
.LCFI4:
        pushw   %ax
.LCFI5:
.LCFI6:
        call    _Z3foo1X
        addl    $28, %esp
        ret

which is correct, but I wonder where we figure out to use %ax as source
for the tmp X.  Also no RTL optimizer sees that the pushw %ax can be
safely combined with the stack adjust before because the contents of %ax
are unknown and we don't care about what value we pass on the stack.

The call expander produces

(insn 9 8 10 1 (parallel [
            (set (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 10 9 11 1 (set (mem/s:QI (pre_modify:SI (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe]))) [0 S1 A8])
        (reg:QI 58 [ D.1755 ])) -1 (nil)
    (nil))

(call_insn 11 10 12 1 (call (mem:QI (symbol_ref:SI ("_Z3foo1X") [flags 0x41]
<function_decl 0x40230e00 foo>) [0 S1 A8])
        (const_int 16 [0x10])) -1 (nil)
    (nil)
    (nil))

where it should not use reg 58 as the source to push, but rather
the stack slot we assigned to the D.1755 tmp.
Comment 17 Richard Biener 2005-09-05 10:48:43 UTC
Unfortunately we start with D.1755 allocated to a register.  This may be solved
at the tree level if we fix PR23372.  Or we need to be smarter at allocating
space for D.1755.
Comment 18 Richard Biener 2005-09-05 12:38:06 UTC
Life analysis should figure out, that for

(insn 10 9 11 1 (set (mem/s:QI (pre_modify:SI (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe]))) [0 S1 A8])
        (reg:QI 58 [ D.1755 ])) -1 (nil)
    (nil))

where it notes that reg:QI 58 is dead after the instruction, never
became life before and so remove the set completely, only preserving
the side-effects 

                       (pre_modify:SI (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -2 [0xfffffffe])))

someone familiar with flow.c should be able to hack this into
mark_used_reg() in a few minutes :)
Comment 19 Steven Bosscher 2007-11-26 14:17:01 UTC
The recent improvements to the dataflow module and ra-conflicts may have fixed this.  Richi, you were the last to look at this bug report.  Can you check if there still is an issue to fix here?
Comment 20 Richard Biener 2007-11-27 09:40:15 UTC
For the testcase in comment #13 we now generate two(!) temporaries:

void barc() ()
{
  struct Foo D.2027;
  struct Foo D.2028;

  D.2027 = {};
  D.2028 = {};
  bar (D.2028);
}

via gimplification of

;; Function void barc() (_Z4barcv)
;; enabled by -tree-original

<<cleanup_point <<< Unknown tree: expr_stmt
  bar (TARGET_EXPR <D.2027, {}>;, <<< Unknown tree: empty_class_expr >>>
;) >>>
>>;

though the generated assembler looks like we cannot do better:

_Z4barcv:
.LFB3:
        subq    $8, %rsp
.LCFI0:
        movb    $0, (%rsp)
        call    _Z3bar3Foo
        addq    $8, %rsp
        ret

_Z4foocv:
.LFB2:
        subq    $24, %rsp
.LCFI1:
        leaq    23(%rsp), %rdi
        call    _Z3fooRK3Foo
        addq    $24, %rsp
        ret

On the tree level the first simple DSE pass gets rid of the extra temporary
and its initialization.