Bug 23524 - [4.1 Regression]bigger version of mov + cmp produced
Summary: [4.1 Regression]bigger version of mov + cmp produced
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.1.0
: P2 minor
Target Milestone: 4.1.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 23153
  Show dependency treegraph
 
Reported: 2005-08-23 05:53 UTC by Dan Nicolaescu
Modified: 2005-11-03 22:32 UTC (History)
1 user (show)

See Also:
Host:
Target: i686-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-10-27 17:35:05


Attachments
Testcase for this bug (63.37 KB, text/plain)
2005-08-23 05:54 UTC, Dan Nicolaescu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Nicolaescu 2005-08-23 05:53:46 UTC
Compiling the code in the attachment using -O2 -march=i686 4.1 produces bigger 
code than 4.0. 
4.0 produces:         
   movl    xtermWidgetClass, %eax
   cmpl    %eax, 4(%edx)
4.1 produces: 
   movl    4(%edx), %eax
   cmpl    xtermWidgetClass, %eax

the 4.1 version is 1 byte bigger.

This is one of the reasons for the code size increase in PR23153.
Comment 1 Dan Nicolaescu 2005-08-23 05:55:00 UTC
Created attachment 9560 [details]
Testcase for this bug
Comment 2 Andrew Pinski 2005-08-23 11:27:25 UTC
You really should know that we only care about code size at -Os.  We care about performance 
regressions though at -O2.
Comment 3 Dan Nicolaescu 2005-08-23 18:05:59 UTC
(In reply to comment #2)
> You really should know that we only care about code size at -Os.  We care
about performance 
> regressions though at -O2.

Code size is important for performance for modern processors. Small I-cache (and
I-TLB) footprint for otherwise equivalent code results in better performance.

BTW, this is a 4.1 regression. 
Comment 4 Andrew Pinski 2005-08-23 18:07:57 UTC
Subject: Re:  bigger version of mov + cmp produced


On Aug 23, 2005, at 2:06 PM, dann at godzilla dot ics dot uci dot edu 
wrote:

>
> ------- Additional Comments From dann at godzilla dot ics dot uci dot 
> edu  2005-08-23 18:05 -------
> (In reply to comment #2)
>> You really should know that we only care about code size at -Os.  We 
>> care
> about performance
>> regressions though at -O2.
>
> Code size is important for performance for modern processors. Small 
> I-cache (and
> I-TLB) footprint for otherwise equivalent code results in better 
> performance.

Then use -Os every where instead.  You will see that the overall code 
size for 4.1
has reduced from 4.0.

-- Pinski

Comment 5 Dan Nicolaescu 2005-08-23 18:15:18 UTC
(In reply to comment #4)
>
> Then use -Os every where instead.  You will see that the overall code 
> size for 4.1
> has reduced from 4.0.

That might be true, but -Os is not always an option. If there's a good reason
for -O2 to generate bigger code, then so be it, but that does not seem to be the
case for the code in this PR (at least AFAICT).
Comment 6 Dan Nicolaescu 2005-09-07 22:05:45 UTC
It seems that expand generates different insns in 4.0 and 4.1 for the comparison
in question:

4.0 generates: (from .00.expand)

(insn 15 13 16 1 (set (reg/f:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32]))
-1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg/f:SI 63)
        (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] <var_decl
0xb7a8b0d8 x
termWidgetClass>) [5 xtermWidgetClass+0 S4 A32])) -1 (nil)
    (nil))

(insn 17 16 18 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/f:SI 62)
            (reg/f:SI 63))) -1 (nil)
    (nil))

4.1 generates: 

(insn 15 13 16 1 (set (reg:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32]))
-1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 62)
            (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40]
<var_decl 0xb7b510
b0 xtermWidgetClass>) [5 xtermWidgetClass+0 S4 A32]))) -1 (nil)
    (nil))

Comment 7 Steven Bosscher 2005-10-27 16:26:33 UTC
Could the dear reported at least try to provide a small test case?

I think this should not be marked as a regression.  It's just sad that this kind of non-bug keeps the regression count high, when in reality GCC 4.1 produces smaller code overall.
 
Comment 8 Dan Nicolaescu 2005-10-27 16:43:37 UTC
(In reply to comment #7)
> Could the dear reported at least try to provide a small test case?

The testcase in the attachment contains only a 4 lines function: 
HandleDeIconify, the rest is just fluff to allow it to compile. Granted a lot of it can be pruned, but I don't think it stops trying to debug the problem.


> I think this should not be marked as a regression.  

Why not? It is a regression. 

> It's just sad that this
> kind of non-bug keeps the regression count high, when in reality GCC 4.1
> produces smaller code overall.

PR23153 tells a completely different story about codesize (at least for i686).
Comment 9 Steven Bosscher 2005-10-27 17:08:20 UTC
And CSiBE tells you the story that GCC 4.1 produces smaller code overall.
http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux

So do the SPEC benchmark boxes btw.
Comment 10 Steven Bosscher 2005-10-27 17:31:39 UTC
For the record, we're talking about:

   1                            .file   "t.c"
   2                            .text
   3                            .p2align 4,,15
   4                    .globl foo
   5                            .type   foo, @function
   6                    foo:
   7 0000 55                    pushl   %ebp
   8 0001 A1000000              movl    x, %eax
   8      00
   9 0006 89E5                  movl    %esp, %ebp
  10 0008 8B5508                movl    8(%ebp), %edx
  11 000b 394204                cmpl    %eax, 4(%edx)
  12 000e 7402                  je      .L4
  13 0010 5D                    popl    %ebp
  14 0011 C3                    ret
  15                            .p2align 4,,7
  16                    .L4:
  17 0012 5D                    popl    %ebp
  18 0013 E9FCFFFF              jmp     bar
  18      FF
  19                            .size   foo, .-foo
  20                            .comm   x,4,4
  21                            .section        .note.GNU-stack,"",@progbits
  22                            .ident  "GCC: (GNU) 3.3.4 (pre 3.3.5 20040809)"




vs.


   1                            .file   "t.c"
   2                            .text
   3                            .p2align 4,,15
   4                    .globl foo
   5                            .type   foo, @function
   6                    foo:
   7 0000 55                    pushl   %ebp
   8 0001 89E5                  movl    %esp, %ebp
   9 0003 8B5508                movl    8(%ebp), %edx
  10 0006 8B4204                movl    4(%edx), %eax
  11 0009 3B050000              cmpl    x, %eax
  11      0000
  12 000f 7402                  je      .L6
  13 0011 5D                    popl    %ebp
  14 0012 C3                    ret
  15                            .p2align 4,,7
  16                    .L6:
  17 0013 5D                    popl    %ebp
  18 0014 E9FCFFFF              jmp     bar
  18      FF
  19                            .size   foo, .-foo
  20                            .comm   x,4,4
  21                            .ident  "GCC: (GNU) 4.1.0 20051026 (experimental)"
  22                            .section        .note.GNU-stack,"",@progbits
Comment 11 Steven Bosscher 2005-10-27 17:34:42 UTC
And FWIW there is also a problem with this insn, the length is wrong:

#(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
#        (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil)
#    (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
#        (nil)))
A100000000       movl    x, %eax # 11    *movsi_1/1      [length = 6]

Comment 12 Dan Nicolaescu 2005-10-27 18:08:22 UTC
(In reply to comment #9)
> And CSiBE tells you the story that GCC 4.1 produces smaller code overall.
> http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux

Well, it obviously depends on applications.
The point of PR23153 is to show that there is a code size regression, and all the PRs that depend on it are showing very specific issues that cause a part of the regression.

A more interesting test would be to see the Linux kernel size difference, because if there's any difference there would be some people screaming (unfortunately I won't be able to do that comparison anytime soon, hope someone else will).
Comment 13 Dan Nicolaescu 2005-10-31 04:50:34 UTC
(In reply to comment #12)
> A more interesting test would be to see the Linux kernel size difference,

There's such a comparison now in comment #8 in PR23153. It confirms the size increase.
Comment 14 Mark Mitchell 2005-10-31 05:36:20 UTC
I don't think this PR is, in-and-of-itself, is very interesting, as it's a 1-byte size increase with -O2, which, as has been said, is not aimed at minimizing code size.  So, I'm going to close this PR -- but, leave PR 23153 open, as there look to be interesting issues there.
Comment 15 Uroš Bizjak 2005-11-03 06:29:52 UTC
(In reply to comment #11)
> And FWIW there is also a problem with this insn, the length is wrong:
> 
> #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
> #        (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil)
> #    (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
> #        (nil)))
> A100000000       movl    x, %eax # 11    *movsi_1/1      [length = 6]

FYI: This problem is addressed in patch at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.

Comment 16 Dan Nicolaescu 2005-11-03 06:42:52 UTC
(In reply to comment #15)
> (In reply to comment #11)
> > And FWIW there is also a problem with this insn, the length is wrong:
> > 
> > #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
> > #        (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil)
> > #    (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
> > #        (nil)))
> > A100000000       movl    x, %eax # 11    *movsi_1/1      [length = 6]
> 
> FYI: This problem is addressed in patch at
> http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.

Do you know if your patch also fixes this PR?
Comment 17 Uroš Bizjak 2005-11-03 10:05:30 UTC
> > FYI: This problem is addressed in patch at
> > http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.
> 
> Do you know if your patch also fixes this PR?

  Unfortunatelly no... I don't know if insn size is considered during reload.

Comment 18 Jan Hubicka 2005-11-03 22:32:26 UTC
insn size is currently completely unused.  It was used for producing loop instructions.
Honza