23524 – [4.1 Regression]bigger version of mov + cmp produced

Bug 23524 - [4.1 Regression]bigger version of mov + cmp produced

Summary: [4.1 Regression]bigger version of mov + cmp produced

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.1.0

Importance:	P2 minor
Target Milestone:	4.1.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	23153
	Show dependency tree / graph

Reported:	2005-08-23 05:53 UTC by Dan Nicolaescu
Modified:	2005-11-03 22:32 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:	i686-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2005-10-27 17:35:05

Attachments
Testcase for this bug (63.37 KB, text/plain) 2005-08-23 05:54 UTC, Dan Nicolaescu	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dan Nicolaescu 2005-08-23 05:53:46 UTC

Compiling the code in the attachment using -O2 -march=i686 4.1 produces bigger 
code than 4.0. 
4.0 produces:         
   movl    xtermWidgetClass, %eax
   cmpl    %eax, 4(%edx)
4.1 produces: 
   movl    4(%edx), %eax
   cmpl    xtermWidgetClass, %eax

the 4.1 version is 1 byte bigger.

This is one of the reasons for the code size increase in PR23153.

Comment 1 Dan Nicolaescu 2005-08-23 05:55:00 UTC

Created attachment 9560 [details]
Testcase for this bug

Comment 2 Andrew Pinski 2005-08-23 11:27:25 UTC

You really should know that we only care about code size at -Os.  We care about performance 
regressions though at -O2.

Comment 3 Dan Nicolaescu 2005-08-23 18:05:59 UTC

(In reply to comment #2)
> You really should know that we only care about code size at -Os.  We care
about performance 
> regressions though at -O2.

Code size is important for performance for modern processors. Small I-cache (and
I-TLB) footprint for otherwise equivalent code results in better performance.

BTW, this is a 4.1 regression.

Comment 4 Andrew Pinski 2005-08-23 18:07:57 UTC

Subject: Re:  bigger version of mov + cmp produced


On Aug 23, 2005, at 2:06 PM, dann at godzilla dot ics dot uci dot edu 
wrote:

>
> ------- Additional Comments From dann at godzilla dot ics dot uci dot 
> edu  2005-08-23 18:05 -------
> (In reply to comment #2)
>> You really should know that we only care about code size at -Os.  We 
>> care
> about performance
>> regressions though at -O2.
>
> Code size is important for performance for modern processors. Small 
> I-cache (and
> I-TLB) footprint for otherwise equivalent code results in better 
> performance.

Then use -Os every where instead.  You will see that the overall code 
size for 4.1
has reduced from 4.0.

-- Pinski

Comment 5 Dan Nicolaescu 2005-08-23 18:15:18 UTC

(In reply to comment #4)
>
> Then use -Os every where instead.  You will see that the overall code 
> size for 4.1
> has reduced from 4.0.

That might be true, but -Os is not always an option. If there's a good reason
for -O2 to generate bigger code, then so be it, but that does not seem to be the
case for the code in this PR (at least AFAICT).

Comment 6 Dan Nicolaescu 2005-09-07 22:05:45 UTC

It seems that expand generates different insns in 4.0 and 4.1 for the comparison
in question:

4.0 generates: (from .00.expand)

(insn 15 13 16 1 (set (reg/f:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32]))
-1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg/f:SI 63)
        (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] <var_decl
0xb7a8b0d8 x
termWidgetClass>) [5 xtermWidgetClass+0 S4 A32])) -1 (nil)
    (nil))

(insn 17 16 18 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/f:SI 62)
            (reg/f:SI 63))) -1 (nil)
    (nil))

4.1 generates: 

(insn 15 13 16 1 (set (reg:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32]))
-1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 62)
            (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40]
<var_decl 0xb7b510
b0 xtermWidgetClass>) [5 xtermWidgetClass+0 S4 A32]))) -1 (nil)
    (nil))

Comment 7 Steven Bosscher 2005-10-27 16:26:33 UTC

Could the dear reported at least try to provide a small test case?

I think this should not be marked as a regression.  It's just sad that this kind of non-bug keeps the regression count high, when in reality GCC 4.1 produces smaller code overall.

Comment 8 Dan Nicolaescu 2005-10-27 16:43:37 UTC

(In reply to comment #7)
> Could the dear reported at least try to provide a small test case?

The testcase in the attachment contains only a 4 lines function: 
HandleDeIconify, the rest is just fluff to allow it to compile. Granted a lot of it can be pruned, but I don't think it stops trying to debug the problem.


> I think this should not be marked as a regression.  

Why not? It is a regression. 

> It's just sad that this
> kind of non-bug keeps the regression count high, when in reality GCC 4.1
> produces smaller code overall.

PR23153 tells a completely different story about codesize (at least for i686).

Comment 9 Steven Bosscher 2005-10-27 17:08:20 UTC

And CSiBE tells you the story that GCC 4.1 produces smaller code overall.
http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux

So do the SPEC benchmark boxes btw.

Comment 10 Steven Bosscher 2005-10-27 17:31:39 UTC

For the record, we're talking about:

   1                            .file   "t.c"
   2                            .text
   3                            .p2align 4,,15
   4                    .globl foo
   5                            .type   foo, @function
   6                    foo:
   7 0000 55                    pushl   %ebp
   8 0001 A1000000              movl    x, %eax
   8      00
   9 0006 89E5                  movl    %esp, %ebp
  10 0008 8B5508                movl    8(%ebp), %edx
  11 000b 394204                cmpl    %eax, 4(%edx)
  12 000e 7402                  je      .L4
  13 0010 5D                    popl    %ebp
  14 0011 C3                    ret
  15                            .p2align 4,,7
  16                    .L4:
  17 0012 5D                    popl    %ebp
  18 0013 E9FCFFFF              jmp     bar
  18      FF
  19                            .size   foo, .-foo
  20                            .comm   x,4,4
  21                            .section        .note.GNU-stack,"",@progbits
  22                            .ident  "GCC: (GNU) 3.3.4 (pre 3.3.5 20040809)"




vs.


   1                            .file   "t.c"
   2                            .text
   3                            .p2align 4,,15
   4                    .globl foo
   5                            .type   foo, @function
   6                    foo:
   7 0000 55                    pushl   %ebp
   8 0001 89E5                  movl    %esp, %ebp
   9 0003 8B5508                movl    8(%ebp), %edx
  10 0006 8B4204                movl    4(%edx), %eax
  11 0009 3B050000              cmpl    x, %eax
  11      0000
  12 000f 7402                  je      .L6
  13 0011 5D                    popl    %ebp
  14 0012 C3                    ret
  15                            .p2align 4,,7
  16                    .L6:
  17 0013 5D                    popl    %ebp
  18 0014 E9FCFFFF              jmp     bar
  18      FF
  19                            .size   foo, .-foo
  20                            .comm   x,4,4
  21                            .ident  "GCC: (GNU) 4.1.0 20051026 (experimental)"
  22                            .section        .note.GNU-stack,"",@progbits

Comment 11 Steven Bosscher 2005-10-27 17:34:42 UTC

And FWIW there is also a problem with this insn, the length is wrong:

#(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
#        (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil)
#    (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
#        (nil)))
A100000000       movl    x, %eax # 11    *movsi_1/1      [length = 6]

Comment 12 Dan Nicolaescu 2005-10-27 18:08:22 UTC

(In reply to comment #9)
> And CSiBE tells you the story that GCC 4.1 produces smaller code overall.
> http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux

Well, it obviously depends on applications.
The point of PR23153 is to show that there is a code size regression, and all the PRs that depend on it are showing very specific issues that cause a part of the regression.

A more interesting test would be to see the Linux kernel size difference, because if there's any difference there would be some people screaming (unfortunately I won't be able to do that comparison anytime soon, hope someone else will).

Comment 13 Dan Nicolaescu 2005-10-31 04:50:34 UTC

(In reply to comment #12)
> A more interesting test would be to see the Linux kernel size difference,

There's such a comparison now in comment #8 in PR23153. It confirms the size increase.

Comment 14 Mark Mitchell 2005-10-31 05:36:20 UTC

I don't think this PR is, in-and-of-itself, is very interesting, as it's a 1-byte size increase with -O2, which, as has been said, is not aimed at minimizing code size.  So, I'm going to close this PR -- but, leave PR 23153 open, as there look to be interesting issues there.

Comment 15 Uroš Bizjak 2005-11-03 06:29:52 UTC

(In reply to comment #11)
> And FWIW there is also a problem with this insn, the length is wrong:
> 
> #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
> #        (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil)
> #    (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
> #        (nil)))
> A100000000       movl    x, %eax # 11    *movsi_1/1      [length = 6]

FYI: This problem is addressed in patch at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.

Comment 16 Dan Nicolaescu 2005-11-03 06:42:52 UTC

(In reply to comment #15)
> (In reply to comment #11)
> > And FWIW there is also a problem with this insn, the length is wrong:
> > 
> > #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
> > #        (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil)
> > #    (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
> > #        (nil)))
> > A100000000       movl    x, %eax # 11    *movsi_1/1      [length = 6]
> 
> FYI: This problem is addressed in patch at
> http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.

Do you know if your patch also fixes this PR?

Comment 17 Uroš Bizjak 2005-11-03 10:05:30 UTC

> > FYI: This problem is addressed in patch at
> > http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.
> 
> Do you know if your patch also fixes this PR?

  Unfortunatelly no... I don't know if insn size is considered during reload.

Comment 18 Jan Hubicka 2005-11-03 22:32:26 UTC

insn size is currently completely unused.  It was used for producing loop instructions.
Honza