Compiling the code in the attachment using -O2 -march=i686 4.1 produces bigger code than 4.0. 4.0 produces: movl xtermWidgetClass, %eax cmpl %eax, 4(%edx) 4.1 produces: movl 4(%edx), %eax cmpl xtermWidgetClass, %eax the 4.1 version is 1 byte bigger. This is one of the reasons for the code size increase in PR23153.
Created attachment 9560 [details] Testcase for this bug
You really should know that we only care about code size at -Os. We care about performance regressions though at -O2.
(In reply to comment #2) > You really should know that we only care about code size at -Os. We care about performance > regressions though at -O2. Code size is important for performance for modern processors. Small I-cache (and I-TLB) footprint for otherwise equivalent code results in better performance. BTW, this is a 4.1 regression.
Subject: Re: bigger version of mov + cmp produced On Aug 23, 2005, at 2:06 PM, dann at godzilla dot ics dot uci dot edu wrote: > > ------- Additional Comments From dann at godzilla dot ics dot uci dot > edu 2005-08-23 18:05 ------- > (In reply to comment #2) >> You really should know that we only care about code size at -Os. We >> care > about performance >> regressions though at -O2. > > Code size is important for performance for modern processors. Small > I-cache (and > I-TLB) footprint for otherwise equivalent code results in better > performance. Then use -Os every where instead. You will see that the overall code size for 4.1 has reduced from 4.0. -- Pinski
(In reply to comment #4) > > Then use -Os every where instead. You will see that the overall code > size for 4.1 > has reduced from 4.0. That might be true, but -Os is not always an option. If there's a good reason for -O2 to generate bigger code, then so be it, but that does not seem to be the case for the code in this PR (at least AFAICT).
It seems that expand generates different insns in 4.0 and 4.1 for the comparison in question: 4.0 generates: (from .00.expand) (insn 15 13 16 1 (set (reg/f:SI 62) (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ]) (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32])) -1 (nil) (nil)) (insn 16 15 17 1 (set (reg/f:SI 63) (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] <var_decl 0xb7a8b0d8 x termWidgetClass>) [5 xtermWidgetClass+0 S4 A32])) -1 (nil) (nil)) (insn 17 16 18 1 (set (reg:CCZ 17 flags) (compare:CCZ (reg/f:SI 62) (reg/f:SI 63))) -1 (nil) (nil)) 4.1 generates: (insn 15 13 16 1 (set (reg:SI 62) (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ]) (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32])) -1 (nil) (nil)) (insn 16 15 17 1 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 62) (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] <var_decl 0xb7b510 b0 xtermWidgetClass>) [5 xtermWidgetClass+0 S4 A32]))) -1 (nil) (nil))
Could the dear reported at least try to provide a small test case? I think this should not be marked as a regression. It's just sad that this kind of non-bug keeps the regression count high, when in reality GCC 4.1 produces smaller code overall.
(In reply to comment #7) > Could the dear reported at least try to provide a small test case? The testcase in the attachment contains only a 4 lines function: HandleDeIconify, the rest is just fluff to allow it to compile. Granted a lot of it can be pruned, but I don't think it stops trying to debug the problem. > I think this should not be marked as a regression. Why not? It is a regression. > It's just sad that this > kind of non-bug keeps the regression count high, when in reality GCC 4.1 > produces smaller code overall. PR23153 tells a completely different story about codesize (at least for i686).
And CSiBE tells you the story that GCC 4.1 produces smaller code overall. http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux So do the SPEC benchmark boxes btw.
For the record, we're talking about: 1 .file "t.c" 2 .text 3 .p2align 4,,15 4 .globl foo 5 .type foo, @function 6 foo: 7 0000 55 pushl %ebp 8 0001 A1000000 movl x, %eax 8 00 9 0006 89E5 movl %esp, %ebp 10 0008 8B5508 movl 8(%ebp), %edx 11 000b 394204 cmpl %eax, 4(%edx) 12 000e 7402 je .L4 13 0010 5D popl %ebp 14 0011 C3 ret 15 .p2align 4,,7 16 .L4: 17 0012 5D popl %ebp 18 0013 E9FCFFFF jmp bar 18 FF 19 .size foo, .-foo 20 .comm x,4,4 21 .section .note.GNU-stack,"",@progbits 22 .ident "GCC: (GNU) 3.3.4 (pre 3.3.5 20040809)" vs. 1 .file "t.c" 2 .text 3 .p2align 4,,15 4 .globl foo 5 .type foo, @function 6 foo: 7 0000 55 pushl %ebp 8 0001 89E5 movl %esp, %ebp 9 0003 8B5508 movl 8(%ebp), %edx 10 0006 8B4204 movl 4(%edx), %eax 11 0009 3B050000 cmpl x, %eax 11 0000 12 000f 7402 je .L6 13 0011 5D popl %ebp 14 0012 C3 ret 15 .p2align 4,,7 16 .L6: 17 0013 5D popl %ebp 18 0014 E9FCFFFF jmp bar 18 FF 19 .size foo, .-foo 20 .comm x,4,4 21 .ident "GCC: (GNU) 4.1.0 20051026 (experimental)" 22 .section .note.GNU-stack,"",@progbits
And FWIW there is also a problem with this insn, the length is wrong: #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61]) # (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil) # (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32]) # (nil))) A100000000 movl x, %eax # 11 *movsi_1/1 [length = 6]
(In reply to comment #9) > And CSiBE tells you the story that GCC 4.1 produces smaller code overall. > http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux Well, it obviously depends on applications. The point of PR23153 is to show that there is a code size regression, and all the PRs that depend on it are showing very specific issues that cause a part of the regression. A more interesting test would be to see the Linux kernel size difference, because if there's any difference there would be some people screaming (unfortunately I won't be able to do that comparison anytime soon, hope someone else will).
(In reply to comment #12) > A more interesting test would be to see the Linux kernel size difference, There's such a comparison now in comment #8 in PR23153. It confirms the size increase.
I don't think this PR is, in-and-of-itself, is very interesting, as it's a 1-byte size increase with -O2, which, as has been said, is not aimed at minimizing code size. So, I'm going to close this PR -- but, leave PR 23153 open, as there look to be interesting issues there.
(In reply to comment #11) > And FWIW there is also a problem with this insn, the length is wrong: > > #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61]) > # (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil) > # (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32]) > # (nil))) > A100000000 movl x, %eax # 11 *movsi_1/1 [length = 6] FYI: This problem is addressed in patch at http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.
(In reply to comment #15) > (In reply to comment #11) > > And FWIW there is also a problem with this insn, the length is wrong: > > > > #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61]) > > # (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} (nil) > > # (expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32]) > > # (nil))) > > A100000000 movl x, %eax # 11 *movsi_1/1 [length = 6] > > FYI: This problem is addressed in patch at > http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html. Do you know if your patch also fixes this PR?
> > FYI: This problem is addressed in patch at > > http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html. > > Do you know if your patch also fixes this PR? Unfortunatelly no... I don't know if insn size is considered during reload.
insn size is currently completely unused. It was used for producing loop instructions. Honza