Hi, this little function shows an increase in code size of 31%, or 13 bytes (41 vs. 54 bytes in the table below).

void cdt (int *limit, int *base, int minLen, int maxLen)
{
  int i;
  for (i = minLen + 1; i <= maxLen; i++)
    base[i] = ((limit[i-1] + 1) << 1) - base[i];
}

Size  Version  Flags
41    3.4.2    -Os
54    4.0.0    -Os -fno-ivopts
63    4.0.0    -Os

3.4.2 disassembly code

00000000 <cdt>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 4d 14                mov    0x14(%ebp),%ecx
   6:   56                      push   %esi
   7:   8b 55 10                mov    0x10(%ebp),%edx
   a:   8b 75 08                mov    0x8(%ebp),%esi
   d:   53                      push   %ebx
   e:   8b 5d 0c                mov    0xc(%ebp),%ebx
  11:   42                      inc    %edx
  12:   39 ca                   cmp    %ecx,%edx
  14:   7f 0f                   jg     25 <cdt+0x25>
  16:   8b 44 96 fc             mov    0xfffffffc(%esi,%edx,4),%eax
  1a:   40                      inc    %eax
  1b:   01 c0                   add    %eax,%eax
  1d:   2b 04 93                sub    (%ebx,%edx,4),%eax
  20:   89 04 93                mov    %eax,(%ebx,%edx,4)
  23:   eb ec                   jmp    11 <cdt+0x11>
  25:   5b                      pop    %ebx
  26:   5e                      pop    %esi
  27:   5d                      pop    %ebp
  28:   c3                      ret

4.0.0-20041028 disassembly code

00000000 <cdt>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 4d 10                mov    0x10(%ebp),%ecx
   6:   57                      push   %edi
   7:   8b 7d 08                mov    0x8(%ebp),%edi
   a:   56                      push   %esi
   b:   8b 75 14                mov    0x14(%ebp),%esi
   e:   53                      push   %ebx
   f:   8b 5d 0c                mov    0xc(%ebp),%ebx
  12:   41                      inc    %ecx
  13:   8d 14 8d 00 00 00 00    lea    0x0(,%ecx,4),%edx
  1a:   eb 11                   jmp    2d <cdt+0x2d>
  1c:   8b 44 3a fc             mov    0xfffffffc(%edx,%edi,1),%eax
  20:   41                      inc    %ecx
  21:   40                      inc    %eax
  22:   01 c0                   add    %eax,%eax
  24:   2b 04 1a                sub    (%edx,%ebx,1),%eax
  27:   89 04 1a                mov    %eax,(%edx,%ebx,1)
  2a:   83 c2 04                add    $0x4,%edx
  2d:   39 f1                   cmp    %esi,%ecx
  2f:   7e eb                   jle    1c <cdt+0x1c>
  31:   5b                      pop    %ebx
  32:   5e                      pop    %esi
  33:   5f                      pop    %edi
  34:   5d                      pop    %ebp
  35:   c3                      ret

This PR may be related to PR17549, but it's much smaller and hopefully easier to analyse!
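To make the difference between the two listings easier to see at source level, here is a rough C sketch. The 3.4.2 code indexes both arrays with i directly (scaled addressing), while the 4.0.0 code keeps a second induction variable, a byte offset in %edx, next to i in %ecx. The function cdt_two_ivs and the helper cdt_forms_agree are hypothetical names written for this illustration; they are not what GCC produces internally, only a hand-written equivalent of the loop shape.

```c
#include <assert.h>
#include <string.h>

/* Test case from the report. */
void cdt(int *limit, int *base, int minLen, int maxLen)
{
    int i;
    for (i = minLen + 1; i <= maxLen; i++)
        base[i] = ((limit[i - 1] + 1) << 1) - base[i];
}

/* Hypothetical hand-written equivalent of the 4.0.0 loop shape:
   a byte offset (like %edx) is maintained alongside i (like %ecx). */
void cdt_two_ivs(int *limit, int *base, int minLen, int maxLen)
{
    int i = minLen + 1;
    long off = (long)i * (long)sizeof(int);   /* lea 0x0(,%ecx,4),%edx */

    while (i <= maxLen) {
        int *l = (int *)((char *)limit + off - (long)sizeof(int));
        int *b = (int *)((char *)base + off);
        *b = ((*l + 1) << 1) - *b;
        i++;                                  /* inc %ecx */
        off += (long)sizeof(int);             /* add $0x4,%edx */
    }
}

/* Returns 1 if both forms compute the same result on sample data. */
int cdt_forms_agree(void)
{
    int limit[8] = {3, 1, 4, 1, 5, 9, 2, 6};
    int a[8] = {2, 7, 1, 8, 2, 8, 1, 8};
    int b[8];

    memcpy(b, a, sizeof a);
    cdt(limit, a, 1, 6);
    cdt_two_ivs(limit, b, 1, 6);
    assert(a[2] == 3);   /* ((limit[1] + 1) << 1) - 1 = 4 - 1 */
    return memcmp(a, b, sizeof a) == 0;
}
```

The extra variable costs one more callee-saved register (%edi) and one more update per iteration, which is where part of the size difference in the listings comes from.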
Confirmed. On PPC, IV-OPTS also causes code bloat: 7 new instructions. Note, though, that 4.0.0 with IV-OPTS on PPC is only one instruction larger than 3.3 (on PPC every instruction is the same length).
Zdenek, this is another PR about ivopts and -Os. The test case is very simple, so it would be great if you could take a look.
Unfortunately I do not think this will help much in other related PRs. Patch: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00025.html
Zdenek, thanks for the patch! What is the generated code like after your patch?
Subject: Re: [4.0 Regression] gcc-4.0.0 bloats code by 31%

> Zdenek, thanks for the patch!
> What is the generated code like after your patch?

It seems that the 3.4 code is still smaller (I haven't measured it, just guessing from looking at your disassembly), but -fno-ivopts no longer changes it.

        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    8(%ebp), %edi
        movl    20(%ebp), %esi
        movl    16(%ebp), %ebx
        incl    %ebx
        movl    %ebx, %ecx
        leal    0(,%ebx,4), %edx
        addl    12(%ebp), %edx
        jmp     .L2
.L3:
        movl    -4(%edi,%ecx,4), %eax
        incl    %eax
        sall    %eax
        subl    (%edx), %eax
        movl    %eax, (%edx)
        incl    %ecx
        addl    $4, %edx
.L2:
        cmpl    %esi, %ecx
        jle     .L3
        popl    %ebx
        popl    %esi
        popl    %edi
        leave
        ret
It looks like this PR is related to PR17647. Should we merge them together? Uros.
Subject: Re: [4.0 Regression] gcc-4.0.0 bloats code by 31%

uros at kss-loka dot si wrote:
> It looks that this PR is related to PR17647. Should we
> merge them together?

We are still waiting for Zdenek's patch to be applied. I will reevaluate the PR after it.

Giovanni Bajo
Hmm, for -O2 on PPC, we are not using the branch-on-count-register instruction; if we did, -O2 would be smaller than -Os.
I think one of the problems is that ivopts causes out of ssa not to coalesce two SSA_NAMEs.

Before out of ssa:

  D.1127_16 = *ivtmp.8_9;
  D.1128_21 = *ivtmp.12_30;
  D.1129_22 = D.1127_16 - D.1128_21;
  *ivtmp.12_30 = D.1129_22;
  ivtmp.3_17 = ivtmp.3_18 + 1;

  # ivtmp.12_30 = PHI <ivtmp.12_35(0), ivtmp.12_31(1)>;
  # ivtmp.8_9 = PHI <ivtmp.8_29(0), ivtmp.8_7(1)>;
  # ivtmp.3_18 = PHI <0(0), ivtmp.3_17(1)>;
<L1>:;
  ivtmp.8_7 = ivtmp.8_9 + 4B;
  ivtmp.12_31 = ivtmp.12_30 + 4B;
  D.1171_37 = ivtmp.3_18 + D.1163_6;
  i_38 = (int) D.1171_37;
  if (i_38 <= maxLen_4) goto <L0>; else goto <L2>;

After:

<L0>:;
  *ivtmp.12 = *ivtmp.17 - *ivtmp.12;
  ivtmp.3 = ivtmp.3 + 1;
  ivtmp.17 = ivtmp.8;
  ivtmp.12 = ivtmp.16;

<L1>:;
  ivtmp.8 = ivtmp.17 + 4B;
  ivtmp.16 = ivtmp.12 + 4B;
  if ((int) (ivtmp.3 + D.1163) <= maxLen) goto <L0>; else goto <L2>;

Note how there are two moves in the BB for L0.

Coalesce list: (6)ivtmp.12_30 & (7)ivtmp.12_31 [map: 6, 7] : Fail due to conflict
Coalesce list: (1)ivtmp.8_7 & (2)ivtmp.8_9 [map: 1, 2] : Fail due to conflict
Subject: Re: [4.0 Regression] gcc-4.0.0 bloats code by 31%

> ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-21 07:57 -------
> I think one of the problems is that ivopts causes out of ssa not to Coalesce two SSA_NAME:
> Before out of ssa:
>   D.1127_16 = *ivtmp.8_9;
>   D.1128_21 = *ivtmp.12_30;
>   D.1129_22 = D.1127_16 - D.1128_21;
>   *ivtmp.12_30 = D.1129_22;
>   ivtmp.3_17 = ivtmp.3_18 + 1;
>
>   # ivtmp.12_30 = PHI <ivtmp.12_35(0), ivtmp.12_31(1)>;
>   # ivtmp.8_9 = PHI <ivtmp.8_29(0), ivtmp.8_7(1)>;
>   # ivtmp.3_18 = PHI <0(0), ivtmp.3_17(1)>;
> <L1>:;
>   ivtmp.8_7 = ivtmp.8_9 + 4B;
>   ivtmp.12_31 = ivtmp.12_30 + 4B;
>   D.1171_37 = ivtmp.3_18 + D.1163_6;
>   i_38 = (int) D.1171_37;
>   if (i_38 <= maxLen_4) goto <L0>; else goto <L2>;
>
>
> After:
> <L0>:;
>   *ivtmp.12 = *ivtmp.17 - *ivtmp.12;
>   ivtmp.3 = ivtmp.3 + 1;
>   ivtmp.17 = ivtmp.8;
>   ivtmp.12 = ivtmp.16;
>
> <L1>:;
>   ivtmp.8 = ivtmp.17 + 4B;
>   ivtmp.16 = ivtmp.12 + 4B;
>   if ((int) (ivtmp.3 + D.1163) <= maxLen) goto <L0>; else goto <L2>;
>
> Note how there are two moves in the BB for L0.
> Coalesce list: (6)ivtmp.12_30 & (7)ivtmp.12_31 [map: 6, 7] : Fail due to conflict
> Coalesce list: (1)ivtmp.8_7 & (2)ivtmp.8_9 [map: 1, 2] : Fail due to conflict

I am fairly sure that ivopts itself creates both ivtmp.12 and ivtmp.8 such that the live ranges of their ssa names do not overlap. However, one of the later passes (most probably dom) propagates ivtmp.8_9 to expressions after the definition of ivtmp.8_7. It might help to add a pass that would transform this code to the following, thus enabling the coalescing of ivs. I will give it a try.

  D.1127_16 = *ivtmp.8.9;
  D.1128_21 = *ivtmp.12.30;
  D.1129_22 = D.1127_16 - D.1128_21;
  *ivtmp.12.30 = D.1129_22;
  ivtmp.3_17 = ivtmp.3_18 + 1;

  # ivtmp.12_30 = PHI <ivtmp.12_35(0), ivtmp.12_31(1)>;
  # ivtmp.8_9 = PHI <ivtmp.8_29(0), ivtmp.8_7(1)>;
  # ivtmp.3_18 = PHI <0(0), ivtmp.3_17(1)>;
<L1>:;
  ivtmp.8.9 = ivtmp.8_9;
  ivtmp.8_7 = ivtmp.8_9 + 4B;
  ivtmp.12.30 = ivtmp.12_30;
  ivtmp.12_31 = ivtmp.12_30 + 4B;
  D.1171_37 = ivtmp.3_18 + D.1163_6;
  i_38 = (int) D.1171_37;
  if (i_38 <= maxLen_4) goto <L0>; else goto <L2>;
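The cost of the failed coalescing can be pictured in plain C. The sketch below is only an illustration under assumptions: it uses array indices instead of the pointer-typed ivtmp variables, a simple k < n exit test instead of the (int)(ivtmp.3 + D.1163) <= maxLen test from the dump, and the function names are made up. The first function mirrors the "After" dump, where every IV exists as a current/next pair and needs a copy per iteration; the second shows the loop when each pair shares one variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the "After" dump: each IV is split into two variables,
   so the loop body carries two extra copies. */
void sub_uncoalesced(const int *src, int *dst, int n)
{
    size_t s_cur = 0, s_next;   /* stand-ins for ivtmp.17 / ivtmp.8  */
    size_t d_cur = 0, d_next;   /* stand-ins for ivtmp.12 / ivtmp.16 */
    int k = 0;                  /* stand-in for ivtmp.3 */

    goto test;
body:
    dst[d_cur] = src[s_cur] - dst[d_cur];
    k = k + 1;
    s_cur = s_next;             /* extra move 1 */
    d_cur = d_next;             /* extra move 2 */
test:
    s_next = s_cur + 1;
    d_next = d_cur + 1;
    if (k < n)
        goto body;
}

/* Same loop when each current/next pair is coalesced into one variable. */
void sub_coalesced(const int *src, int *dst, int n)
{
    size_t s = 0, d = 0;
    int k = 0;

    while (k < n) {
        dst[d] = src[s] - dst[d];
        k++;
        s++;
        d++;
    }
}

/* Returns 1 if both shapes compute the same result on sample data. */
int coalescing_shapes_agree(void)
{
    const int src[5] = {10, 20, 30, 40, 50};
    int a[5] = {1, 2, 3, 4, 5};
    int b[5];

    memcpy(b, a, sizeof a);
    sub_uncoalesced(src, a, 5);
    sub_coalesced(src, b, 5);
    assert(a[0] == 9);          /* 10 - 1 */
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions do the same work; the first one simply needs two more moves in the loop body and two more live values, which is what shows up as extra instructions and register pressure.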
This actually looks like a duplicate of PR19038 now.
(In reply to comment #11)
> This actually looks like a duplicate of PR19038 now.

Actually it is not; the problem is related to not doing loop header copying for -Os.
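For readers unfamiliar with the term, loop header copying duplicates the loop's exit test in front of the loop so that the loop itself becomes a do-while with the test at the bottom. The following is a source-level sketch only; the function names are hypothetical and GCC performs this on its internal representation, not on C source. The first function follows the layout of the 3.4.2 listing in the report (test at the top, unconditional jump back), the second shows the header-copied shape.

```c
#include <assert.h>
#include <string.h>

/* Test at the top, jump back at the bottom (layout of the 3.4.2 listing). */
void cdt_test_at_top(int *limit, int *base, int minLen, int maxLen)
{
    int i = minLen;

    for (;;) {
        i++;                    /* inc %edx */
        if (i > maxLen)         /* cmp %ecx,%edx ; jg exit */
            break;
        base[i] = ((limit[i - 1] + 1) << 1) - base[i];
    }                           /* jmp back to the inc */
}

/* Header copied: one guard test before the loop, one test at the bottom. */
void cdt_header_copied(int *limit, int *base, int minLen, int maxLen)
{
    int i = minLen + 1;

    if (i <= maxLen) {
        do {
            base[i] = ((limit[i - 1] + 1) << 1) - base[i];
            i++;
        } while (i <= maxLen);
    }
}

/* Returns 1 if both shapes compute the same result on sample data. */
int header_copy_shapes_agree(void)
{
    int limit[8] = {3, 1, 4, 1, 5, 9, 2, 6};
    int a[8] = {2, 7, 1, 8, 2, 8, 1, 8};
    int b[8];

    memcpy(b, a, sizeof a);
    cdt_test_at_top(limit, a, 1, 6);
    cdt_header_copied(limit, b, 1, 6);
    assert(a[2] == 3);          /* ((limit[1] + 1) << 1) - 1 */
    return memcmp(a, b, sizeof a) == 0;
}
```

The header-copied form duplicates the comparison, which is presumably why it is not done by default when optimizing for size, but it gives later passes a simpler loop to work with.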
Subject: Bug 18219

CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     rakdver@gcc.gnu.org     2005-02-06 18:47:14

Modified files:
        gcc            : ChangeLog tree-ssa-loop-ivopts.c

Log message:
        PR tree-optimization/18219
        * tree-ssa-loop-ivopts.c (get_computation_at): Produce computations
        in distributed form.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7394&r2=2.7395
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.42&r2=2.43
On PPC, at least (now), the code size increases come from having more than one IV.
flags:                      .text size:
-Os                         86 bytes
-Os -fno-ivopts             86 bytes
-m32 -Os                    62 bytes
-m32 -Os -fno-ivopts        54 bytes
Leaving as P2; this seems like something we should try to fix, if possible.
And the numbers for "gcc (GCC) 4.1.0 20060109" are....

flags:                      .text size:
-Os                         86 bytes
-Os -fno-ivopts             86 bytes
-m32 -Os                    58 bytes
-m32 -Os -fno-ivopts        59 bytes
For reference, gcc 3.3-hammer-branch has the following .text sizes:

flags:                      .text size:
-Os                         83
-m32 -Os                    44
Disassembly of section .text for x86 (compiled on AMD64 with -m32 -mtune=i686):

00000000 <cdt>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 4d 10                mov    0x10(%ebp),%ecx
   6:   56                      push   %esi
   7:   8b 75 14                mov    0x14(%ebp),%esi
   a:   53                      push   %ebx
   b:   8b 5d 08                mov    0x8(%ebp),%ebx
   e:   8d 04 8d 00 00 00 00    lea    0x0(,%ecx,4),%eax
  15:   01 c3                   add    %eax,%ebx
  17:   03 45 0c                add    0xc(%ebp),%eax
  1a:   8d 50 04                lea    0x4(%eax),%edx
  1d:   eb 0c                   jmp    2b <cdt+0x2b>
  1f:   8b 43 fc                mov    0xfffffffc(%ebx),%eax
  22:   40                      inc    %eax
  23:   01 c0                   add    %eax,%eax
  25:   2b 42 fc                sub    0xfffffffc(%edx),%eax
  28:   89 42 fc                mov    %eax,0xfffffffc(%edx)
  2b:   41                      inc    %ecx
  2c:   83 c3 04                add    $0x4,%ebx
  2f:   83 c2 04                add    $0x4,%edx
  32:   39 f1                   cmp    %esi,%ecx
  34:   7e e9                   jle    1f <cdt+0x1f>
  36:   5b                      pop    %ebx
  37:   5e                      pop    %esi
  38:   5d                      pop    %ebp
  39:   c3                      ret
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
This looks related to PR26726, as IVOPTs now produces

<bb 2>:
  i = minLen + 1;
  D.1588 = (int *) (unsigned int) (i * 4);
  ivtmp.34 = limit + D.1588 - 4B;
  ivtmp.40 = base + D.1588;
  goto <bb 4> (<L1>);

<L0>:;
  D.1595 = (int *) ivtmp.40;
  MEM[base: D.1595, offset: -4B] = (MEM[base: (int *) ivtmp.34, offset: -4B] + 1 << 1) - MEM[base: D.1595, offset: -4B];
  i = i + 1;

<L1>:;
  ivtmp.34 = ivtmp.34 + 4B;
  ivtmp.40 = ivtmp.40 + 4B;
  if (i <= maxLen) goto <L0>; else goto <L2>;

with the seemingly innocuous offset: -4B canonicalization because of the weird i386 backend cost model. Whether fixing that would fix the size issue is another matter. Not replacing the exit test, or using i as the sole IV, is yet another - but IVOPTs doesn't even consider i as an IV candidate.
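Read as C, the dump above corresponds roughly to the following sketch. This is a hand translation for illustration, not compiler output: the function name cdt_ivopts_shape and the use of uintptr_t for the two address IVs are assumptions made here so that the address arithmetic stays well defined. It shows the three induction variables (i plus two addresses) and the -4 byte offset applied on every access because the addresses are bumped before the body runs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Test case from the report, used as the reference. */
void cdt(int *limit, int *base, int minLen, int maxLen)
{
    int i;
    for (i = minLen + 1; i <= maxLen; i++)
        base[i] = ((limit[i - 1] + 1) << 1) - base[i];
}

/* Hand translation of the IVOPTs dump: i plus two address IVs. */
void cdt_ivopts_shape(int *limit, int *base, int minLen, int maxLen)
{
    int i = minLen + 1;
    uintptr_t d = (uintptr_t)i * sizeof(int);                 /* D.1588   */
    uintptr_t iv34 = (uintptr_t)limit + d - sizeof(int);      /* ivtmp.34 */
    uintptr_t iv40 = (uintptr_t)base + d;                     /* ivtmp.40 */

    goto test;
body:
    {
        /* Both accesses use "offset: -4B" relative to the bumped IVs. */
        int *l = (int *)(iv34 - sizeof(int));
        int *b = (int *)(iv40 - sizeof(int));
        *b = ((*l + 1) << 1) - *b;
        i = i + 1;
    }
test:
    iv34 += sizeof(int);
    iv40 += sizeof(int);
    if (i <= maxLen)
        goto body;
}

/* Returns 1 if the translation matches the reference on sample data. */
int ivopts_shape_agrees(void)
{
    int limit[8] = {3, 1, 4, 1, 5, 9, 2, 6};
    int a[8] = {2, 7, 1, 8, 2, 8, 1, 8};
    int b[8];

    memcpy(b, a, sizeof a);
    cdt(limit, a, 1, 6);
    cdt_ivopts_shape(limit, b, 1, 6);
    assert(a[2] == 3);   /* ((limit[1] + 1) << 1) - 1 */
    return memcmp(a, b, sizeof a) == 0;
}
```

Compared with the original source, the loop now updates three variables per iteration instead of one, which matches the inc/add/add sequence in the x86 disassembly posted earlier in the thread.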
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
New results from the mainline:

options             size
-Os                 59
-Os -fno-ivopts     52
-Os -ftree-ch       58
-O2                 64
  MEM[index: ivtmp.39, offset: 0x0fffffffc] = (MEM[index: ivtmp.35, offset: 0x0fffffffc] + 1 << 1) - MEM[index: ivtmp.39, offset: 0x0fffffffc];

We still get an offset of -4.
(In reply to comment #24)
>   MEM[index: ivtmp.39, offset: 0x0fffffffc] = (MEM[index: ivtmp.35, offset:
> 0x0fffffffc] + 1 << 1) - MEM[index: ivtmp.39, offset: 0x0fffffffc];
>
>
> We still get an offset of -4.

PR target/24669
We regress a bit more with gcc version 4.3.0 20071029 (experimental) [trunk revision 129715] (GCC):

options             size
-Os                 60
-Os -fno-ivopts     55
-Os -ftree-ch       55
-O2                 59
Closing 4.1 branch.
Closing 4.2 branch.
GCC 4.3.4 is being released, adjusting target milestone.
With the following compiler:

$ ./xgcc --version
xgcc (GCC) 4.5.0 20091228 (experimental) [trunk revision 155486]
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I get the following size with -m32 for the test case of comment #0:

$ ./xgcc -B. -c t.c -m32 -Os -fno-ivopts -o tOsi.o
$ ./xgcc -B. -c t.c -m32 -Os -o tOs.o
$ size tOs*
   text    data     bss     dec     hex filename
     45       0       0      45      2d tOsi.o
     42       0       0      42      2a tOs.o
$

And I get the following size with -m64:

$ ./xgcc -B. -c t.c -Os -o tOs.o
$ ./xgcc -B. -c t.c -Os -fno-ivopts -o tOsi.o
$ size tOs*
   text    data     bss     dec     hex filename
     78       0       0      78      4e tOsi.o
     89       0       0      89      59 tOs.o
$

This is, therefore, fixed on the trunk for GCC 4.5.
Heh. I wonder how we can reliably add testcases for code size issues ... can we either dump size estimates or extract object .text section sizes? Ian may know arm people who I suspect would be interested in contributing infrastructure for this.
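One low-tech way to extract .text sizes, sketched here purely as an illustration, is to parse the output of size as shown in the previous comment and compare the text column against a budget. The helper names text_size_from_line and within_budget, and the budget values in the self-check, are hypothetical; nothing like this is claimed to exist in the GCC testsuite.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the "text" column of one Berkeley-format `size` data line,
   or -1 for the header line or any line that does not parse. */
long text_size_from_line(const char *line)
{
    long text, data, bss, dec;
    unsigned long hex;

    if (sscanf(line, "%ld %ld %ld %ld %lx",
               &text, &data, &bss, &dec, &hex) != 5)
        return -1;
    return text;
}

/* Returns 1 if the line parses and its text size is within the budget. */
int within_budget(const char *line, long max_text_bytes)
{
    long text = text_size_from_line(line);
    return text >= 0 && text <= max_text_bytes;
}

/* Self-check using the -m32 lines quoted in the thread;
   the budgets (45 and 44) are made-up values. */
int size_parser_self_check(void)
{
    const char *header = "   text    data     bss     dec     hex filename";
    const char *tos    = "     42       0       0      42      2a tOs.o";
    const char *tosi   = "     45       0       0      45      2d tOsi.o";

    assert(text_size_from_line(header) == -1);
    assert(text_size_from_line(tos) == 42);
    return within_budget(tos, 45) && !within_budget(tosi, 44);
}
```

A test built this way would be sensitive to target, ABI and assembler differences, so any budget would need to be set per target.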
GCC 4.3.5 is being released, adjusting target milestone.
4.3 branch is being closed, moving to 4.4.7 target.
Fixed in 4.5+, 4.4 is no longer supported.