Bug 18219 - [4.4 Regression] bloats code by 31%
Summary: [4.4 Regression] bloats code by 31%
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.0.0
Importance: P2 minor
Target Milestone: 4.5.0
Assignee: Zdenek Dvorak
URL:
Keywords: missed-optimization
Depends on: 24669 26726
Blocks: 16996 17549 18693 39201
 
Reported: 2004-10-29 14:00 UTC by Miguel Angel
Modified: 2012-03-13 12:58 UTC

See Also:
Host:
Target: i386-linux
Build:
Known to work: 4.5.0
Known to fail: 4.0.4
Last reconfirmed: 2007-08-06 15:18:27


Description Miguel Angel 2004-10-29 14:00:15 UTC
Hi,
this little function shows an increase in code size of 31%, or 13 bytes.
   
void cdt (int *limit, int *base, int minLen, int maxLen)
{
   int i;
   
   for (i = minLen + 1; i <= maxLen; i++)
      base[i] = ((limit[i-1] + 1) << 1) - base[i];
}
   
   
Size  Version Flags
  41  3.4.2   -Os
  54  4.0.0   -Os -fno-ivopts
  63  4.0.0   -Os
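
The percentages implied by this table can be recomputed with a few lines of C. This is only a sketch: `growth_percent` and `print_growth` are names made up here, and the byte counts are copied from the table. The arithmetic suggests that the 31% (13 bytes) figure corresponds to the -Os -fno-ivopts row, while plain -Os comes out at roughly 54% (22 bytes) over 3.4.2.

```c
#include <assert.h>
#include <stdio.h>

/* Relative growth of new_size over old_size, in percent. */
double growth_percent(int old_size, int new_size)
{
    assert(old_size > 0);
    return 100.0 * (new_size - old_size) / old_size;
}

/* Print the growth of each 4.0.0 row over the 3.4.2 baseline of 41 bytes.
   The sizes 41, 54 and 63 are taken from the table in the report. */
void print_growth(void)
{
    printf("-Os -fno-ivopts: +%d bytes (%.1f%%)\n",
           54 - 41, growth_percent(41, 54));
    printf("-Os:             +%d bytes (%.1f%%)\n",
           63 - 41, growth_percent(41, 63));
}
```

Calling `print_growth()` reports +13 bytes (31.7%) for -Os -fno-ivopts and +22 bytes (53.7%) for -Os.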

3.4.2 disassembly code
00000000 <cdt>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 4d 14                mov    0x14(%ebp),%ecx
   6:   56                      push   %esi
   7:   8b 55 10                mov    0x10(%ebp),%edx  
   a:   8b 75 08                mov    0x8(%ebp),%esi
   d:   53                      push   %ebx
   e:   8b 5d 0c                mov    0xc(%ebp),%ebx
  11:   42                      inc    %edx
  12:   39 ca                   cmp    %ecx,%edx  
  14:   7f 0f                   jg     25 <cdt+0x25>
  16:   8b 44 96 fc             mov    0xfffffffc(%esi,%edx,4),%eax
  1a:   40                      inc    %eax
  1b:   01 c0                   add    %eax,%eax
  1d:   2b 04 93                sub    (%ebx,%edx,4),%eax
  20:   89 04 93                mov    %eax,(%ebx,%edx,4)
  23:   eb ec                   jmp    11 <cdt+0x11>
  25:   5b                      pop    %ebx
  26:   5e                      pop    %esi
  27:   5d                      pop    %ebp
  28:   c3                      ret


4.0.0-20041028 disassembly code
00000000 <cdt>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 4d 10                mov    0x10(%ebp),%ecx
   6:   57                      push   %edi
   7:   8b 7d 08                mov    0x8(%ebp),%edi
   a:   56                      push   %esi
   b:   8b 75 14                mov    0x14(%ebp),%esi
   e:   53                      push   %ebx
   f:   8b 5d 0c                mov    0xc(%ebp),%ebx
  12:   41                      inc    %ecx
  13:   8d 14 8d 00 00 00 00    lea    0x0(,%ecx,4),%edx
  1a:   eb 11                   jmp    2d <cdt+0x2d>
  1c:   8b 44 3a fc             mov    0xfffffffc(%edx,%edi,1),%eax
  20:   41                      inc    %ecx
  21:   40                      inc    %eax
  22:   01 c0                   add    %eax,%eax
  24:   2b 04 1a                sub    (%edx,%ebx,1),%eax
  27:   89 04 1a                mov    %eax,(%edx,%ebx,1)
  2a:   83 c2 04                add    $0x4,%edx
  2d:   39 f1                   cmp    %esi,%ecx
  2f:   7e eb                   jle    1c <cdt+0x1c>
  31:   5b                      pop    %ebx
  32:   5e                      pop    %esi
  33:   5f                      pop    %edi
  34:   5d                      pop    %ebp
  35:   c3                      ret
  
This PR may be related to PR17549, but it's much smaller and
hopefully easier to analyse!
Comment 1 Andrew Pinski 2004-10-29 14:30:10 UTC
Confirmed. On PPC, IV-OPTS also causes code bloat: 7 new instructions. Note, however, that 4.0.0 with IV-OPTS 
on PPC is only one instruction more than 3.3 (on PPC every instruction is the same length).
Comment 2 Giovanni Bajo 2004-10-29 15:43:42 UTC
Zdenek, this is another PR about ivopts and -Os. The test case is very
simple, so if you could take a look it would be great.
Comment 3 Zdenek Dvorak 2004-11-01 13:21:24 UTC
Unfortunately I do not think this will help much in other related PRs.

Patch:

http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00025.html
Comment 4 Giovanni Bajo 2004-11-01 14:40:22 UTC
Zdenek, thanks for the patch!
What is the generated code like after your patch?
Comment 5 Zdenek Dvorak 2004-11-01 14:57:59 UTC
Subject: Re:  [4.0 Regression] gcc-4.0.0 bloats code by 31%

> Zdenek, thanks for the patch!
> What is the generated code like after your patch?

It seems that the 3.4 code is still smaller (I haven't measured it, just
guessing from looking at your disassembly), but -fno-ivopts no longer
changes it.

        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    8(%ebp), %edi
        movl    20(%ebp), %esi
        movl    16(%ebp), %ebx
        incl    %ebx
        movl    %ebx, %ecx
        leal    0(,%ebx,4), %edx
        addl    12(%ebp), %edx
        jmp     .L2
.L3:
        movl    -4(%edi,%ecx,4), %eax
        incl    %eax
        sall    %eax
        subl    (%edx), %eax
        movl    %eax, (%edx)
        incl    %ecx
        addl    $4, %edx
.L2:
        cmpl    %esi, %ecx
        jle     .L3
        popl    %ebx
        popl    %esi
        popl    %edi
        leave
        ret

Comment 6 Uroš Bizjak 2004-11-05 09:10:31 UTC
It looks like this PR is related to PR17647. Should we merge them together?

Uros.
Comment 7 Giovanni Bajo 2004-11-05 10:34:01 UTC
Subject: Re:  [4.0 Regression] gcc-4.0.0 bloats code by 31%

uros at kss-loka dot si wrote:

> It looks that this PR is related to PR17647. Should we
> merge them together?

We are still waiting for Zdenek's patch to be applied. I will reevaluate the PR
after it.

Giovanni Bajo


Comment 8 Andrew Pinski 2004-11-05 14:34:06 UTC
Hmm, for -O2 on PPC, we are not using the branch-on-count-register instruction; if we did, -O2 would 
be smaller than -Os.
Comment 9 Andrew Pinski 2005-01-21 07:57:20 UTC
I think one of the problems is that ivopts causes out-of-SSA not to coalesce two SSA_NAMEs:
Before out of ssa:
  D.1127_16 = *ivtmp.8_9;
  D.1128_21 = *ivtmp.12_30;
  D.1129_22 = D.1127_16 - D.1128_21;
  *ivtmp.12_30 = D.1129_22;
  ivtmp.3_17 = ivtmp.3_18 + 1;

  # ivtmp.12_30 = PHI <ivtmp.12_35(0), ivtmp.12_31(1)>;
  # ivtmp.8_9 = PHI <ivtmp.8_29(0), ivtmp.8_7(1)>;
  # ivtmp.3_18 = PHI <0(0), ivtmp.3_17(1)>;
<L1>:;
  ivtmp.8_7 = ivtmp.8_9 + 4B;
  ivtmp.12_31 = ivtmp.12_30 + 4B;
  D.1171_37 = ivtmp.3_18 + D.1163_6;
  i_38 = (int) D.1171_37;
  if (i_38 <= maxLen_4) goto <L0>; else goto <L2>;


After:
<L0>:;
  *ivtmp.12 = *ivtmp.17 - *ivtmp.12;
  ivtmp.3 = ivtmp.3 + 1;
  ivtmp.17 = ivtmp.8;
  ivtmp.12 = ivtmp.16;

<L1>:;
  ivtmp.8 = ivtmp.17 + 4B;
  ivtmp.16 = ivtmp.12 + 4B;
  if ((int) (ivtmp.3 + D.1163) <= maxLen) goto <L0>; else goto <L2>;

Note how there are two moves in the BB for L0.
Coalesce list: (6)ivtmp.12_30 & (7)ivtmp.12_31 [map: 6, 7] : Fail due to conflict
Coalesce list: (1)ivtmp.8_7 & (2)ivtmp.8_9 [map: 1, 2] : Fail due to conflict
Comment 10 Zdenek Dvorak 2005-01-21 08:18:49 UTC
Subject: Re:  [4.0 Regression] gcc-4.0.0 bloats code by 31%

> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-01-21 07:57 -------
> I think one of the problems is that ivopts causes out of ssa not to Coalesce two SSA_NAME:
> Before out of ssa:
>   D.1127_16 = *ivtmp.8_9;
>   D.1128_21 = *ivtmp.12_30;
>   D.1129_22 = D.1127_16 - D.1128_21;
>   *ivtmp.12_30 = D.1129_22;
>   ivtmp.3_17 = ivtmp.3_18 + 1;
> 
>   # ivtmp.12_30 = PHI <ivtmp.12_35(0), ivtmp.12_31(1)>;
>   # ivtmp.8_9 = PHI <ivtmp.8_29(0), ivtmp.8_7(1)>;
>   # ivtmp.3_18 = PHI <0(0), ivtmp.3_17(1)>;
> <L1>:;
>   ivtmp.8_7 = ivtmp.8_9 + 4B;
>   ivtmp.12_31 = ivtmp.12_30 + 4B;
>   D.1171_37 = ivtmp.3_18 + D.1163_6;
>   i_38 = (int) D.1171_37;
>   if (i_38 <= maxLen_4) goto <L0>; else goto <L2>;
> 
> 
> After:
> <L0>:;
>   *ivtmp.12 = *ivtmp.17 - *ivtmp.12;
>   ivtmp.3 = ivtmp.3 + 1;
>   ivtmp.17 = ivtmp.8;
>   ivtmp.12 = ivtmp.16;
> 
> <L1>:;
>   ivtmp.8 = ivtmp.17 + 4B;
>   ivtmp.16 = ivtmp.12 + 4B;
>   if ((int) (ivtmp.3 + D.1163) <= maxLen) goto <L0>; else goto <L2>;
> 
> Note how there are two moves in the BB for L0.
> Coalesce list: (6)ivtmp.12_30 & (7)ivtmp.12_31 [map: 6, 7] : Fail due to conflict
> Coalesce list: (1)ivtmp.8_7 & (2)ivtmp.8_9 [map: 1, 2] : Fail due to conflict

I am fairly sure that ivopts itself creates both ivtmp.12 and ivtmp.8
such that the live ranges of their SSA names do not overlap.  However, one
of the later passes (most probably dom) propagates ivtmp.8_9 into
expressions after the definition of ivtmp.8_7.

It might help to add a pass that would transform this code into the
following, thus enabling the coalescing of the ivs.  I will give it a try.

  D.1127_16 = *ivtmp.8.9;
  D.1128_21 = *ivtmp.12.30;
  D.1129_22 = D.1127_16 - D.1128_21;
  *ivtmp.12.30 = D.1129_22;
  ivtmp.3_17 = ivtmp.3_18 + 1;

  # ivtmp.12_30 = PHI <ivtmp.12_35(0), ivtmp.12_31(1)>;
  # ivtmp.8_9 = PHI <ivtmp.8_29(0), ivtmp.8_7(1)>;
  # ivtmp.3_18 = PHI <0(0), ivtmp.3_17(1)>;
<L1>:;
  ivtmp.8.9 = ivtmp.8_9;
  ivtmp.8_7 = ivtmp.8_9 + 4B;
  ivtmp.12.30 = ivtmp.12_30;
  ivtmp.12_31 = ivtmp.12_30 + 4B;
  D.1171_37 = ivtmp.3_18 + D.1163_6;
  i_38 = (int) D.1171_37;
  if (i_38 <= maxLen_4) goto <L0>; else goto <L2>;
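
As a rough C-level analogy for the live-range overlap discussed above (an illustration only, not compiler output, and not the proposed pass), the two functions below compute the same result. The names `sub_overlapping` and `sub_coalesced` are hypothetical. In the first, the next pointer value is defined while the current one is still needed, so old and new value cannot share one variable and two copies appear at the end of the body. In the second, each pointer is last used before it is advanced.

```c
#include <assert.h>

/* Overlapping form: p_next and q_next are defined while p and q are
   still live, which forces two copies at the end of each iteration. */
void sub_overlapping(const int *src, int *dst, int n)
{
    const int *p = src;
    int *q = dst;
    int k;

    assert(n >= 0);
    for (k = 0; k < n; k++) {
        const int *p_next = p + 1;  /* like ivtmp.8_7   = ivtmp.8_9 + 4B   */
        int *q_next = q + 1;        /* like ivtmp.12_31 = ivtmp.12_30 + 4B */
        *q = *p - *q;               /* old values p and q still used here  */
        p = p_next;                 /* copy 1 */
        q = q_next;                 /* copy 2 */
    }
}

/* Non-overlapping form: each pointer is last used before it is
   advanced, so old and new value can live in the same variable. */
void sub_coalesced(const int *src, int *dst, int n)
{
    const int *p = src;
    int *q = dst;
    int k;

    assert(n >= 0);
    for (k = 0; k < n; k++) {
        *q = *p - *q;
        p++;
        q++;
    }
}
```

A real compiler is of course free to clean up either C function; the point is only to show what "two moves in the BB" looks like at source level.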
Comment 11 Steven Bosscher 2005-01-21 14:04:50 UTC
This actually looks like a duplicate of PR19038 now. 
Comment 12 Andrew Pinski 2005-01-21 21:05:33 UTC
(In reply to comment #11)
> This actually looks like a duplicate of PR19038 now. 
Actually it is not; the problem is related to not doing loop header copying for -Os.
Comment 13 CVS Commits 2005-02-06 18:47:21 UTC
Subject: Bug 18219

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rakdver@gcc.gnu.org	2005-02-06 18:47:14

Modified files:
	gcc            : ChangeLog tree-ssa-loop-ivopts.c 

Log message:
	PR tree-optimization/18219
	* tree-ssa-loop-ivopts.c (get_computation_at): Produce computations
	in distributed form.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7394&r2=2.7395
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.42&r2=2.43

Comment 14 Andrew Pinski 2005-02-10 21:09:57 UTC
On PPC, at least (now), the code size increases come from having more than one IV.
Comment 15 Steven Bosscher 2005-02-10 23:57:43 UTC
flags:                  .text size: 
-Os                     86 bytes 
-Os -fno-ivopts         86 bytes 
-m32 -Os                62 bytes 
-m32 -Os -fno-ivopts    54 bytes 
 
Comment 16 Mark Mitchell 2005-10-31 01:36:23 UTC
Leaving as P2; this seems like something we should try to fix, if possible.
Comment 17 Steven Bosscher 2006-01-09 22:43:07 UTC
And the numbers for "gcc (GCC) 4.1.0 20060109" are....

flags:                  .text size: 
-Os                     86 bytes
-Os -fno-ivopts         86 bytes
-m32 -Os                58 bytes
-m32 -Os -fno-ivopts    59 bytes
Comment 18 Steven Bosscher 2006-01-09 22:44:33 UTC
For reference, gcc 3.3-hammer-branch has the following .text sizes:

flags:                  .text size: 
-Os                     83
-m32 -Os                44
Comment 19 Steven Bosscher 2006-01-09 22:50:37 UTC
Disassembly of section .text for x86 (compiled on AMD64 with -m32 -mtune=i686):

00000000 <cdt>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 4d 10                mov    0x10(%ebp),%ecx
   6:   56                      push   %esi
   7:   8b 75 14                mov    0x14(%ebp),%esi
   a:   53                      push   %ebx
   b:   8b 5d 08                mov    0x8(%ebp),%ebx
   e:   8d 04 8d 00 00 00 00    lea    0x0(,%ecx,4),%eax
  15:   01 c3                   add    %eax,%ebx
  17:   03 45 0c                add    0xc(%ebp),%eax
  1a:   8d 50 04                lea    0x4(%eax),%edx
  1d:   eb 0c                   jmp    2b <cdt+0x2b>
  1f:   8b 43 fc                mov    0xfffffffc(%ebx),%eax
  22:   40                      inc    %eax
  23:   01 c0                   add    %eax,%eax
  25:   2b 42 fc                sub    0xfffffffc(%edx),%eax
  28:   89 42 fc                mov    %eax,0xfffffffc(%edx)
  2b:   41                      inc    %ecx
  2c:   83 c3 04                add    $0x4,%ebx
  2f:   83 c2 04                add    $0x4,%edx
  32:   39 f1                   cmp    %esi,%ecx
  34:   7e e9                   jle    1f <cdt+0x1f>
  36:   5b                      pop    %ebx
  37:   5e                      pop    %esi
  38:   5d                      pop    %ebp
  39:   c3                      ret
Comment 20 Mark Mitchell 2006-02-24 00:25:22 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 21 Richard Biener 2006-05-12 14:00:09 UTC
This looks related to PR26726, as IVOPTs now produces

<bb 2>:
  i = minLen + 1;
  D.1588 = (int *) (unsigned int) (i * 4);
  ivtmp.34 = limit + D.1588 - 4B;
  ivtmp.40 = base + D.1588;
  goto <bb 4> (<L1>);

<L0>:;
  D.1595 = (int *) ivtmp.40;
  MEM[base: D.1595, offset: -4B] = (MEM[base: (int *) ivtmp.34, offset: -4B] + 1 << 1) - MEM[base: D.1595, offset: -4B];
  i = i + 1;

<L1>:;
  ivtmp.34 = ivtmp.34 + 4B;
  ivtmp.40 = ivtmp.40 + 4B;
  if (i <= maxLen) goto <L0>; else goto <L2>;

with the seemingly innocuous offset: -4B canonicalization because of the weird
i386 backend cost model.  Whether fixing that would fix the size issue is another matter.  Not replacing the exit test, or using i as the sole IV, is yet another, but it doesn't even consider i as an IV candidate.
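
To make the dump above easier to follow, here is a hand transcription into C next to the original test case. It is a sketch written by reading the dump, not something GCC emits; `cdt_ivopts`, `iv34` and `iv40` are names invented here. It shows the three induction variables (i plus two pointers) and the -4B offset on every memory access.

```c
#include <assert.h>
#include <stddef.h>

/* The test case from the description. */
void cdt(int *limit, int *base, int minLen, int maxLen)
{
    int i;

    for (i = minLen + 1; i <= maxLen; i++)
        base[i] = ((limit[i - 1] + 1) << 1) - base[i];
}

/* Hand transcription of the dump: both pointer IVs are advanced before
   the exit test, and memory is accessed one element (4 bytes) behind.
   The pointers end up one element past the last index touched, so for
   strictly defined C the arrays need at least maxLen + 2 elements. */
void cdt_ivopts(int *limit, int *base, int minLen, int maxLen)
{
    int i = minLen + 1;
    int *iv34 = limit + i - 1;  /* ivtmp.34 = limit + D.1588 - 4B */
    int *iv40 = base + i;       /* ivtmp.40 = base + D.1588 */

    assert(limit != NULL && base != NULL);
    for (;;) {
        iv34 += 1;              /* <L1>: ivtmp.34 = ivtmp.34 + 4B */
        iv40 += 1;              /*       ivtmp.40 = ivtmp.40 + 4B */
        if (i > maxLen)
            break;
        iv40[-1] = ((iv34[-1] + 1) << 1) - iv40[-1];  /* <L0> */
        i = i + 1;
    }
}
```

Both functions should leave `base` in the same state for the same inputs, which is an easy way to check the reading of the dump.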
Comment 22 Mark Mitchell 2006-05-25 02:32:10 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 23 Andrew Pinski 2006-08-28 05:11:51 UTC
New results from the mainline:
options            size
-Os                 59
-Os -fno-ivopts     52
-Os -ftree-ch       58
-O2                 64
Comment 24 Andrew Pinski 2007-06-18 04:39:14 UTC
  MEM[index: ivtmp.39, offset: 0x0fffffffc] = (MEM[index: ivtmp.35, offset: 0x0fffffffc] + 1 << 1) - MEM[index: ivtmp.39, offset: 0x0fffffffc];


We still get an offset of -4.
Comment 25 Uroš Bizjak 2007-06-18 10:17:31 UTC
(In reply to comment #24)
>   MEM[index: ivtmp.39, offset: 0x0fffffffc] = (MEM[index: ivtmp.35, offset:
> 0x0fffffffc] + 1 << 1) - MEM[index: ivtmp.39, offset: 0x0fffffffc];
> 
> 
> We still get an offset of -4.

PR target/24669


Comment 26 Uroš Bizjak 2007-10-29 08:29:00 UTC
We regress a bit more with
gcc version 4.3.0 20071029 (experimental) [trunk revision 129715] (GCC) 

options          size
 -Os              60
 -Os -fno-ivopts  55
 -Os -ftree-ch    55
 -O2              59
Comment 27 Joseph S. Myers 2008-07-04 16:34:46 UTC
Closing 4.1 branch.
Comment 28 Joseph S. Myers 2009-03-31 16:21:43 UTC
Closing 4.2 branch.
Comment 29 Richard Biener 2009-08-04 12:26:06 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 30 Steven Bosscher 2010-01-03 22:57:39 UTC
With the following compiler:
$ ./xgcc --version
xgcc (GCC) 4.5.0 20091228 (experimental) [trunk revision 155486]
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


I get the following size with -m32 for the test case of comment #0:
$ ./xgcc -B. -c t.c -m32 -Os -fno-ivopts -o tOsi.o
$ ./xgcc -B. -c t.c -m32 -Os -o tOs.o
$ size tOs*
   text	   data	    bss	    dec	    hex	filename
     45	      0	      0	     45	     2d	tOsi.o
     42	      0	      0	     42	     2a	tOs.o
$ 


And I get the following size with -m64:
$ ./xgcc -B. -c t.c -Os -o tOs.o
$ ./xgcc -B. -c t.c -Os -fno-ivopts -o tOsi.o
$ size tOs*
   text	   data	    bss	    dec	    hex	filename
     78	      0	      0	     78	     4e	tOsi.o
     89	      0	      0	     89	     59	tOs.o
$ 


This is, therefore, fixed on the trunk for GCC 4.5.
Comment 31 Richard Biener 2010-01-03 23:14:41 UTC
Heh.  I wonder how we can reliably add testcases for code size issues ...
can we either dump size estimates or extract object .text section sizes?
Ian may know arm people who I suspect would be interested in contributing
infrastructure for this.
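
One hypothetical way to act on the "extract object .text section sizes" idea is to parse a data line of `size` output, in the format shown in comment #30, and compare the text column against a byte budget. The sketch below is not existing GCC testsuite infrastructure; `parse_size_line` and `text_within_budget` are invented names, and the budget value is whatever a test author would choose.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one data line of Berkeley-format `size` output
   (text, data, bss, dec, hex, filename) and store the text size.
   Returns 1 on success, 0 if the line does not start with three numbers
   (for example the header line). */
int parse_size_line(const char *line, unsigned long *text_bytes)
{
    unsigned long data, bss;

    assert(line != NULL && text_bytes != NULL);
    return sscanf(line, "%lu %lu %lu", text_bytes, &data, &bss) == 3;
}

/* Hypothetical check: 1 if the line parses and its text size does not
   exceed the given budget in bytes, 0 otherwise. */
int text_within_budget(const char *line, unsigned long budget)
{
    unsigned long text;

    return parse_size_line(line, &text) && text <= budget;
}
```

With the -m32 numbers from comment #30, the tOs.o line (42 bytes) would pass a budget of 45, while the tOsi.o line (45 bytes) would fail a budget of 42.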
Comment 32 Richard Biener 2010-05-22 18:10:05 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 33 Richard Biener 2011-06-27 12:12:39 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 34 Jakub Jelinek 2012-03-13 12:58:35 UTC
Fixed in 4.5+, 4.4 is no longer supported.