Bug 39077 - [4.3/4.4/4.5/4.6 Regression] GCSE-optimization causes enormous binary size increase (~20 times !)
Summary: [4.3/4.4/4.5/4.6 Regression] GCSE-optimization causes enormous binary size in...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.3.2
: P2 normal
Target Milestone: 4.6.0
Assignee: Jeffrey A. Law
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: missed-optimization, patch
Depends on:
Blocks:
 
Reported: 2009-02-02 16:18 UTC by comer352l
Modified: 2011-11-25 13:08 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.2.4
Known to fail: 4.3.3, 4.4.0
Last reconfirmed: 2009-03-08 21:38:43


Attachments
Preprocessed file (created with gcc4.3.2) (120.26 KB, application/octet-stream)
2009-02-02 16:22 UTC, comer352l
Details
unincluded testcase (46.74 KB, text/plain)
2009-02-03 09:55 UTC, Richard Biener
Details

Note You need to log in before you can comment on or make changes to this bug.
Description comer352l 2009-02-02 16:18:24 UTC
The example attached is a simple data-container-class. The data is stored as Qt4-QStringLists.
Everything was fine with gcc 4.1 (330KB). 4.2 already caused a significant increase. With 4.3, the binary size increased enormously again (6.7MB !).

When I disable GCSE-optimization using the additional flag "-fnogcse", I get normal binary sizes (360KB with 4.3.2).


Command line use for compilation:
g++ -c -pipe -O2 -march=i586 -mtune=i686 -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -W -D_REENTRANT -DQT_NO_DEBUG -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED -I/usr/share/qt4/mkspecs/default -I. -I/usr/include/QtCore -I/usr/include/QtCore -I/usr/include/QtGui -I/usr/include/QtGui -I/usr/include -I. -Isrc -Isrc/linux -I. -I. -o SSMprotocol_def_en.o src/SSMprotocol_def_en.cpp

=> no errors or warnings


GCC-info (Output of 'g++ -v'):

Target: i586-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=i586-suse-linux
Thread model: posix
gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)
Comment 1 comer352l 2009-02-02 16:22:47 UTC
Created attachment 17229 [details]
Preprocessed file (created with gcc4.3.2)
Comment 2 Richard Biener 2009-02-03 09:54:50 UTC
Confirmed.  Likely triggered by inliner decision changes.  -Os is fine.  Steven,
do we need to constrain GCSE similar to your PRE patches?  Is this PPRE in GCSE
at work?

Object size with G++ 4.4 is 7MB.
Comment 3 Richard Biener 2009-02-03 09:55:18 UTC
Created attachment 17230 [details]
unincluded testcase
Comment 4 Steven Bosscher 2009-02-05 14:23:43 UTC
Investigating...
Comment 5 Steven Bosscher 2009-02-06 11:59:13 UTC
I am unable to reproduce this on Cygwin.  Anyone got a .gcse dump for me?
Comment 6 Richard Biener 2009-02-06 13:07:49 UTC
It's at 280MB and still growing.  Let's hope it compresses well ;)
Comment 7 Richard Biener 2009-02-06 14:35:28 UTC
Ok, still too large to attach.  It should appear at
http://gcc.opensuse.org/SSMprotocol_def_en.cpp.140r.gcse1.lzma
after some time.
Comment 8 Steven Bosscher 2009-02-06 17:50:19 UTC
Re. comment #2:
This looks more like normal PRE over exception edges, which AFAIK tree-ssa-pre does not do (it keeps ANTIC_IN empty for any block that has abnormal predecessors).
Comment 9 Steven Bosscher 2009-02-21 15:25:46 UTC
It looks like this is some kind of quadratic insertion problem, maybe PPRE after all.  I hacked GCSE a bit to see what is going on.

I fist counted the number basic blocks and edges per function, the number of expressions considered by PRE, and the block with the largest number of predecessor edges.  Then, I counted for each partially redundant expression how many inserts are necessary to make the expression fully redundant. Here are the numbers of the two largest functions:

counting (nbb = 9189 ; nedges = 15477 ; nexpr = 2417 ; max(preds) = 583)
1164, 1162, 1160, 1158, 1156, 1154, 1152, 1150, 1148, 1146, 

counting (nbb = 5834 ; nedges = 9727 ; nexpr = 1936 ; max(preds) = 489)
976, 974, 972, 970, 968, 966, 964, 962, 960, 958, 

See how there is a series there?

1164 = 2*583 = 2*max(preds) for the first function, and from thereon, we have a series of n(i+1) = n(i) - 2.  This goes on for a long time (I've only included the top 10 partially redundant expressions in the above numbers).

Likewise for the second function: 976 = 2*489 etc.

This *is* the same behavior as what we see with PPRE in tree-ssa-pre.c.  But I don't understand enough of what is going on to be sure that this is PPRE in gcse.c.
Comment 10 Steven Bosscher 2009-02-21 16:09:09 UTC
OK, I checked what we're PREing here.  This is indeed partial-partial PRE.

I suppose something like the following is a good idea.  I'll admit it's brute-force, but I'm not sure how else to stop GCSE-PRE from doing this (it's baked into the LCM equations).

Jeff, what do you think about this PR?

Index: gcse.c
===================================================================
--- gcse.c	(revision 144303)
+++ gcse.c	(working copy)
@@ -3801,16 +3801,30 @@
       edge e;
       edge_iterator ei;
 
-      /* If the current block is the destination of an abnormal edge, we
-	 kill all trapping expressions because we won't be able to properly
-	 place the instruction on the edge.  So make them neither
-	 anticipatable nor transparent.  This is fairly conservative.  */
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (e->flags & EDGE_ABNORMAL)
-	  {
-	    sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr);
-	    sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr);
-	    break;
+      if (EDGE_COUNT (bb->preds) > 20) /* ??? Should be a PARAM */
+	{
+	  /* If a block has a large number of incoming edges, then inserting
+	     many expressions in the predecessors to make one/few expression
+	     fully redundant is probably not a profitable transformation.
+	     Make all expressions non-anticipatable and non-transparent.  */
+	  sbitmap_zero (antloc[bb->index]);
+	  sbitmap_zero (transp[bb->index]);
+	}
+      else
+	{
+	  /* If the current block is the destination of an abnormal edge, we
+	     kill all trapping expressions because we won't be able to properly
+	     place the instruction on the edge.  So make them neither
+	     anticipatable nor transparent.  This is fairly conservative.  */
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    if (e->flags & EDGE_ABNORMAL)
+	      {
+		sbitmap_difference (antloc[bb->index],
+				    antloc[bb->index], trapping_expr);
+		sbitmap_difference (transp[bb->index],
+				    transp[bb->index], trapping_expr);
+		break;
+	      }
 	  }
 
       sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index], comp[bb->index]);
Comment 11 Paolo Bonzini 2009-02-26 09:12:21 UTC
I remember seeing this kind of insertion in very old GCCs too (inserting all sort of loads at the end of every branch of a switch statement).  I like Steven's patch, even though it's a bit brute force.
Comment 12 Jeffrey A. Law 2009-02-26 16:53:22 UTC
Subject: Re:  [4.3/4.4 Regression] GCSE-optimization
 causes enormous binary size increase (~20 times !)

steven at gcc dot gnu dot org wrote:
> ------- Comment #10 from steven at gcc dot gnu dot org  2009-02-21 16:09 -------
> OK, I checked what we're PREing here.  This is indeed partial-partial PRE.
>
> I suppose something like the following is a good idea.  I'll admit it's
> brute-force, but I'm not sure how else to stop GCSE-PRE from doing this (it's
> baked into the LCM equations).
>
> Jeff, what do you think about this PR?
>   
It's overkill, but overkill I can live with -- it'd be marginally better 
if we waited until we were sure that insertion on those edges was needed 
rather than just blindly forgetting everything we know about these 
blocks with a high in-degree.

It'd also be marginally better if we had a better metric than just 
in-degree -- specifically if we had some idea of how many redundancies 
get eliminated.   This would be a nice future enhancement.

I'll OK this after turning the in-degree check to use a PARAM.

Jeff

Comment 13 Steven Bosscher 2009-04-12 23:46:21 UTC
The real bug is that somehow MEM_ATTRS are not shared anymore.  We have lots and lots of exactly the same expression in the table, e.g.:

Index 3 (hash value 4232)
  (mem/s/f/c:SI (plus:SI (reg/f:SI 20 frame)
        (const_int -3828 [0xfffffffffffff10c])) [32 cpy.d+0 S4 A32])
Index 6 (hash value 4232)
  (mem/s/f/c:SI (plus:SI (reg/f:SI 20 frame)
        (const_int -3828 [0xfffffffffffff10c])) [32 cpy.d+0 S4 A32])
Index 10 (hash value 4232)
  (mem/s/f/c:SI (plus:SI (reg/f:SI 20 frame)
        (const_int -3828 [0xfffffffffffff10c])) [32 cpy.d+0 S4 A32])


but exp_equiv_p() thinks these are not equivalent, because the MEM_ATTRS pointers are not the same.  We should have MEM_ATTRS(x)==MEM_ATTRS(y) for two MEMs with the same memory attributes, but here the pointers are not the same.  So we're allocating MEM_ATTRS somewhere without going via the table, or we're adjusting MEM_ATTRS somewhere wvia an incorrect interface.
Comment 14 Steven Bosscher 2009-04-12 23:52:34 UTC
Ah, how subtle.

(gdb) p MEM_ATTRS(x)
$25 = (mem_attrs *) 0x7f20d1ad0440
(gdb) p MEM_ATTRS(y)
$26 = (mem_attrs *) 0x7f20d1ad71a0
(gdb) p MEM_ATTRS(*x)
$27 = (mem_attrs *) 0x7f20d1ad0440
(gdb) p MEM_ATTRS(*y)
$28 = (mem_attrs *) 0x7f20d1ad71a0
(gdb) p MEM_ATTRS(x)
$29 = (mem_attrs *) 0x7f20d1ad0440
(gdb) p MEM_ATTRS(y)
$30 = (mem_attrs *) 0x7f20d1ad71a0
(gdb) p *MEM_ATTRS(x)
$31 = {expr = 0x7f20d406cfa0, offset = 0x7f20d9e05450, size = 0x7f20d9e05490, alias = 32, 
  align = 32}
(gdb) p *MEM_ATTRS(y)
$32 = {expr = 0x7f20d4093190, offset = 0x7f20d9e05450, size = 0x7f20d9e05490, alias = 32, 
  align = 32}
(gdb) p debug_tree(MEM_ATTRS(x)->expr)
 <component_ref 0x7f20d406cfa0
    type <pointer_type 0x7f20d6bd96c0
        type <record_type 0x7f20d6bd8d80 Data type_5 BLK
            size <integer_cst 0x7f20d7fd4510 constant 160>
            unit size <integer_cst 0x7f20d7fd44b0 constant 20>
            align 32 symtab 0 alias set 87 canonical type 0x7f20d6bd8d80 fields <field_decl 0x7f20d6b58640 ref> context <record_type 0x7f20d7710840 QString>
            full-name "struct QString::Data"
            X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
            pointer_to_this <pointer_type 0x7f20d6bd96c0> chain <type_decl 0x7f20d6bd8e40 Data>>
        sizes-gimplified unsigned SI
        size <integer_cst 0x7f20d9e00a50 constant 32>
        unit size <integer_cst 0x7f20d9e006c0 constant 4>
        align 32 symtab 0 alias set 32 canonical type 0x7f20d6bd96c0
        pointer_to_this <pointer_type 0x7f20d6a15d80> reference_to_this <reference_type 0x7f20d6be3a80>>
   
    arg 0 <var_decl 0x7f20d4073be0 cpy
        type <record_type 0x7f20d74f5b40 QString readonly sizes-gimplified addressable needs-constructing type_1 type_4 type_5 type_6 BLK size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
            align 32 symtab 0 alias set -1 canonical type 0x7f20d74f5b40
            attributes <tree_list 0x7f20d6b64b10
                purpose <identifier_node 0x7f20d7fd2f00 visibility
                bindings <(nil)>
                local bindings <(nil)>>
                value <tree_list 0x7f20d6b64ae0
                    value <string_cst 0x7f20d6b64ab0 type <array_type 0x7f20d7710b40>
                        readonly constant static "default\000">>> fields <const_decl 0x7f20d6b57280 SectionDefault>
            full-name "const class QString"
            needs-constructor needs-destructor X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
            pointer_to_this <pointer_type 0x7f20d6b68f00> reference_to_this <reference_type 0x7f20d74f5c00>>
        addressable used tree_1 tree_2 tree_3 in_system_header decl_1 decl_5 BLK file /usr/include/QtCore/qlist.h line 420 col 21 size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
        align 32 context <function_decl 0x7f20d5f90800 __comp_ctor > abstract_origin <var_decl 0x7f20d5553be0 cpy>
        (mem/s/c:BLK (plus:SI (reg/f:SI 20 frame)
        (const_int -3828 [0xfffffffffffff10c])) [31 cpy+0 S4 A32])>
    arg 1 <field_decl 0x7f20d6b58e60 d type <pointer_type 0x7f20d6bd96c0>
        used private unsigned nonlocal decl_3 SI file /usr/include/QtCore/qstring.h line 574 col 11 size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
        align 32 offset_align 128
        offset <integer_cst 0x7f20d9e006f0 constant 0>
        bit offset <integer_cst 0x7f20d9e232d0 constant 0> context <record_type 0x7f20d7710840 QString>
        chain <var_decl 0x7f20d6b58f00 codecForCStrings type <pointer_type 0x7f20d6bd9900>
            used public private static unsigned external nonlocal in_system_header decl_3 decl_5 decl_6 SI file /usr/include/QtCore/qstring.h line 577 col 24 size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
            align 32 context <record_type 0x7f20d7710840 QString>
            chain <type_decl 0x7f20d6b65900 QString>>>
    /usr/include/QtCore/qstring.h:670:58>
$33 = void
(gdb) p debug_tree(MEM_ATTRS(y)->expr)
 <component_ref 0x7f20d4093190
    type <pointer_type 0x7f20d6bd96c0
        type <record_type 0x7f20d6bd8d80 Data type_5 BLK
            size <integer_cst 0x7f20d7fd4510 constant 160>
            unit size <integer_cst 0x7f20d7fd44b0 constant 20>
            align 32 symtab 0 alias set 87 canonical type 0x7f20d6bd8d80 fields <field_decl 0x7f20d6b58640 ref> context <record_type 0x7f20d7710840 QString>
            full-name "struct QString::Data"
            X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
            pointer_to_this <pointer_type 0x7f20d6bd96c0> chain <type_decl 0x7f20d6bd8e40 Data>>
        sizes-gimplified unsigned SI
        size <integer_cst 0x7f20d9e00a50 constant 32>
        unit size <integer_cst 0x7f20d9e006c0 constant 4>
        align 32 symtab 0 alias set 32 canonical type 0x7f20d6bd96c0
        pointer_to_this <pointer_type 0x7f20d6a15d80> reference_to_this <reference_type 0x7f20d6be3a80>>
   
    arg 0 <var_decl 0x7f20d408e320 cpy
        type <record_type 0x7f20d74f5b40 QString readonly sizes-gimplified addressable needs-constructing type_1 type_4 type_5 type_6 BLK size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
            align 32 symtab 0 alias set -1 canonical type 0x7f20d74f5b40
            attributes <tree_list 0x7f20d6b64b10
                purpose <identifier_node 0x7f20d7fd2f00 visibility
                bindings <(nil)>
                local bindings <(nil)>>
                value <tree_list 0x7f20d6b64ae0
                    value <string_cst 0x7f20d6b64ab0 type <array_type 0x7f20d7710b40>
                        readonly constant static "default\000">>> fields <const_decl 0x7f20d6b57280 SectionDefault>
            full-name "const class QString"
            needs-constructor needs-destructor X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
            pointer_to_this <pointer_type 0x7f20d6b68f00> reference_to_this <reference_type 0x7f20d74f5c00>>
        addressable used tree_1 tree_2 tree_3 in_system_header decl_1 decl_5 BLK file /usr/include/QtCore/qlist.h line 420 col 21 size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
        align 32 context <function_decl 0x7f20d5f90800 __comp_ctor > abstract_origin <var_decl 0x7f20d5553be0 cpy>
        (mem/s/c:BLK (plus:SI (reg/f:SI 20 frame)
        (const_int -3828 [0xfffffffffffff10c])) [31 cpy+0 S4 A32])>
    arg 1 <field_decl 0x7f20d6b58e60 d type <pointer_type 0x7f20d6bd96c0>
        used private unsigned nonlocal decl_3 SI file /usr/include/QtCore/qstring.h line 574 col 11 size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
        align 32 offset_align 128
        offset <integer_cst 0x7f20d9e006f0 constant 0>
        bit offset <integer_cst 0x7f20d9e232d0 constant 0> context <record_type 0x7f20d7710840 QString>
        chain <var_decl 0x7f20d6b58f00 codecForCStrings type <pointer_type 0x7f20d6bd9900>
            used public private static unsigned external nonlocal in_system_header decl_3 decl_5 decl_6 SI file /usr/include/QtCore/qstring.h line 577 col 24 size <integer_cst 0x7f20d9e00a50 32> unit size <integer_cst 0x7f20d9e006c0 4>
            align 32 context <record_type 0x7f20d7710840 QString>
            chain <type_decl 0x7f20d6b65900 QString>>>
    /usr/include/QtCore/qstring.h:670:58>
$34 = void
(gdb) 

So the expressions look exactly the same when printed ("cpy.d") but they are actually not the same variable.  OK.  False alarm.  Pfew...
Comment 15 Richard Biener 2009-08-04 12:29:54 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 16 Steven Bosscher 2010-02-20 17:58:36 UTC
Finally (!) posted the patch...
Comment 17 comer352l 2010-02-22 15:09:18 UTC
Great, thank you Steven !
Will this patch also improve compilation time ? It currently needs about 45 seconds with 4.4.3 on a Athlon64X2-5000+ (even with -fnogcse).

In the meantime I managed to test on a windows machine with a recent version (4.4.0) and the size increase doesn't occur there (but compilation is extremely slow, too).
Comment 18 comer352l 2010-04-09 15:35:54 UTC
(In reply to comment #17)
> In the meantime I managed to test on a windows machine with a recent version
> (4.4.0) and the size increase doesn't occur there (but compilation is extremely
> slow, too).

Sorry, it seems I've made a mistake during my last test: the size increasing occurs on Windows, too.
Comment 19 Richard Biener 2010-05-22 18:13:23 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 20 Jeffrey A. Law 2011-01-13 04:12:10 UTC
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg00797.html

Apparently I didn't format my subject line correctly...
Comment 21 Jeffrey A. Law 2011-01-13 13:41:09 UTC
Author: law
Date: Thu Jan 13 13:41:03 2011
New Revision: 168747

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168747
Log:
	* PR rtl-optimization/39077
	* doc/invoke.texi (max-gcse-insertion-ratio): Document.
	* params.h (MAX_GCSE_INSERTION_RATIO): Define.
	* params.def (PARAM_MAX_GCSE_INSERTION_RATIO): Define.
	* lcm.c (pre_edge_lcm): Properly initialize output sbitmaps.
	* gcse.c (prune_insertions_deletions): New function.
	(compute_pre_data): Use it.



Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/gcse.c
    trunk/gcc/lcm.c
    trunk/gcc/params.def
    trunk/gcc/params.h
Comment 22 Jeffrey A. Law 2011-01-13 13:42:16 UTC
Fixed on trunk
Comment 23 Robert Hinson 2011-11-25 13:08:08 UTC
I know this is an old bug.  Probably been fixed since then.

I am using:
[oppie@fedora ~]$ g++ -v
Using built-in specs.
COLLECT_GCC=/usr/bin/g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.2/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC)

This is on a fedora core 16 box. :)

without the -fno-gcse switch I get a file with this size:
-rw-rw-r--. 1 oppie oppie 647512 Nov 25 06:49 SSMprotocol_def_en.o
With the --fno-gcse switch I get a file with this size:
-rw-rw-r--. 1 oppie oppie 649136 Nov 25 07:00 SSMprotocol_def_en.o

I also get an error:
src/SSMprotocol_def_en.cpp: In constructor ‘SSMprotocol_def_en::SSMprotocol_def_en()’:
src/SSMprotocol_def_en.cpp:23:2: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without

:)