User account creation filtered due to spam.

Bug 54896

Summary: optimization slowness on large basic blocks
Product: gcc Reporter: Tammy Hsu <tammy>
Component: middle-endAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: bernds, eugene.zelenko, rguenth, steven
Priority: P3 Keywords: alias, compile-time-hog
Version: 4.7.2   
Target Milestone: ---   
Host: Target:
Build: Known to work: 4.3.2, 4.4.0, 6.0
Known to fail: 4.6.0, 4.7.2, 4.8.0, 4.9.3, 5.3.0 Last reconfirmed: 2012-10-10 00:00:00
Bug Depends on:    
Bug Blocks: 47344    
Attachments: This can be used to generate BigData.c

Description Tammy Hsu 2012-10-10 21:28:21 UTC
We have some code which consist from switch with many cases and non-random memory writes. Look like there is nothing to optimize at all, but compiling optimized version takes long time.

Attach is the code to generate source code to reproduce this problem, generate_bigdata.c.


$ gcc generate_bigdata.c  
$ ./a.out 
  (This step will generate the testcase, BigData.c.)
$ time /gcc-4.7.2/bin/gcc -m32 BigData.c -c -o BigData.o
1.504u 0.077s 0:01.61 97.5%     0+0k 320+10024io 0pf+0w
$ time /gcc-4.7.2/bin/gcc -m32 -O1 BigData.c -c -o BigData.o
50.438u 0.191s 0:50.67 99.9%    0+0k 256+10024io 0pf+0w
$ time /gcc-4.7.2/bin/gcc -m32 -O4 BigData.c -c -o BigData.o
111.710u 3.977s 1:55.73 99.9%   0+0k 1024+9384io 0pf+0w

We tried with gcc44, the optimization takes long also but not as bad as 472.

Sure, code refactoring is best solution and disabling optimization for particular file is good enough, but will be good idea to improve GCC too.
Comment 1 Steven Bosscher 2012-10-10 22:04:29 UTC
Can you please attach the test case?
Comment 2 Tammy Hsu 2012-10-10 22:09:04 UTC
Created attachment 28416 [details]
This can be used to generate BigData.c
Comment 3 Steven Bosscher 2012-10-10 22:51:12 UTC
Thanks for the test case!

Bug is confirmed with GCC 4.8 (trunk revision 192219).

Problem areas at -O1:
 alias stmt walking    :  31.68 (36%) usr
 tree DSE              :  10.83 (12%) usr
 CSE                   :   9.02 (10%) usr
 reload CSE regs       :  23.17 (26%) usr
 TOTAL                 :  87.99

Problem areas at -O2 are pretty much the same:
 alias stmt walking    :  47.29 (28%) usr
 tree DSE              :  10.84 ( 7%) usr
 CSE                   :   9.12 ( 5%) usr
 CSE 2                 :  34.66 (21%) usr
 reload CSE regs       :  45.37 (27%) usr
 TOTAL                 : 166.12


GCC 4.3.2 has the same problems at the RTL level:
 tree DSE              :  11.60 (17%) usr
 CSE                   :  19.74 (28%) usr
 CSE 2                 :  12.63 (18%) usr
 reload CSE regs       :   8.49 (12%) usr
 TOTAL                 :  69.68


tree-DSE, CSE and reload-CSE are both quadratic in the number of
instructions per basic block, and all basic blocks in this test
case contain O(10^3) instructions.

The alias stmt walking slowness is a regression, but I suspect
there a dup that Rick Biener already knows about...
Comment 4 Richard Biener 2012-10-11 14:08:16 UTC
(In reply to comment #3)
> Thanks for the test case!
> 
> Bug is confirmed with GCC 4.8 (trunk revision 192219).
> 
> Problem areas at -O1:
>  alias stmt walking    :  31.68 (36%) usr
>  tree DSE              :  10.83 (12%) usr
>  CSE                   :   9.02 (10%) usr
>  reload CSE regs       :  23.17 (26%) usr
>  TOTAL                 :  87.99
> 
> Problem areas at -O2 are pretty much the same:
>  alias stmt walking    :  47.29 (28%) usr
>  tree DSE              :  10.84 ( 7%) usr
>  CSE                   :   9.12 ( 5%) usr
>  CSE 2                 :  34.66 (21%) usr
>  reload CSE regs       :  45.37 (27%) usr
>  TOTAL                 : 166.12
> 
> 
> GCC 4.3.2 has the same problems at the RTL level:
>  tree DSE              :  11.60 (17%) usr
>  CSE                   :  19.74 (28%) usr
>  CSE 2                 :  12.63 (18%) usr
>  reload CSE regs       :   8.49 (12%) usr
>  TOTAL                 :  69.68
> 
> 
> tree-DSE, CSE and reload-CSE are both quadratic in the number of
> instructions per basic block, and all basic blocks in this test
> case contain O(10^3) instructions.

tree-DSE isn't quadratic but linear in the number of instructions
(again with a high factor, up to 256, without a --param).

> The alias stmt walking slowness is a regression, but I suspect
> there a dup that Rick Biener already knows about...

There are (for 4.8) limits in place everywhere to make alias-stmt
walking O(1) (well, with a high constant overhead ;)).

My numbers for 4.7 are, at -O2

 alias stmt walking      :  13.55 (19%) usr
 tree DSE                :   2.60 ( 4%) usr
 CSE                     :   8.79 (13%) usr
 CSE 2                   :  17.52 (25%) usr
 reload CSE regs         :  21.23 (30%) usr

with -m32 reload CSE regs is even worse.
Comment 5 Richard Biener 2012-12-03 15:50:06 UTC
Generic compile-time regression umbrella.
Comment 6 Steven Bosscher 2013-03-06 11:03:44 UTC
(In reply to comment #5)
> Generic compile-time regression umbrella.

Why? This is not an old regression.
Comment 7 rguenther@suse.de 2013-03-06 11:11:28 UTC
On Wed, 6 Mar 2013, steven at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54896
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Summary|Some optimization slowness  |optimization slowness on
>                    |with GCC 4.7.2              |large basic blocks
>       Known to fail|                            |4.6.0
> 
> --- Comment #6 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 11:03:44 UTC ---
> (In reply to comment #5)
> > Generic compile-time regression umbrella.
> 
> Why? This is not an old regression.

It's a regression on all maintained branches.
Comment 8 Steven Bosscher 2013-03-12 18:23:29 UTC
With r196576 on x86_64 (gcc17):

at -O1:
 alias stmt walking      :  30.99 (36%) usr
 reload CSE regs         :  18.94 (22%) usr
 CSE                     :  14.94 (17%) usr
 tree DSE                :   9.32 (11%) usr

at -O2:
 alias stmt walking      :  46.65 (30%) usr
 reload CSE regs         :  37.92 (25%) usr
 CSE 2                   :  29.09 (19%) usr
 CSE                     :  15.02 (10%) usr

at -O3:
 tree slp vectorization  : 129.79 (61%) usr
 alias stmt walking      :  46.52 (22%) usr
Comment 9 Richard Biener 2013-03-13 09:53:06 UTC
(In reply to comment #8)
> With r196576 on x86_64 (gcc17):
> 
> at -O1:
>  alias stmt walking      :  30.99 (36%) usr
>  reload CSE regs         :  18.94 (22%) usr
>  CSE                     :  14.94 (17%) usr
>  tree DSE                :   9.32 (11%) usr
> 
> at -O2:
>  alias stmt walking      :  46.65 (30%) usr
>  reload CSE regs         :  37.92 (25%) usr
>  CSE 2                   :  29.09 (19%) usr
>  CSE                     :  15.02 (10%) usr
> 
> at -O3:
>  tree slp vectorization  : 129.79 (61%) usr
>  alias stmt walking      :  46.52 (22%) usr

I have patches for the SLP vectorization part, queued for 4.9.
Comment 10 Bernd Schmidt 2016-01-13 17:05:37 UTC
This no longer seems reproducible with current trunk. The testcase takes about 18 seconds to compile with -O3 on my machine, with no time spent in slp-vect.

Did those patches make it into 4.9?
Comment 11 Richard Biener 2016-01-14 10:56:05 UTC
(In reply to Bernd Schmidt from comment #10)
> This no longer seems reproducible with current trunk. The testcase takes
> about 18 seconds to compile with -O3 on my machine, with no time spent in
> slp-vect.
> 
> Did those patches make it into 4.9?

No, we eliminate the write-only variable now as well.

Move 'Data' into global scope and make it non-static and it will
still reproduce with GCC 5.

I fixed it with GCC 6 it seems.

With checking enabled I see with -O3 -fno-checking (and the above static
var issue "fixed"):

 alias stmt walking      :  27.93 (34%) usr   0.12 (22%) sys  28.18 (34%) wall       0 kB ( 0%) ggc
 tree DSE                :   8.99 (11%) usr   0.00 ( 0%) sys   8.99 (11%) wall       0 kB ( 0%) ggc
 CSE                     :   7.60 ( 9%) usr   0.01 ( 2%) sys   7.62 ( 9%) wall   16382 kB ( 5%) ggc
 CSE 2                   :  12.55 (15%) usr   0.00 ( 0%) sys  12.57 (15%) wall       0 kB ( 0%) ggc
 reload CSE regs         :  17.34 (21%) usr   0.00 ( 0%) sys  17.37 (21%) wall    4687 kB ( 1%) ggc
 TOTAL                 :  81.29             0.54            81.91             315199 kB

of course there's still things to speed up here, -O0 takes 1s, -Og 26s and
-O1 50s, -O2+ run into the above ~80s.