Bug 15678

Summary: [4.0/4.1/4.2 Regression] CSiBE i686 compilation time increased by 8% at -O2
Product: gcc
Reporter: Tamas Gergely <gertom>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: gcc-bugs, gertom, giovannibajo, pinskia
Priority: P4
Keywords: compile-time-hog, memory-hog
Version: 4.0.0
Target Milestone: 4.0.4
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2004-12-17 17:57:14
Bug Depends on: 17707    
Bug Blocks: 18687    
Attachments: Some test cases for arm-elf target.
Test cases for i686-linux target.
small testcase (462 bytes)

Description Tamas Gergely 2004-05-27 10:05:07 UTC
After tree-ssa was merged into mainline on 2004-05-12/13, compilation time for
many targets increased by 15-20% using -O2 and 13-18% using -Os. Targets include
arm-elf, i386-elf, i686-linux, mips-elf, ppc-elf. This time increment was
measured with CSiBE (http://www.inf.u-szeged.hu/CSiBE).
Comment 1 Tamas Gergely 2004-05-27 10:19:31 UTC
Created attachment 6397 [details]
Some test cases for arm-elf target.

Here are some test cases whose compilation time increased more than 25% for all
the targets we examined. (I preprocessed the sources for arm-elf target.)
Comment 2 Giovanni Bajo 2004-05-27 10:51:30 UTC
Thank you for this report. As a side note, I suggest you use a different
target, as arm-elf is not a very common development environment. x86, x86-64,
ia64, ppc, alpha are probably the best, in (approximately) this order.
Hopefully the testcases are portable, though.
Comment 3 Tamas Gergely 2004-05-27 11:47:24 UTC
Created attachment 6398 [details]
Test cases for i686-linux target.

Here are the same files now preprocessed for i686-linux.
Comment 4 Andrew Pinski 2004-05-27 12:09:35 UTC
lpgparse.i increased from 2.28 (in 3.4.0) to 5.53 on the mainline (but note that my comparison is
unfair in that the mainline compiler is built with checking).
Comment 5 Serge Belyshev 2004-05-27 13:23:12 UTC
Created attachment 6400 [details]
small testcase (462 bytes)

This one demonstrates 50..70% degradation in compile time and 250..350% in
memory usage (depends on -O option).
Comment 6 Serge Belyshev 2004-05-27 18:18:52 UTC
benchmark:

compiler        time (s)        mem (MB)

gcc-2.7          3.7             63
gcc-2.7 -O2      5.1             63

gcc-2.95         5.4             80
gcc-2.95 -O2    15.6             81

gcc-3.3.4       14.8             59
gcc-3.3.4 -O2   37.3             60

gcc-3.4.1       16.1             56
gcc-3.4.1 -O2   31.8             76

gcc-3.5.0       19.9            169
gcc-3.5.0 -O2   53.3            200

icc-8 -O0       28.2             41
icc-8 -O2       63.6             41

notes:
 * 'time' is 'user' compilation time (not sys or real), avg from 3 runs
 * 'mem' is maximum data size
 * all compilers invoked with 'f.c -fomit-frame-pointer -S -o /dev/null'
 * my machine is x86 1107 MHz CPU with 64/64 KB L1 64 KB L2 and 256 MB of 143
MHz SDRAM
 * gcc-2.7 is 2.7.2.3 from debian
 * gcc-2.95 is 2.95.4 20030502 (prerelease)
 * gcc-3.3.4 is 3.3.4 20040328 (prerelease)
 * gcc-3.4.1 is 3.4.1 20040516 (prerelease)
 * gcc-3.5.0 is 3.5.0 20040526 (experimental)
 * icc-8 is 8.0.055
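
The measurement described in the notes above (user CPU time, averaged over 3 runs) can be approximated with a small script. This is only a sketch and not the script used for the numbers in this comment: the helper name `mean_user_time` is made up for illustration, and it relies on the Unix-only `resource` module.

```python
import resource
import subprocess


def mean_user_time(cmd, runs=3):
    """Mean user-mode CPU seconds consumed by `cmd` over `runs` runs.

    RUSAGE_CHILDREN covers all waited-for descendants, so the time spent
    in cc1 (started by the gcc driver) is included, not just the driver.
    """
    total = 0.0
    for _ in range(runs):
        before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
        total += after - before
    return total / runs


# Example call, mirroring the invocation quoted in the notes:
# mean_user_time(["gcc", "f.c", "-fomit-frame-pointer", "-S", "-o", "/dev/null"])
```

Only user time is reported, matching the note that 'sys' and 'real' are excluded; peak memory would need a separate measurement.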
Comment 7 Andrew Pinski 2004-05-27 18:23:13 UTC
Confirmed. I will also note that (with checking enabled, nonetheless) -O0 -funit-at-a-time is faster than -O0
alone on the simple testcase which Serge Belyshev produced. Maybe this means we should always use
-funit-at-a-time now.
Comment 8 Serge Belyshev 2004-05-28 06:16:33 UTC
gcc-2.6          2.3             61
gcc-2.6 -O2      3.2             61

gcc-2.3          1.8             14
gcc-2.3 -O2      2.4             14

$ ./xgcc -v
gcc version 2.3.3
Comment 9 Serge Belyshev 2004-05-28 11:57:56 UTC
gcc-2.0          1.3             11
gcc-2.0 -O2      1.8             11

gcc-1.39         1.14            16
gcc-1.39 -O      1.24            17

gcc-1.27         0.92             8
gcc-1.27 -O      0.77             8

note: gcc-1.27 does not respect -fomit-frame-pointer without -O, this is why it
is slower without it.
Comment 10 Andrew Pinski 2004-06-28 07:50:28 UTC
Interesting data point for the testcase with many empty functions: -funit-at-a-time is faster at -O0 for
3.4.0 and 3.5.0 (note that the timings for 3.5.0 in this case are not very good, as this is with checking
enabled):
tin:~/src/gnu/gcctest>time ~/ia32_linux_gcc3_4/bin/gcc -S pr15678.c
6.870u 1.470s 0:08.86 94.1%     0+0k 0+0io 626pf+0w
tin:~/src/gnu/gcctest>time ~/ia32_linux_gcc3_4/bin/gcc -S pr15678.c -funit-at-a-time
4.950u 0.940s 0:06.42 91.7%     0+0k 0+0io 624pf+0w

tin:~/src/gnu/gcctest>time gcc -S pr15678.c
20.120u 0.770s 0:21.28 98.1%    0+0k 0+0io 741pf+0w
tin:~/src/gnu/gcctest>time gcc -S pr15678.c -funit-at-a-time
16.520u 0.650s 0:17.68 97.1%    0+0k 0+0io 742pf+0w
Comment 11 Andrew Pinski 2004-06-28 07:56:04 UTC
Just to get a feel for what we have done in the recent years:
tin:~/src/gnu/gcctest>time ~/ia32_linux_gcc3_0/bin/gcc -S pr15678.c
8.570u 0.300s 0:09.59 92.4%     0+0k 0+0io 543pf+0w
tin:~/src/gnu/gcctest>time ~/ia32_linux_gcc3_2/bin/gcc -S pr15678.c
9.040u 0.260s 0:09.93 93.6%     0+0k 0+0io 575pf+0w
tin:~/src/gnu/gcctest>time ~/ia32_linux_gcc3_3/bin/gcc -S pr15678.c
6.390u 0.800s 0:08.29 86.7%     0+0k 0+0io 595pf+0w

From 3.0.4 to 3.2.3 we regressed and then sped up again.
Comment 12 Andrew Pinski 2004-11-13 07:45:43 UTC
I think we are not as bad as before now but I could be wrong.
Comment 13 Giovanni Bajo 2004-11-13 14:36:56 UTC
It would be good to get updated timings for this.
Comment 14 Serge Belyshev 2004-11-16 21:59:15 UTC
-O0             3.4.4           4.0.0           diff
--------------------------------------------------------
cc1             18.4            23.8            +29%
cc1plus         24.3            19.5            -20%


-O2             3.4.4           4.0.0           diff
--------------------------------------------------------
cc1             32.8            58.0            +77%
cc1plus         37.1            56.8            +53%


-fomit-frame-pointer is implied.
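
For reference, the "diff" column above is consistent with the relative change of the 4.0.0 time against the 3.4.4 time. A minimal sketch of that arithmetic follows; the function name `pct_change` is chosen here for illustration and does not come from the report.

```python
def pct_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# (3.4.4, 4.0.0) timings copied from the table in this comment.
timings = {
    "cc1 -O0": (18.4, 23.8),
    "cc1plus -O0": (24.3, 19.5),
    "cc1 -O2": (32.8, 58.0),
    "cc1plus -O2": (37.1, 56.8),
}

for name, (old, new) in timings.items():
    print(f"{name:12s} {pct_change(old, new):+.0f}%")
```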
Comment 15 Andrew Pinski 2004-11-17 00:09:27 UTC
At -O3 for the f.c in here, 40% of the time (at least on ppc-darwin) is spent allocating memory or
freeing it, so reducing the amount of memory used overall will help.
Comment 16 Steven Bosscher 2004-12-17 08:44:16 UTC
This would be a good time for new timings. 
Comment 17 Serge Belyshev 2004-12-17 17:57:14 UTC
                3.4.4           4.0.0 (*A)      4.0.0 (*B)      deltaA  deltaB

time to compile one empty function, ms:

cc1-O0          0.4334          0.5908          0.5836           36%     35%
cc1plus-O0      0.6155          0.4700          0.4613          -24%    -25%

cc1-O2          0.8090          1.3886          1.3090           71%     62%
cc1plus-O2      0.9213          1.4436          1.3767           57%     49%


startup time, i.e. time to "compile" empty file, ms:

cc1-O0          18.3            18.1            17.4             -1%     -5%
cc1plus-O0      22.2            21.0            20.4             -5%     -8%

cc1-O2          20.4            19.3            19.1             -5%     -6%
cc1plus-O2      23.7            22.3            21.7             -6%     -8%


*A -- gcc 4.0.0 20041217 compiled by gcc 3.4.4 20041217
*B -- gcc 4.0.0 20041217 compiled by itself.

All errors are within 0.05 .. 0.5 %
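
The comment does not say how the per-function and startup figures were separated. One plausible approach, shown here purely as an assumption, is to time two files that differ only in the number of empty functions and fit a straight line through the two points. The function name and the input numbers below are illustrative, not measurements from this report.

```python
def split_startup_and_per_function(n_small, t_small, n_large, t_large):
    """Fit t = startup + n * per_function through two measurements.

    n_*: number of empty functions in each test file
    t_*: total compile time for that file (any unit, e.g. ms)
    Returns (startup, per_function) in the same time unit.
    """
    per_function = (t_large - t_small) / (n_large - n_small)
    startup = t_small - n_small * per_function
    return startup, per_function


# Hypothetical totals in ms for files with 1000 and 10000 empty functions.
startup_ms, per_fn_ms = split_startup_and_per_function(1000, 451.7, 10000, 4352.3)
print(f"startup {startup_ms:.1f} ms, per function {per_fn_ms:.4f} ms")
```

With more than two file sizes, a least-squares fit would give an error estimate as well.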
Comment 18 Steven Bosscher 2005-02-02 08:28:06 UTC
http://www.inf.u-szeged.hu/csibe/s-i686-linux.php shows compile time 
regressions: 
 
target      -O2   -Os 
i686-linux  10%    5% 
i386-elf     8%   -1% 
mips-elf    10%    2% 
ppc-elf      8%   -3% 
sh-elf       2%   -7% 
 
 
Comment 19 Steven Bosscher 2005-02-14 10:39:00 UTC
Numbers today (2005-02-13): 
target      -O2   -Os  
i686-linux   8%    3%  
i386-elf     5%   -2%  
mips-elf     6%    0%  
ppc-elf      3%   -3%  
sh-elf      -1%   -8%  
 
(note: numbers here and in comment #18 are relative to 2004-01-11, i.e. pre-tree-ssa)
 
Code size is also down at -Os by ~4% on average. 
 
 
 
Comment 20 Andrew Pinski 2005-07-25 03:09:40 UTC
Even at -O0, the small testcase has increased in compile time.
Comment 21 Andrew Pinski 2005-07-25 03:18:41 UTC
Huh, at -O0:
tree-ssanames.c:82 (init_ssanames)                        0: 0.0%    9961472:46.6%          0: 0.0%    1572864: 4.5%      32768

This seems high:
function.c:3782 (allocate_struct_function)         23594400: 8.2%          0: 0.0%          0: 0.0%    6816160:19.5%      32770

too
Comment 22 Mark Mitchell 2005-10-30 22:31:42 UTC
According to the data in Comment #19, we're now better on some cases, and worse on others.  I'd suggest we just close this PR.

However, in the meanwhile, I've downgraded this to P4.  A small compile-time increase isn't going to block the upcoming releases.
Comment 23 Andrew Pinski 2006-08-15 18:39:51 UTC
Can someone do timings on this one for the mainline? I think the mainline is beating 4.0.x now.
Comment 24 Steven Bosscher 2006-09-09 10:54:21 UTC
Closing because mainline is faster than 4.0
Comment 25 Steven Bosscher 2006-09-09 10:54:43 UTC
.