This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Some perfectible results from the new bottom-up inliner


Hi,

A few days ago I started comparing the new inliner developed by Nathan
Sidwell with the old one on my system (PII-400, 256 MB RAM, Linux 2.4.6,
glibc 2.2.3), using the recent gcc 3.1 snapshot 20010706.

Indeed, most of the small benchmarks I have available (Haney Speed,
OOpack and some others) showed greatly reduced compilation time at
-O3 and much smaller executables. Run time usually did not change
appreciably, with the pleasing exception of the Stepanov test.

On the other hand, in some cases I found a noticeable loss of speed at
run time, and therefore I decided to bring a few examples to your
attention, in the hope that Nathan and the other developers can devise
an even better solution.

My first examples are two small programs from Bentley's book
"Programming Pearls, 2nd Ed.", which I have recently been running many
times in order to verify the speed improvements now possible with
std::ios::sync_with_stdio(false), thanks to a recent patch by Loren
James Rittle. Both are attached (wordlist.cpp and wordfreq.cpp).
Running them with the widespread "web2" file as input
(http://www.FreeBSD.org/cgi/cvsweb.cgi/src/share/dict/web2) and
/dev/null as output, I got the following sizes and run times:

------------
------------
wordlist.cpp
------------
------------

3.1 -O2
-------
34671 -> strip -> 20412

4.400u 0.160s 0:04.56 100.0%    0+0k 0+0io 238pf+0w

3.1 -O3
-------
52599 -> strip -> 36508

4.680u 0.060s 0:04.74 100.0%    0+0k 0+0io 243pf+0w


3.1+optimize4.patch -O2
-----------------------
36533 -> strip -> 21172

5.270u 0.110s 0:05.37 100.1%    0+0k 0+0io 237pf+0w


3.1+optimize4.patch -O3
-----------------------
36675 -> strip -> 21396

5.150u 0.060s 0:05.21 100.0%    0+0k 0+0io 237pf+0w

------------
------------
wordfreq.cpp
------------
------------

3.1 -O2
-------
4.830u 0.020s 0:04.84 100.2%    0+0k 0+0io 241pf+0w

3.1 -O3
-------
5.040u 0.070s 0:05.11 100.0%    0+0k 0+0io 250pf+0w


3.1+optimize4.patch -O2
-----------------------
5.770u 0.050s 0:05.81 100.1%    0+0k 0+0io 217pf+0w

3.1+optimize4.patch -O3
-----------------------
5.540u 0.070s 0:05.60 100.1%    0+0k 0+0io 217pf+0w


The next example is a small program exercising some nice classes
developed by Bavestrelli and published in C/C++ Users Journal
(Test_Bavestrelli.cpp and array.h). In this case the difference is
especially noticeable and highly reproducible for the "Index my array
normally" test, indeed the most important one, which involves the
recursive template techniques proposed by the author, at both -O2 and -O3:

-----------
-----------
Bavestrelli
-----------
-----------

3.1 -O2
-------
32843 -> strip -> 20260

Test Allocation and Initialization

Allocated C Array      : Time=1220000
Allocated std::vector  : Time=2430000
Allocated my Array     : Time=1480000

Test Indexing

Indexing normal C array: K=50000000 Time=1470000
Indexing an std::vector: K=50000000 Time=1520000
Index my array normally: K=50000000 Time=2220000 <---------------
Index using iterator   : K=50000000 Time=1090000
Index with SubArray ref: K=50000000 Time=1190000

Completed Job

3.1 -O3
-------
34503 -> strip -> 21900

Test Allocation and Initialization

Allocated C Array      : Time=1230000
Allocated std::vector  : Time=2400000
Allocated my Array     : Time=1480000

Test Indexing

Indexing normal C array: K=50000000 Time=1460000
Indexing an std::vector: K=50000000 Time=1520000
Index my array normally: K=50000000 Time=2210000 <--------------
Index using iterator   : K=50000000 Time=1090000
Index with SubArray ref: K=50000000 Time=1130000

Completed Job

3.1+optimize4.patch -O2
-----------------------
29557 -> strip -> 15988

Test Allocation and Initialization

Allocated C Array      : Time=1260000
Allocated std::vector  : Time=2500000
Allocated my Array     : Time=1510000

Test Indexing

Indexing normal C array: K=50000000 Time=1180000
Indexing an std::vector: K=50000000 Time=1200000
Index my array normally: K=50000000 Time=2310000 <--------------
Index using iterator   : K=50000000 Time=1180000
Index with SubArray ref: K=50000000 Time=1170000

Completed Job

3.1+optimize4.patch -O3
-----------------------
29340 -> strip -> 15856

Test Allocation and Initialization

Allocated C Array      : Time=1250000
Allocated std::vector  : Time=2530000
Allocated my Array     : Time=1510000

Test Indexing

Indexing normal C array: K=50000000 Time=1180000
Indexing an std::vector: K=50000000 Time=1210000
Index my array normally: K=50000000 Time=2310000 <---------------
Index using iterator   : K=50000000 Time=1170000
Index with SubArray ref: K=50000000 Time=1190000

Completed Job


The last example is a little program discussed by Stanley Lippman some
time ago in a nice paper comparing the optimizations obtained through
inlining with those obtained by sticking with the default constructors
and destructors when nothing more sophisticated is really needed. In
this case I ran 500.000 and 1.000.000 iterations (see line #107 in the
source). The second case barely fitted in my RAM, so I ran it many times
until the output of time converged. As could be anticipated from the
source code, in this case differences show up only at -O3:

-----------
-----------
lippman3.cc
-----------
-----------

3.1 -O2
-------
19881 -> strip -> 8532

500.000
1.640u 1.010s 0:02.65 100.0%    0+0k 0+0io 190pf+0w

1.000.000
3.350u 1.920s 0:05.27 100.0%    0+0k 0+0io 191pf+0w


3.1 -O3
-------
29728 -> strip -> 18076

500.000
1.460u 0.930s 0:02.39 100.0%    0+0k 0+0io 193pf+0w

1.000.000
3.010u 1.740s 0:04.75 100.0%    0+0k 0+0io 194pf+0w


3.1+optimize4.patch -O2
-----------------------
20665 -> strip -> 8844

500.000
1.730u 0.930s 0:02.66 100.0%    0+0k 0+0io 190pf+0w

1.000.000
3.410u 1.890s 0:05.30 100.0%    0+0k 0+0io 191pf+0w


3.1+optimize4.patch -O3
-----------------------
20729 -> strip -> 8908

500.000
1.590u 1.050s 0:02.64 100.0%    0+0k 0+0io 190pf+0w

1.000.000
3.350u 1.910s 0:05.25 100.1%    0+0k 0+0io 191pf+0w



Thanks for your attention,
Paolo Carlini.

P.S. Apart from the lippman3.cc case, no doubles are involved in the
tests, so the obnoxious stack alignment issues should not be very
relevant here. For lippman3 I checked that -mpreferred-stack-boundary=2
improved all the tests slightly, by roughly the same amount, without
appreciably affecting the relative differences.

test_codes.tar.gz

