This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 13 Oct 2007 11:26:55 -0000
- Subject: [Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
The measurements were actually done on gzip-1.2.4 sources on core2-d with:
a) gcc -mtune=generic -m32 -O2
b) gcc -mtune=generic -m32 -O3
The test file was created as a tar archive of the current SVN trunk repository,
which currently amounts to 865M uncompressed.
profile of a)
     % cumulative     self              self    total
  time    seconds  seconds     calls  s/call   s/call  name
 54.63      14.76    14.76 102254750    0.00     0.00  longest_match
 18.47      19.75     4.99         1    4.99    27.02  deflate
 10.25      22.52     2.77     27389    0.00     0.00  fill_window
  6.81      24.36     1.84     27390    0.00     0.00  updcrc
  3.15      25.21     0.85      5901    0.00     0.00  compress_block
  2.85      25.98     0.77 203123663    0.00     0.00  send_bits
  2.66      26.70     0.72  89123566    0.00     0.00  ct_tally
  0.67      26.88     0.18   3378994    0.00     0.00  pqdownheap
  0.22      26.94     0.06     17709    0.00     0.00  build_tree
  0.15      26.98     0.04     11802    0.00     0.00  send_tree
  0.07      27.00     0.02   1367732    0.00     0.00  bi_reverse
  0.07      27.02     0.02     17710    0.00     0.00  gen_codes
  0.00      27.02     0.00     27390    0.00     0.00  file_read
profile of b)
     % cumulative     self              self    total
  time    seconds  seconds     calls  s/call   s/call  name
 86.86      29.35    29.35         1   29.35    33.79  deflate
  5.27      31.13     1.78     27390    0.00     0.00  updcrc
  2.69      32.04     0.91      5901    0.00     0.00  compress_block
  2.55      32.90     0.86  89123566    0.00     0.00  ct_tally
  2.04      33.59     0.69 203123663    0.00     0.00  send_bits
  0.44      33.74     0.15     17709    0.00     0.00  build_tree
  0.06      33.76     0.02   1367732    0.00     0.00  bi_reverse
  0.06      33.78     0.02      5903    0.00     0.00  flush_block
  0.03      33.79     0.01     11802    0.00     0.00  send_tree
  0.00      33.79     0.00     27390    0.00     0.00  file_read
  0.00      33.79     0.00      9237    0.00     0.00  flush_outbuf
  0.00      33.79     0.00         2    0.00     0.00  basename
  0.00      33.79     0.00         2    0.00     0.00  copy_block
  0.00      33.79     0.00         1    0.00     0.00  add_envopt
As can be seen from the profiles, longest_match was inlined into deflate at -O3.
After adding __attribute__((noinline)) to the longest_match prototype, we obtain:
     % cumulative     self              self    total
  time    seconds  seconds     calls  s/call   s/call  name
 55.80      13.86    13.86 102254750    0.00     0.00  longest_match
 27.62      20.72     6.86         1    6.86    24.84  deflate
  7.09      22.48     1.76     27390    0.00     0.00  updcrc
  3.74      23.41     0.93      5901    0.00     0.00  compress_block
  2.62      24.06     0.65  89123566    0.00     0.00  ct_tally
  2.42      24.66     0.60 203123663    0.00     0.00  send_bits
  0.56      24.80     0.14     17709    0.00     0.00  build_tree
  0.08      24.82     0.02   1367732    0.00     0.00  bi_reverse
  0.08      24.84     0.02     11802    0.00     0.00  send_tree
  0.00      24.84     0.00     27390    0.00     0.00  file_read
  0.00      24.84     0.00      9237    0.00     0.00  flush_outbuf
  0.00      24.84     0.00      5903    0.00     0.00  flush_block
  0.00      24.84     0.00         2    0.00     0.00  basename
  0.00      24.84     0.00         2    0.00     0.00  copy_block
or a ~26.5% improvement (24.84 s versus 33.79 s total). I speculate that inlining
increases register pressure on SMALL_REGISTER_CLASSES targets, as this problem is
not as noticeable on x86_64. The results of the 32-bit run are at [1] (valid from
13 Oct) and the results of the 64-bit run are at [2].
[1]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox_32/gcc/individual-run-ratio.html
[2]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox/gcc/individual-run-ratio.html
--
Summary: non-optimal inlining heuristics pessimizes gzip SPEC
score at -O3
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ubizjak at gmail dot com
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761