Bug 40454 - [4.7/4.8/4.9 regression] zlib is about 20% slower when compiled with GCC 4.4.1
Summary: [4.7/4.8/4.9 regression] zlib is about 20% slower when compiled with GCC 4.4.1
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.1
Importance: P4 normal
Target Milestone: 4.7.4
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-06-15 23:36 UTC by ami_stuff
Modified: 2014-02-11 06:08 UTC
CC List: 4 users

See Also:
Host: i686-cygwin
Target: m68k-amigaos
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-09-03 07:27:02


Attachments
preprocessed file from GCC 3.4.0 (compiles with GCC 4.4.1) (11.46 KB, text/plain)
2009-08-17 09:21 UTC, ami_stuff
preprocessed file from GCC 4.4.1 (compiles with GCC 3.4.0) (11.42 KB, text/plain)
2009-08-17 09:21 UTC, ami_stuff
Assembler output from GCC 3.4.0 (9.45 KB, text/plain)
2009-08-17 09:22 UTC, ami_stuff
assembler output from GCC 4.4.1 (10.13 KB, text/plain)
2009-08-17 09:23 UTC, ami_stuff
assembler output from GCC 3.4.0 (9.45 KB, text/plain)
2009-08-17 09:23 UTC, ami_stuff

Description ami_stuff 2009-06-15 23:36:29 UTC
Hi,

I noticed that PNGCrush compiled with GCC 4.4.0 (release) is about 20% slower than the GCC 3.4.0 build (Amiga, 68060 @ 50 MHz).

CFLAGS = -I. -DNO_FSEEKO -O2 -fomit-frame-pointer -Wall -m68060 -s


PNGCrush test.png out.png


Here are the results:

GCC 3.4.0 build:

CPU time used = 267.340 seconds (decoding 16.940,
encoding 247.800, other 2.600 seconds)

GCC 4.4.0 build:

CPU time used = 328.360 seconds (decoding 16.800,
encoding 309.260, other 2.300 seconds) 


Could someone with m68k Debian, PPC, or x86 compile PNGCrush with GCC 3.4.0 and GCC 4.4.0, so we know whether this regression happens there too?


Here is a link to the source code (I used PNGCrush 1.6.15 for the test):

http://sourceforge.net/project/showfiles.php?group_id=1689&package_id=1641


Here is a link to PNG image:

http://www.filejumbo.com/Download/D8F7981723E5F07C/


Regards
Comment 1 ami_stuff 2009-06-18 11:27:52 UTC
Can anyone try to reproduce this bug on their system?
Comment 2 ami_stuff 2009-08-12 12:09:55 UTC
The same problem happens with GCC 4.4.1.
Comment 3 ami_stuff 2009-08-16 01:28:45 UTC
This slowdown comes from libz. When I use minigzip from the libz package to compress data, I get the same slowdown with GCC 4.4.1. Could someone try to fix it?
Comment 4 ami_stuff 2009-08-16 14:02:53 UTC
The problematic source code is deflate.c from libz.

CFLAGS=-O3 -DUSE_MMAP -m68060 -fomit-frame-pointer

When I compile all the source code with GCC 4.4.1, I get a slow minigzip binary.

When I compile all the source code with GCC 4.4.1 except deflate.c (which I compile with GCC 3.4.0), I get a minigzip binary that runs at normal speed.
Comment 5 Paolo Bonzini 2009-08-17 08:42:42 UTC
Can you check if the same preprocessed source for deflate.c (the deflate.i file obtained with --save-temps) compiles fine with both 3.4.0 and 4.4.1?  If so, please attach it together with the deflate.s files produced by the two compilers (also obtained with --save-temps).

Thanks.
Comment 6 ami_stuff 2009-08-17 09:21:07 UTC
Created attachment 18377 [details]
preprocessed file from GCC 3.4.0 (compiles with GCC 4.4.1)
Comment 7 ami_stuff 2009-08-17 09:21:44 UTC
Created attachment 18378 [details]
preprocessed file from GCC 4.4.1 (compiles with GCC 3.4.0)
Comment 8 ami_stuff 2009-08-17 09:22:38 UTC
Created attachment 18379 [details]
Assembler output from GCC 3.4.0
Comment 9 ami_stuff 2009-08-17 09:23:25 UTC
Created attachment 18380 [details]
assembler output from GCC 4.4.1
Comment 10 ami_stuff 2009-08-17 09:23:48 UTC
Created attachment 18381 [details]
assembler output from GCC 3.4.0
Comment 11 ami_stuff 2009-08-17 09:26:45 UTC
The preprocessed files compile with both GCC 3.4.0 and GCC 4.4.1. I added them as attachments, plus the asm output.
Comment 12 Paolo Bonzini 2009-08-17 13:30:28 UTC
Please try again with GCC 4.4.1 -O2 vs. GCC 3.4.0 -O2 or -O3.
Comment 13 ami_stuff 2009-08-17 15:17:20 UTC
Here are the results from 68060@50MHz:

minigzip_340_O1 testa.tif - 34s
minigzip_340_O2 testa.tif - 31s
minigzip_340_O3 testa.tif - 31s

minigzip_441_O1 testa.tif - 40s
minigzip_441_O2 testa.tif - 38s
minigzip_441_O3 testa.tif - 42s
Comment 14 Paolo Bonzini 2009-09-03 07:27:02 UTC
Can you also try with 4.5?
Comment 15 ami_stuff 2009-09-07 19:22:15 UTC
GCC 4.4.2 - GCC 4.4.2 (20090825).
GCC 4.5.0 - GCC 4.5.0 (20090827).

Here are the results:

cputime minigzip_340_O1 testa.tif - 33.917
cputime minigzip_340_O2 testa.tif - 30.868
cputime minigzip_340_O3 testa.tif - 31.304

cputime minigzip_442_O1 testa.tif - 39.261
cputime minigzip_442_O2 testa.tif - 37.704
cputime minigzip_442_O3 testa.tif - 41.666

cputime minigzip_450_O1 testa.tif - 41.128
cputime minigzip_450_O2 testa.tif - 37.587
cputime minigzip_450_O3 testa.tif - 37.663
Comment 16 ami_stuff 2009-09-10 12:58:03 UTC
cputime minigzip_412_O1 testa.tif - 34.336
cputime minigzip_412_O2 testa.tif - 34.499
cputime minigzip_412_O3 testa.tif - 34.257

cputime minigzip_425_O1 testa.tif - 36.474
cputime minigzip_425_O2 testa.tif - 35.650
cputime minigzip_425_O3 testa.tif - 35.912

cputime minigzip_432_O1 testa.tif - 39.166
cputime minigzip_432_O2 testa.tif - 35.005
cputime minigzip_432_O3 testa.tif - 38.114
Comment 17 Artem S. Tashkinov 2011-01-31 21:46:29 UTC
At least on the i686 platform, the difference is marginal (GCC flags: -O2 -march=pentium-m -fomit-frame-pointer, Intel Core Gen 2 CPU):

GCC 3.4.6
real    0m15.859s
user    0m15.662s
sys     0m0.177s

GCC 4.5.2
real    0m16.147s
user    0m15.939s
sys     0m0.187s

I'm compressing a 400 MB TAR archive containing miscellaneous binaries and documentation. So the issue is m68060-specific.
Comment 18 Andrew Pinski 2012-01-21 20:32:14 UTC
Can someone try to get new numbers?
Comment 19 Jakub Jelinek 2012-03-13 12:45:41 UTC
The 4.4 branch is being closed; moving to 4.5.4 target.
Comment 20 Jakub Jelinek 2013-04-12 15:16:06 UTC
GCC 4.6.4 has been released and the branch has been closed.
Comment 21 Jeffrey A. Law 2014-02-11 06:08:34 UTC
Without an in-depth analysis from someone with real hardware, there's simply nothing we're going to be able to do here.

Sadly, we don't have good tools in the m68k world, such as valgrind or oprofile, which would allow a good in-depth analysis of what's going on. And I've found performance testing within the ARAnyM emulator to be too variable to be of any value.

I'm closing as WONTFIX as that's the sad, unfortunate reality.