I noticed that PNGCrush compiled with GCC 4.4.0 (release) is about 20% slower than the GCC 3.4.0 build (Amiga, 68060 @ 50 MHz).
CFLAGS = -I. -DNO_FSEEKO -O2 -fomit-frame-pointer -Wall -m68060 -s
PNGCrush test.png out.png
Here are the results:
GCC 3.4.0 build:
CPU time used = 267.340 seconds (decoding 16.940,
encoding 247.800, other 2.600 seconds)
GCC 4.4.0 build:
CPU time used = 328.360 seconds (decoding 16.800,
encoding 309.260, other 2.300 seconds)
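To see which phase accounts for the regression, the two timing reports above can be compared phase by phase. This is only a small sketch using the numbers quoted in this report; the dictionary layout and the helper name `percent_change` are my own choices for illustration.

```python
# CPU times in seconds, copied from the two PNGCrush runs above.
gcc_340 = {"total": 267.340, "decoding": 16.940, "encoding": 247.800, "other": 2.600}
gcc_440 = {"total": 328.360, "decoding": 16.800, "encoding": 309.260, "other": 2.300}


def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Positive values mean the GCC 4.4.0 build is slower than the GCC 3.4.0 build.
changes = {phase: percent_change(gcc_340[phase], gcc_440[phase]) for phase in gcc_340}

for phase, change in changes.items():
    print(f"{phase:9s} {change:+6.1f}%")
```

With these numbers the total comes out roughly 23% slower and encoding roughly 25% slower, while decoding is essentially unchanged, so the slowdown sits in the encoding (compression) path.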
Could someone with m68k Debian, PPC, or x86 compile PNGCrush with both GCC 3.4.0 and GCC 4.4.0, so we know whether this regression happens there too?
Here is a link to the source code (I used PNGCrush 1.6.15 for the test):
Here is a link to the PNG image:
Can anyone try to reproduce this bug on their system?
The same problem happens with GCC 4.4.1.
The slowdown comes from libz. When I use minigzip from the libz package to compress data, I get the same slowdown with GCC 4.4.1. Could someone try to fix it?
The problematic source code is deflate.c from libz.
CFLAGS=-O3 -DUSE_MMAP -m68060 -fomit-frame-pointer
When I compile all of the source code with GCC 4.4.1, I get a slow minigzip binary.
When I compile all of the source code with GCC 4.4.1 except deflate.c (which I compile with GCC 3.4.0), I get a minigzip binary that runs at normal speed.
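The per-file compiler swap described above can be sketched as a small helper that generates one compile command per source file and lets a single file be built with a different compiler. This is only an illustration: the compiler names `gcc-3.4.0` and `gcc-4.4.1` are hypothetical placeholders for the two toolchains, the source list is an illustrative subset and not the full zlib file list, and the helper `compile_commands` is not part of any build system.

```python
import os


def compile_commands(sources, default_cc, cflags, overrides=None):
    """Build one compile command per source file.

    `overrides` maps a source file name to the compiler that should be
    used for that file instead of `default_cc`.
    """
    overrides = overrides or {}
    commands = []
    for src in sources:
        cc = overrides.get(src, default_cc)
        obj = os.path.splitext(src)[0] + ".o"  # deflate.c -> deflate.o
        commands.append([cc, *cflags, "-c", src, "-o", obj])
    return commands


# Hypothetical compiler names; substitute whatever the two toolchains are called.
OLD_CC = "gcc-3.4.0"
NEW_CC = "gcc-4.4.1"

# Flags quoted in this report.
CFLAGS = ["-O3", "-DUSE_MMAP", "-m68060", "-fomit-frame-pointer"]

# Illustrative subset of the zlib sources, not the full file list.
SOURCES = ["minigzip.c", "deflate.c", "trees.c", "inflate.c"]

# Everything is built with the new compiler except deflate.c.
commands = compile_commands(SOURCES, NEW_CC, CFLAGS, overrides={"deflate.c": OLD_CC})
for cmd in commands:
    print(" ".join(cmd))
```

Moving the override from one file to another and re-timing the resulting binary is a simple way to narrow a regression down to a single translation unit, which is how deflate.c was singled out here.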
Can you check if the same preprocessed source for deflate.c (the deflate.i file obtained with --save-temps) compiles fine with both 3.4.0 and 4.4.1? If so, please attach it together with the deflate.s files produced by the two compilers (also obtained with --save-temps).
Created attachment 18377
preprocessed file from GCC 3.4.0 (compiles with GCC 4.4.1)
Created attachment 18378
preprocessed file from GCC 4.4.1 (compiles with GCC 3.4.0)
Created attachment 18379
Assembler output from GCC 3.4.0
Created attachment 18380
assembler output from GCC 4.4.1
Created attachment 18381
assembler output from GCC 3.4.0
The preprocessed files compile with both GCC 3.4.0 and GCC 4.4.1. I have attached them along with the assembler output.
Please try again with GCC 4.4.1 -O2 vs. GCC 3.4.0 -O2 or -O3.
Here are the results from 68060@50MHz:
minigzip_340_O1 testa.tif - 34s
minigzip_340_O2 testa.tif - 31s
minigzip_340_O3 testa.tif - 31s
minigzip_441_O1 testa.tif - 40s
minigzip_441_O2 testa.tif - 38s
minigzip_441_O3 testa.tif - 42s
Can you also try with 4.5?
The GCC 4.4.2 build is GCC 4.4.2 (20090825).
The GCC 4.5.0 build is GCC 4.5.0 (20090827).
Here are the results:
cputime minigzip_340_O1 testa.tif - 33.917
cputime minigzip_340_O2 testa.tif - 30.868
cputime minigzip_340_O3 testa.tif - 31.304
cputime minigzip_442_O1 testa.tif - 39.261
cputime minigzip_442_O2 testa.tif - 37.704
cputime minigzip_442_O3 testa.tif - 41.666
cputime minigzip_450_O1 testa.tif - 41.128
cputime minigzip_450_O2 testa.tif - 37.587
cputime minigzip_450_O3 testa.tif - 37.663
cputime minigzip_412_O1 testa.tif - 34.336
cputime minigzip_412_O2 testa.tif - 34.499
cputime minigzip_412_O3 testa.tif - 34.257
cputime minigzip_425_O1 testa.tif - 36.474
cputime minigzip_425_O2 testa.tif - 35.650
cputime minigzip_425_O3 testa.tif - 35.912
cputime minigzip_432_O1 testa.tif - 39.166
cputime minigzip_432_O2 testa.tif - 35.005
cputime minigzip_432_O3 testa.tif - 38.114
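To make the table above easier to read, the timings can be expressed as a percentage slowdown relative to the GCC 3.4.0 build at the same optimization level. This is a minimal sketch that parses the lines exactly as posted; the regular expression and the function names `parse` and `slowdown_vs` are my own.

```python
import re

# Timings copied verbatim from the results above (seconds of CPU time).
RESULTS = """\
cputime minigzip_340_O1 testa.tif - 33.917
cputime minigzip_340_O2 testa.tif - 30.868
cputime minigzip_340_O3 testa.tif - 31.304
cputime minigzip_442_O1 testa.tif - 39.261
cputime minigzip_442_O2 testa.tif - 37.704
cputime minigzip_442_O3 testa.tif - 41.666
cputime minigzip_450_O1 testa.tif - 41.128
cputime minigzip_450_O2 testa.tif - 37.587
cputime minigzip_450_O3 testa.tif - 37.663
cputime minigzip_412_O1 testa.tif - 34.336
cputime minigzip_412_O2 testa.tif - 34.499
cputime minigzip_412_O3 testa.tif - 34.257
cputime minigzip_425_O1 testa.tif - 36.474
cputime minigzip_425_O2 testa.tif - 35.650
cputime minigzip_425_O3 testa.tif - 35.912
cputime minigzip_432_O1 testa.tif - 39.166
cputime minigzip_432_O2 testa.tif - 35.005
cputime minigzip_432_O3 testa.tif - 38.114
"""

LINE = re.compile(r"minigzip_(\d+)_(O\d)\s+\S+\s+-\s+([\d.]+)")


def parse(text):
    """Map (compiler tag, optimization level) to CPU seconds."""
    return {(m.group(1), m.group(2)): float(m.group(3)) for m in LINE.finditer(text)}


def slowdown_vs(times, baseline="340"):
    """Percentage slowdown of each build relative to the baseline at the same level."""
    return {
        (version, level): (secs / times[(baseline, level)] - 1.0) * 100.0
        for (version, level), secs in times.items()
        if version != baseline
    }


times = parse(RESULTS)
slowdown = slowdown_vs(times)

for (version, level), pct in sorted(slowdown.items()):
    print(f"GCC tag {version} -{level}: {pct:+5.1f}% vs 3.4.0")
```

Every later build in this table is slower than 3.4.0 at every optimization level, so the regression is not limited to the 4.4 series.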
At least on the i686 platform the difference is marginal (GCC flags: -O2 -march=pentium-m -fomit-frame-pointer, Intel Core Gen 2 CPU):
I'm compressing a 400 MB TAR archive containing miscellaneous binaries and documentation. So the issue is m68060-specific.
Can someone try to get new numbers?
The 4.4 branch is being closed; moving to the 4.5.4 target.
GCC 4.6.4 has been released and the branch has been closed.
Without an in-depth analysis from someone with real hardware there's simply nothing we're going to be able to do here.
Sadly, we don't have good tools in the m68k world like valgrind, oprofile, etc., which would allow for a good in-depth analysis of what's going on. And I've found performance testing within the aranym emulator to be too variable to be of any value.
I'm closing as WONTFIX as that's the sad, unfortunate reality.