Hi,

I noticed that PNGCrush compiled with GCC 4.4.0 (release) is about 20% slower than the GCC 3.4.0 build (Amiga, 68060 @ 50 MHz).

CFLAGS = -I. -DNO_FSEEKO -O2 -fomit-frame-pointer -Wall -m68060 -s

PNGCrush test.png out.png

Here are the results:

GCC 3.4.0 build: CPU time used = 267.340 seconds (decoding 16.940, encoding 247.800, other 2.600 seconds)
GCC 4.4.0 build: CPU time used = 328.360 seconds (decoding 16.800, encoding 309.260, other 2.300 seconds)

Could someone with m68k Debian/PPC/x86 compile PNGCrush with GCC 3.4.0 and GCC 4.4.0, so that we know whether this regression happens there too?

Here is a link to the source code (I used PNGCrush 1.6.15 for the test): http://sourceforge.net/project/showfiles.php?group_id=1689&package_id=1641

Here is a link to the PNG image: http://www.filejumbo.com/Download/D8F7981723E5F07C/

Regards
Can anyone try to reproduce this bug on their system?
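The relative slowdown implied by the CPU times reported above can be worked out per phase. This is only a minimal sketch in Python; the `cpu_times` layout and the `slowdown_percent` helper are illustrative names, not part of PNGCrush or GCC.

```python
# CPU times in seconds, copied from the PNGCrush report above.
cpu_times = {
    "GCC 3.4.0": {"total": 267.340, "decoding": 16.940, "encoding": 247.800, "other": 2.600},
    "GCC 4.4.0": {"total": 328.360, "decoding": 16.800, "encoding": 309.260, "other": 2.300},
}


def slowdown_percent(old_seconds, new_seconds):
    """Return how much slower (positive) or faster (negative) the new time is, in percent."""
    return (new_seconds - old_seconds) / old_seconds * 100.0


for phase in ("total", "decoding", "encoding", "other"):
    old = cpu_times["GCC 3.4.0"][phase]
    new = cpu_times["GCC 4.4.0"][phase]
    print(f"{phase:9s} {old:8.3f}s -> {new:8.3f}s  ({slowdown_percent(old, new):+.1f}%)")
```

By this calculation the total is roughly 22.8% slower and encoding roughly 24.8% slower, while decoding is essentially unchanged, so nearly all of the regression sits in the encoding (compression) phase.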
The same problem happens with GCC 4.4.1.
This slowdown comes from libz. When I use minigzip from the libz package to compress data, I get the same slowdown with GCC 4.4.1. Could someone try to fix it?
The problematic source file is deflate.c from libz. CFLAGS=-O3 -DUSE_MMAP -m68060 -fomit-frame-pointer When I compile all of the source code with GCC 4.4.1, I get a slow minigzip binary. When I compile everything with GCC 4.4.1 except deflate.c (which I compile with GCC 3.4.0), I get a minigzip binary that runs at normal speed.
Can you check if the same preprocessed source for deflate.c (the deflate.i file obtained with --save-temps) compiles fine with both 3.4.0 and 4.4.1? If so, please attach it together with the deflate.s files produced by the two compilers (also obtained with --save-temps). Thanks.
Created attachment 18377: preprocessed file from GCC 3.4.0 (compiles with GCC 4.4.1)
Created attachment 18378: preprocessed file from GCC 4.4.1 (compiles with GCC 3.4.0)
Created attachment 18379: assembler output from GCC 3.4.0
Created attachment 18380: assembler output from GCC 4.4.1
Created attachment 18381: assembler output from GCC 3.4.0
The preprocessed files compile with both GCC 3.4.0 and GCC 4.4.1. I have attached them along with the assembler output.
Please try again with GCC 4.4.1 -O2 vs. GCC 3.4.0 -O2 or -O3.
Here are the results from the 68060 @ 50 MHz:

minigzip_340_O1 testa.tif - 34s
minigzip_340_O2 testa.tif - 31s
minigzip_340_O3 testa.tif - 31s

minigzip_441_O1 testa.tif - 40s
minigzip_441_O2 testa.tif - 38s
minigzip_441_O3 testa.tif - 42s
Can you also try with 4.5?
GCC 4.4.2 - GCC 4.4.2 (20090825).
GCC 4.5.0 - GCC 4.5.0 (20090827).

Here are the results:

cputime minigzip_340_O1 testa.tif - 33.917
cputime minigzip_340_O2 testa.tif - 30.868
cputime minigzip_340_O3 testa.tif - 31.304

cputime minigzip_442_O1 testa.tif - 39.261
cputime minigzip_442_O2 testa.tif - 37.704
cputime minigzip_442_O3 testa.tif - 41.666

cputime minigzip_450_O1 testa.tif - 41.128
cputime minigzip_450_O2 testa.tif - 37.587
cputime minigzip_450_O3 testa.tif - 37.663
cputime minigzip_412_O1 testa.tif - 34.336
cputime minigzip_412_O2 testa.tif - 34.499
cputime minigzip_412_O3 testa.tif - 34.257

cputime minigzip_425_O1 testa.tif - 36.474
cputime minigzip_425_O2 testa.tif - 35.650
cputime minigzip_425_O3 testa.tif - 35.912

cputime minigzip_432_O1 testa.tif - 39.166
cputime minigzip_432_O2 testa.tif - 35.005
cputime minigzip_432_O3 testa.tif - 38.114
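To make the trend across compiler versions easier to see, the -O2 timings from the two comments above can be expressed relative to the 3.4.0 build. This is just a sketch in Python; the dictionary keys reuse the version suffixes from the binary names, and `percent_over_baseline` is an illustrative helper.

```python
# minigzip CPU times in seconds at -O2 on the 68060, copied from the two
# comments above. Keys are the version suffixes used in the binary names.
o2_seconds = {
    "340": 30.868,
    "412": 34.499,
    "425": 35.650,
    "432": 35.005,
    "442": 37.704,
    "450": 37.587,
}


def percent_over_baseline(seconds_by_build, baseline="340"):
    """Map each build to its extra CPU time, in percent, relative to the baseline build."""
    base = seconds_by_build[baseline]
    return {
        build: (seconds - base) / base * 100.0
        for build, seconds in seconds_by_build.items()
    }


for build, extra in percent_over_baseline(o2_seconds).items():
    print(f"minigzip_{build}_O2: {o2_seconds[build]:7.3f}s  ({extra:+.1f}% vs 340)")
```

On these numbers every 4.x build is slower than 3.4.0 at -O2: about 12% for 412, rising to about 22% for 442 and 450.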
At least on the i686 platform the difference is marginal (GCC flags: -O2 -march=pentium-m -fomit-frame-pointer, Intel Core Gen 2 CPU):

GCC 3.4.6
real 0m15.859s
user 0m15.662s
sys 0m0.177s

GCC 4.5.2
real 0m16.147s
user 0m15.939s
sys 0m0.187s

I'm compressing a 400MB TAR archive containing miscellaneous binaries and documentation.

So the issue is m68060-specific.
Can someone try to get new numbers?
4.4 branch is being closed, moving to 4.5.4 target.
GCC 4.6.4 has been released and the branch has been closed.
Without an in-depth analysis from someone with real hardware, there's simply nothing we can do here. Sadly, the m68k world lacks good tools such as valgrind and oprofile that would allow an in-depth analysis of what's going on, and I've found performance testing within the ARAnyM emulator to be too variable to be of any value. I'm closing this as WONTFIX, as that's the sad, unfortunate reality.