+++ This bug was initially created as a clone of Bug #36133 +++ Hello, The ASM code created by GCC 4.2.1 for 68k/Coldfire platform is not optimal. Comparing ASM output created by GCC 2.9 with GCC 4.2, the generated code got partially much worse with GCC 4. One problem that was visible a lot was that GCC used ADDA.L instead using the shorter LEA instruction. Please see the below example for details. In line 28 and 2E you can see that two times the ADDA.L instructions was used, where instead the shorter LEA instruction could have been used. Example: C-source Code: void * copy_32x4a(void *destparam, const void *srcparam, size_t size) { int *dest = destparam; const int *src = srcparam; int size32; size32 = size / 16; for (; size32; size32--) { *dest++ = *src++; *dest++ = *src++; *dest++ = *src++; *dest++ = *src++; } } Compile option: m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c Code generated by GCC 4.2: 04: 202f 000c movel %sp@(12),%d0 08: 226f 0004 moveal %sp@(4),%a1 0c: 206f 0008 moveal %sp@(8),%a0 10: e888 lsrl #4,%d0 12: 6022 bras 36 14: 2290 movel %a0@,%a1@ 16: 2368 0004 0004 movel %a0@(4),%a1@(4) 1c: 2368 0008 0008 movel %a0@(8),%a1@(8) 22: 2368 000c 000c movel %a0@(12),%a1@(12) 28: d3fc 0000 0010 addal #16,%a1 2e: d1fc 0000 0010 addal #16,%a0 34: 5380 subql #1,%d0 36: 4a80 tstl %d0 38: 66da bnes 14 3a: 4e75 rts For comparison here is code that you would expect: 04: 202f 000c movel %sp@(12),%d0 08: 226f 0004 moveal %sp@(4),%a1 0c: 206f 0008 moveal %sp@(8),%a0 10: e888 lsrl #4,%d0 12: 6022 beq 20 14: 20d9 movel %a1@+,%a0@+ 16: 20d9 movel %a1@+,%a0@+ 18: 20d9 movel %a1@+,%a0@+ 1a: 20d9 movel %a1@+,%a0@+ 1c: 5380 subql #1,%d0 1e: 66da bnes 14 20: 4e75 rts Compiler used: m68k-linux-gnu-gcc -v Using built-in specs. Target: m68k-linux-gnu Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin Thread model: posix gcc version 4.2.1 (Sourcery G++ Lite 4.2-47) I hope that this report help you to improve the quality of GCC. Kind regards Gunnar von Boehn -- P.S. I put the noticed issues in indivitual tivkets for easier tracking. I hope that this is helpfull.
It would have been nice to check at least gcc 4.3 (or better current trunk).
(In reply to comment #1) > It would have been nice to check at least gcc 4.3 (or better current trunk). > I've verified with latest source gcc source "version 4.4.0 20080523 (experimental) (GCC)" The most current GCC source still has the problem that ADD.L instructions are used for incrementing pointers instead using shorter LEA instruction. Code generated by GCC 4.4 for the testcase. copy_32x4: link.w %fp,#-12 movem.l #3076,(%sp) move.l 16(%fp),%d2 lsr.l #4,%d2 move.l 8(%fp),%a3 move.l 12(%fp),%a2 jra .L6 .L7: move.l (%a2),%a1 subq.l #1,%d2 move.l 4(%a2),%d0 move.l 8(%a2),%d1 move.l 12(%a2),%a0 add.l #16,%a2 move.l %a1,(%a3) move.l %d0,4(%a3) move.l %d1,8(%a3) move.l %a0,12(%a3) add.l #16,%a3 .L6: tst.l %d2 jne .L7 movem.l (%sp),#3076 unlk %fp rts Regards Gunnar
Created attachment 15699 [details] Prefer 4 byte long LEA over 6 byte long ADD.L Please include the attached patch for GCC. The added patch has changed the case statement to prefer the 4 byte long lea over the 6 byte long add.l for immediate sub/add instructions to address registers with an immediate operant size of 16bit max. LEA is optimized for pipelining (with destination forwarding) and is shorter than ADD.L Regards Gunnar von Boehn
Could you please submit your patch to gcc-patches@gcc.gnu.org, including a ChangeLog entry and stating how you tested it.
(In reply to comment #4) > Could you please submit your patch to gcc-patches@gcc.gnu.org, including a > ChangeLog entry and stating how you tested it. > As requested I did send the email last week. Do you need anything else from me to work on this?
Andreas, could you please have a look at this? Cheers Gunnar
Subject: Bug 36134 Author: ams Date: Fri Nov 14 10:49:06 2008 New Revision: 141853 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141853 Log: 2008-11-14 Maxim Kuvyrkov <maxim@codesourcery.com> Andrew Stubbs <ams@codesourcery.com> Gunnar Von Boehn <gunnar@genesi-usa.com> gcc/ PR target/36134 * config/m68k/m68k.md (addsi3_5200): Add a new alternative preferring the shorter LEA insn over ADD.L where possible. gcc/testsuite/ PR target/36134 * gcc.target/m68k/pr36134.c: New test. Added: trunk/gcc/testsuite/gcc.target/m68k/pr36134.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/m68k/m68k.md trunk/gcc/testsuite/ChangeLog
The patch posted here has been accepted and committed: http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00581.html This should resolve this issue. Andrew