+++ This bug was initially created as a clone of Bug #36133 +++

Hello,

The ASM code created by GCC 4.2.1 for the 68k/Coldfire platform is not optimal. Comparing the ASM output of GCC 2.9 with that of GCC 4.2, the generated code has in places become much worse with GCC 4.

One problem that showed up frequently is that GCC uses suboptimal addressing modes. Please see the example below for details.

At offsets 14 to 2e this code was generated:

    14: 2290             movel %a0@,%a1@
    16: 2368 0004 0004   movel %a0@(4),%a1@(4)
    1c: 2368 0008 0008   movel %a0@(8),%a1@(8)
    22: 2368 000c 000c   movel %a0@(12),%a1@(12)
    28: d3fc 0000 0010   addal #16,%a1
    2e: d1fc 0000 0010   addal #16,%a0

Much shorter and more efficient would have been this:

    14: 20d9             movel %a1@+,%a0@+
    16: 20d9             movel %a1@+,%a0@+
    18: 20d9             movel %a1@+,%a0@+
    1a: 20d9             movel %a1@+,%a0@+

Example C source code:

    #include <stddef.h>   /* for size_t */

    void *
    copy_32x4a(void *destparam, const void *srcparam, size_t size)
    {
        int *dest = destparam;
        const int *src = srcparam;
        int size32;

        size32 = size / 16;
        for (; size32; size32--) {
            *dest++ = *src++;
            *dest++ = *src++;
            *dest++ = *src++;
            *dest++ = *src++;
        }
    }

Compile option:

    m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c

Code generated by GCC 4.2:

    04: 202f 000c        movel %sp@(12),%d0
    08: 226f 0004        moveal %sp@(4),%a1
    0c: 206f 0008        moveal %sp@(8),%a0
    10: e888             lsrl #4,%d0
    12: 6022             bras 36
    14: 2290             movel %a0@,%a1@
    16: 2368 0004 0004   movel %a0@(4),%a1@(4)
    1c: 2368 0008 0008   movel %a0@(8),%a1@(8)
    22: 2368 000c 000c   movel %a0@(12),%a1@(12)
    28: d3fc 0000 0010   addal #16,%a1
    2e: d1fc 0000 0010   addal #16,%a0
    34: 5380             subql #1,%d0
    36: 4a80             tstl %d0
    38: 66da             bnes 14
    3a: 4e75             rts

For comparison, here is the code that you would expect:

    04: 202f 000c        movel %sp@(12),%d0
    08: 226f 0004        moveal %sp@(4),%a1
    0c: 206f 0008        moveal %sp@(8),%a0
    10: e888             lsrl #4,%d0
    12: 6022             beq 20
    14: 20d9             movel %a1@+,%a0@+
    16: 20d9             movel %a1@+,%a0@+
    18: 20d9             movel %a1@+,%a0@+
    1a: 20d9             movel %a1@+,%a0@+
    1c: 5380             subql #1,%d0
    1e: 66da             bnes 14
    20: 4e75             rts

Compiler used:

    m68k-linux-gnu-gcc -v
    Using built-in specs.
    Target: m68k-linux-gnu
    Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin
    Thread model: posix
    gcc version 4.2.1 (Sourcery G++ Lite 4.2-47)

I hope that this report helps you to improve the quality of GCC.

Kind regards
Gunnar von Boehn

--
P.S. I put the noticed issues in individual tickets for easier tracking. I hope that this is helpful to you.
It would have been nice to check at least gcc 4.3 (or better current trunk).
(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).

I have verified this for you with the most current GCC source.
Verified with gcc version 4.4.0 20080523 (experimental) (GCC)

The problem that GCC uses bad addressing modes still persists.

Code generated by GCC 4.4:

    copy_32x4:
            link.w %fp,#-12
            movem.l #3076,(%sp)
            move.l 16(%fp),%d2
            lsr.l #4,%d2
            move.l 8(%fp),%a3
            move.l 12(%fp),%a2
            jra .L6
    .L7:
            move.l (%a2),%a1
            subq.l #1,%d2
            move.l 4(%a2),%d0
            move.l 8(%a2),%d1
            move.l 12(%a2),%a0
            add.l #16,%a2
            move.l %a1,(%a3)
            move.l %d0,4(%a3)
            move.l %d1,8(%a3)
            move.l %a0,12(%a3)
            add.l #16,%a3
    .L6:
            tst.l %d2
            jne .L7
            movem.l (%sp),#3076
            unlk %fp
            rts
Andreas,

What is your opinion on this?

GCC 2.9 used to combine the move with the increment in the combine step into something like this:

***
    (insn 32 30 33 (set (reg/v:SI 32)
            (mem:SI (post_inc:SI (reg/v:SI 34)) 0)) 42 {movsi+1} (nil)
        (expr_list:REG_INC (reg/v:SI 34)
            (nil)))
***

So the problem is that GCC no longer seems to be able to do this by itself.

With GCC 4.4 the output is:

***
    (insn 34 33 35 4 example2.c:11 (set (reg/v:SI 54 [ value ])
            (mem:SI (reg/v/f:SI 52 [ src ]) [2 S4 A16])) 37 {*movsi_cf} (nil))

    (insn 35 34 36 4 example2.c:12 (set (reg/v:SI 53 [ value2 ])
            (mem:SI (plus:SI (reg/v/f:SI 52 [ src ])
                    (const_int 4 [0x4])) [2 S4 A16])) 37 {*movsi_cf} (nil))

    (insn 36 35 38 4 example2.c:5 (set (reg/v/f:SI 52 [ src ])
            (plus:SI (reg/v/f:SI 52 [ src ])
                (const_int 8 [0x8]))) 133 {*addsi3_5200} (nil))

    (insn 38 36 40 4 example2.c:10 (set (reg/v:SI 50 [ size.21 ])
            (plus:SI (reg/v:SI 50 [ size.21 ])
                (const_int -1 [0xffffffff]))) 133 {*addsi3_5200} (nil))
***

Any ideas about this?

Kind regards
Gunnar von Boehn
This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all. There is another bug about this somewhere I think for arm. PowerPC has the same issue too ...
(In reply to comment #4)
> This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all.
> There is another bug about this somewhere I think for arm. PowerPC has the
> same issue too ...

If this affects so many platforms, it sounds like an important issue to me. Maybe someone should increase the priority and severity of the issue in this case?

Andrew, do you plan to fix this issue?

Cheers
Gunnar
(In reply to comment #4)
> This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all.
> There is another bug about this somewhere I think for arm. PowerPC has the
> same issue too ...

Hi Andrew,

I want to make clear that the 68K backend used to be able to do this optimization back in the GCC 2.9 days. Later, with 3.4 or 4.x, this optimization stopped working and the code became worse.

Does this make sense in your opinion?

Cheers
> Andrew, do you plan to fix this issue?

Personally no. Mostly because IV-opts is hard to understand. Also, it was not the m68k back-end doing the optimization; rather, loop.c did it. See PR 31849.

*** This bug has been marked as a duplicate of 31849 ***