Bug 36136 - GCC creates suboptimal ASM : constant work registers are set up inside work loops and not outside of the loop
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.2.1
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-05-05 14:03 UTC by Gunnar von Boehn
Modified: 2008-05-28 16:28 UTC
CC List: 1 user

See Also:
Host: i686-pc-linux-gnu
Target: m68k-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Description Gunnar von Boehn 2008-05-05 14:03:23 UTC
+++ This bug was initially created as a clone of Bug #36133 +++

Hello,

The assembly code generated by GCC 4.2.1 for the 68k/ColdFire platform is not optimal. Comparing the output of GCC 2.9 with that of GCC 4.2,
the generated code has in places become much worse with GCC 4.


One problem that appeared frequently is that GCC sets up constant work registers inside work loops instead of outside them.
At address (1c), the instruction moveq #1,%d1, which sets up the work register, is inside the work loop and is needlessly executed on every iteration.
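The first problem is a missed loop-invariant hoist. As a C-level illustration (the functions `fill_reload` and `fill_hoisted` are hypothetical, not part of the test case or of GCC), the sketch below contrasts a loop that materializes the constant on every iteration, as the moveq #1,%d1 at address (1c) does, with one that sets it up once before the loop. An optimizing compiler may well turn both into the same machine code; the point is the loop shape, not the source form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: the constant is (re)loaded inside the loop,
   mirroring the GCC 4.2 output where "moveq #1,%d1" runs every iteration. */
void fill_reload(int *dst, size_t blocks)
{
    for (; blocks; blocks--) {
        int reg = 1;            /* loop-invariant: same value every time */
        *dst++ = reg;
        *dst++ = reg;
        *dst++ = reg;
        *dst++ = reg;
    }
}

/* Hypothetical illustration: the invariant is set up once, before the loop,
   which is what loop-invariant code motion is expected to produce. */
void fill_hoisted(int *dst, size_t blocks)
{
    int reg = 1;                /* hoisted out of the loop */
    for (; blocks; blocks--) {
        *dst++ = reg;
        *dst++ = reg;
        *dst++ = reg;
        *dst++ = reg;
    }
}
```

Because `reg` never changes inside the loop, both functions leave identical buffers; only the number of times the constant is materialized differs, which is why the hoist is always safe here.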
 
Second problem:
At address (16), the instruction movel #1,%a0@ uses the literal value #1 instead of the work register, which holds the same value. The literal form movel #1,%a0@ is 6 bytes long, whereas using the work register (movel %d1,%a0@) would take only 2 bytes.
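The byte counts above follow from the 68k MOVE instruction format: one opcode word, plus extension words for an immediate source or a displacement. The helpers below (`move_l_opcode` and `move_l_length` are hypothetical names) are a minimal sketch that covers only the few addressing modes appearing in this report. They reproduce the `20bc` opcode of movel #1,%a0@ and show that movel %d1,%a0@ needs a single 2-byte word.

```c
#include <assert.h>
#include <stdint.h>

/* 68k effective-address modes used in this report (3-bit mode field). */
enum ea_mode {
    EA_DREG    = 0,  /* %dN                              */
    EA_IND     = 2,  /* %aN@                             */
    EA_POSTINC = 3,  /* %aN@+                            */
    EA_DISP16  = 5,  /* %aN@(d16)                        */
    EA_SPECIAL = 7   /* with register field 4: #immediate */
};

/* First opcode word of MOVE.L:
   00 | size=10 | dst reg | dst mode | src mode | src reg */
uint16_t move_l_opcode(unsigned dst_mode, unsigned dst_reg,
                       unsigned src_mode, unsigned src_reg)
{
    assert(dst_mode < 8 && dst_reg < 8 && src_mode < 8 && src_reg < 8);
    return (uint16_t)(0x2000 | (dst_reg << 9) | (dst_mode << 6)
                      | (src_mode << 3) | src_reg);
}

/* Total length in bytes of a MOVE.L, counting only the extension words
   of the modes listed above (other mode-7 forms are not handled). */
unsigned move_l_length(unsigned dst_mode, unsigned src_mode, unsigned src_reg)
{
    unsigned len = 2;                                      /* opcode word */
    if (src_mode == EA_SPECIAL && src_reg == 4) len += 4;  /* #imm.l      */
    if (src_mode == EA_DISP16) len += 2;                   /* d16 source  */
    if (dst_mode == EA_DISP16) len += 2;                   /* d16 dest    */
    return len;
}
```

For example, `move_l_opcode(EA_DISP16, 0, EA_DREG, 1)` gives 0x2141, the opcode of movel %d1,%a0@(4) in the listing, and `move_l_length` reports 4 bytes for it.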


Example: C-source
Code:
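#include <stddef.h>     /* size_t */

/* Fills size/16 blocks of four ints with the constant 1.
   srcparam is unused, and (as in the original test case) there is no
   return statement, so callers must not use the return value. */
void * write_32x4(void *destparam, const void *srcparam, size_t size)
{
        int  value=1;
        int *dst = destparam;
        size = size / 16;               /* 16 bytes = four 4-byte ints */
        for (; size; size--) {
             *dst++=value;
             *dst++=value;
             *dst++=value;
             *dst++=value;
        }
}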
Compile options: m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c
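To confirm what the test case is supposed to do independently of the generated code, here is a small host-side check. It is a sketch that copies the function above with `return destparam;` added (so the return value is defined) and assumes a 4-byte `int`, as on m68k-linux. Adding the return changes the generated assembly, so this copy is for checking behavior only, not for reproducing the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Copy of the test case, with "return destparam;" added so the function
   no longer falls off the end of a non-void function. */
void *write_32x4(void *destparam, const void *srcparam, size_t size)
{
    int value = 1;
    int *dst = destparam;
    (void)srcparam;                 /* unused, as in the original */
    size = size / 16;               /* 16 bytes = four 4-byte ints per iteration */
    for (; size; size--) {
        *dst++ = value;
        *dst++ = value;
        *dst++ = value;
        *dst++ = value;
    }
    return destparam;
}
```

Calling it with a 32-byte size should set exactly eight ints (two loop iterations) and leave the following memory untouched.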

Code generated by GCC 4.2:
<write_32x4>:
0a:       202f 000c       movel %sp@(12),%d0
0e:       206f 0004       moveal %sp@(4),%a0
12:       e888            lsrl #4,%d0
14:       601c            bras 32
16:       20bc 0000 0001  movel #1,%a0@
1c:       7201            moveq #1,%d1
1e:       2141 0004       movel %d1,%a0@(4)
22:       2141 0008       movel %d1,%a0@(8)
26:       2141 000c       movel %d1,%a0@(12)
2a:       d1fc 0000 0010  addal #16,%a0
30:       5380            subql #1,%d0
32:       4a80            tstl %d0
34:       66e0            bnes 16
36:       4e75            rts 
Generated code length: 46 bytes
Length of work loop: 9 instructions, 32 bytes
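The addresses in the listing are enough to double-check both the branch targets and the byte counts. The sketch below (the helpers `short_branch_target` and `range_length` are hypothetical) decodes a 68k short branch, whose 8-bit displacement in the opcode's low byte is relative to the branch address plus 2, and measures a code range from its first and last instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 68k short branch (BRA.S / Bcc.S): the signed 8-bit
   displacement in the opcode's low byte is relative to addr + 2. */
uint32_t short_branch_target(uint32_t addr, uint16_t opcode)
{
    int disp = opcode & 0xFF;
    assert(disp != 0x00 && disp != 0xFF);   /* longer forms not handled */
    if (disp & 0x80)
        disp -= 256;                         /* sign-extend the 8-bit value */
    return (uint32_t)((int32_t)addr + 2 + disp);
}

/* Length in bytes of a code range, given the address of its first
   instruction and the address and size of its last instruction. */
uint32_t range_length(uint32_t first, uint32_t last, uint32_t last_size)
{
    return last + last_size - first;
}
```

For example, `short_branch_target(0x34, 0x66e0)` yields 0x16, the top of the work loop, and `range_length(0x0a, 0x36, 2)` yields 46, matching the total length given above.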


For comparison, here is the code one would expect:
0a:       202f 000c       movel %sp@(12),%d0
0e:       206f 0004       moveal %sp@(4),%a0
12:       7201            moveq #1,%d1
14:       e888            lsrl #4,%d0
16:       670c            beqs 24
18:       20c1            movel %d1,%a0@+
1a:       20c1            movel %d1,%a0@+
1c:       20c1            movel %d1,%a0@+
1e:       20c1            movel %d1,%a0@+
20:       5380            subql #1,%d0
22:       66f4            bnes 18
24:       4e75            rts
Expected code length: 28 bytes
Length of work loop: 6 instructions, 12 bytes
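At the source level, the expected code corresponds to a loop that tests the block count once up front, then stores through a post-incremented pointer and counts down with the test at the bottom. The variant below, `write_32x4_dowhile` (a hypothetical name), spells out that control flow explicitly. It is a sketch of the intended loop shape only; it is not claimed to make GCC 4.2 emit the expected code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical restructuring of write_32x4 that mirrors the expected code:
   constant set up before the loop, one zero check up front,
   post-increment stores, and the loop test at the bottom. */
void *write_32x4_dowhile(void *destparam, const void *srcparam, size_t size)
{
    const int value = 1;            /* corresponds to moveq #1,%d1 before the loop */
    int *dst = destparam;
    (void)srcparam;                 /* unused, as in the original */
    size /= 16;                     /* corresponds to lsrl #4,%d0 */
    if (size != 0) {                /* beqs: skip the loop when there is no block */
        do {
            *dst++ = value;         /* four post-increment stores of the register */
            *dst++ = value;
            *dst++ = value;
            *dst++ = value;
        } while (--size != 0);      /* subql #1,%d0 then bnes back to the top */
    }
    return destparam;
}
```

Sizes below 16 bytes produce zero blocks, so the up-front check skips the loop entirely, just as the beqs does in the expected code.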


Compiler used:
m68k-linux-gnu-gcc -v
Using built-in specs.
Target: m68k-linux-gnu
Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin
Thread model: posix
gcc version 4.2.1 (Sourcery G++ Lite 4.2-47)


I hope that this report helps you to improve the quality of GCC.

Kind regards

Gunnar von Boehn
--
P.S. I have filed each of the issues I noticed as a separate ticket for easier tracking. I hope that this is helpful to you.
Comment 1 Richard Biener 2008-05-07 19:32:37 UTC
It would have been nice to check at least gcc 4.3 (or better current trunk).
Comment 2 Gunnar von Boehn 2008-05-28 16:28:29 UTC
(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).
> 

I have verified this with the most recent GCC trunk sources
(GCC 4.4 snapshot of 2008-05-23).

The problem still persists.
GCC sets up its work registers inside the work loop.



write_32x4:
        link.w %fp,#0
        move.l 16(%fp),%d0
        move.l 8(%fp),%a0
        lsr.l #4,%d0
        jra .L50
.L51:
        moveq #1,%d1
        move.l %d1,(%a0)
        move.l %d1,4(%a0)
        move.l %d1,8(%a0)
        move.l %d1,12(%a0)
        lea (16,%a0),%a0
        subq.l #1,%d0
.L50:
        tst.l %d0
        jne .L51
        unlk %fp
        rts