36135 – GCC creates suboptimal ASM: suboptimal addressing modes used


Status:	RESOLVED DUPLICATE of bug 31849

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.2.1

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2008-05-05 13:54 UTC by Gunnar von Boehn
Modified:	2008-06-13 14:34 UTC
CC List:	15 users

See Also:
Host:	i686-pc-linux-gnu
Target:	m68k-linux-gnu
Build:	i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Description Gunnar von Boehn 2008-05-05 13:54:39 UTC

+++ This bug was initially created as a clone of Bug #36133 +++

Hello,

The ASM code generated by GCC 4.2.1 for the 68k/ColdFire platform is not optimal. Comparing the ASM output of GCC 2.9 with that of GCC 4.2,
the generated code has in places become much worse with GCC 4.


One problem that showed up frequently is that GCC uses suboptimal addressing modes.

Please see the example below for details.
At addresses 14 to 2e the following code was generated:
14:       2290            movel %a0@,%a1@
16:       2368 0004 0004  movel %a0@(4),%a1@(4)
1c:       2368 0008 0008  movel %a0@(8),%a1@(8)
22:       2368 000c 000c  movel %a0@(12),%a1@(12)
28:       d3fc 0000 0010  addal #16,%a1
2e:       d1fc 0000 0010  addal #16,%a0

Much shorter and more efficient would have been this:
14:       22d8            movel %a0@+,%a1@+
16:       22d8            movel %a0@+,%a1@+
18:       22d8            movel %a0@+,%a1@+
1a:       22d8            movel %a0@+,%a1@+
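
To make the size difference concrete, here is a minimal sketch of how the first opcode word of a 68k MOVE.L is put together. It reproduces the words 2290 and 2368 from the GCC 4.2 listing; the enum and function names are made up for this illustration, and only the three addressing modes that appear in this report are covered.

```c
#include <assert.h>
#include <stdint.h>

/* 68k effective-address mode numbers used in this sketch. */
enum ea_mode {
    EA_AREG_INDIRECT = 2, /* (An),    written %an@    in the listing */
    EA_AREG_POSTINC  = 3, /* (An)+,   written %an@+                  */
    EA_AREG_DISP16   = 5  /* d16(An), written %an@(d)                */
};

/* Build the first opcode word of MOVE.L <src>,<dst>.
 * Layout: 0010 | dst reg (3) | dst mode (3) | src mode (3) | src reg (3). */
uint16_t encode_move_l(enum ea_mode src_mode, unsigned src_reg,
                       enum ea_mode dst_mode, unsigned dst_reg)
{
    assert(src_reg < 8 && dst_reg < 8);
    return (uint16_t)(0x2000u | (dst_reg << 9) | ((unsigned)dst_mode << 6)
                      | ((unsigned)src_mode << 3) | src_reg);
}

/* Instruction length in bytes: one opcode word plus one 16-bit
 * extension word for every operand that carries a displacement. */
unsigned move_l_length(enum ea_mode src_mode, enum ea_mode dst_mode)
{
    unsigned words = 1;
    if (src_mode == EA_AREG_DISP16)
        words++;
    if (dst_mode == EA_AREG_DISP16)
        words++;
    return words * 2;
}
```

With these helpers, a move that uses a displacement on both operands comes out at 6 bytes, while a move that uses post-increment on both operands is a single 2-byte word and needs no separate add to advance the pointers.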


Example: C-source
Code:
#include <stddef.h>

void * copy_32x4a(void *destparam, const void *srcparam, size_t size)
{
        int *dest = destparam;
        const int *src = srcparam;
        int size32;
        size32 = size / 16;
        for (; size32; size32--) {
                *dest++ = *src++;
                *dest++ = *src++;
                *dest++ = *src++;
                *dest++ = *src++;
        }
}

Compile option: m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c

Code generated by GCC 4.2:
04:       202f 000c       movel %sp@(12),%d0
08:       226f 0004       moveal %sp@(4),%a1
0c:       206f 0008       moveal %sp@(8),%a0
10:       e888            lsrl #4,%d0
12:       6022            bras 36
14:       2290            movel %a0@,%a1@
16:       2368 0004 0004  movel %a0@(4),%a1@(4)
1c:       2368 0008 0008  movel %a0@(8),%a1@(8)
22:       2368 000c 000c  movel %a0@(12),%a1@(12)
28:       d3fc 0000 0010  addal #16,%a1
2e:       d1fc 0000 0010  addal #16,%a0
34:       5380            subql #1,%d0
36:       4a80            tstl %d0
38:       66da            bnes 14
3a:       4e75            rts    

For comparison, here is the code that one would expect:
04:       202f 000c       movel %sp@(12),%d0
08:       226f 0004       moveal %sp@(4),%a1
0c:       206f 0008       moveal %sp@(8),%a0
10:       e888            lsrl #4,%d0
12:       670c            beqs 20
14:       22d8            movel %a0@+,%a1@+
16:       22d8            movel %a0@+,%a1@+
18:       22d8            movel %a0@+,%a1@+
1a:       22d8            movel %a0@+,%a1@+
1c:       5380            subql #1,%d0
1e:       66f4            bnes 14
20:       4e75            rts
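
As a rough check of the size claim, the following sketch adds up the instruction lengths read off the byte columns of the two listings above (the loop from address 14 up to, but not including, the final rts). The array and function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the byte lengths of a sequence of instructions. */
unsigned total_bytes(const unsigned *lengths, size_t count)
{
    unsigned sum = 0;
    assert(lengths != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        sum += lengths[i];
    return sum;
}

/* GCC 4.2 loop: one plain move, three moves with displacements,
 * two addal, then subql, tstl and bnes. */
const unsigned gcc42_loop[] = { 2, 6, 6, 6, 6, 6, 2, 2, 2 };
const size_t gcc42_loop_count = sizeof gcc42_loop / sizeof gcc42_loop[0];

/* Expected loop: four post-increment moves, then subql and bnes. */
const unsigned postinc_loop[] = { 2, 2, 2, 2, 2, 2 };
const size_t postinc_loop_count = sizeof postinc_loop / sizeof postinc_loop[0];
```

Calling `total_bytes` on the two arrays gives 38 bytes for the GCC 4.2 loop and 12 bytes for the expected loop, which matches the address ranges 14..3a and 14..20 in the listings.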

Compiler used:
m68k-linux-gnu-gcc -v
Using built-in specs.
Target: m68k-linux-gnu
Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin
Thread model: posix
gcc version 4.2.1 (Sourcery G++ Lite 4.2-47)


I hope that this report helps you to improve the quality of GCC.

Kind regards

Gunnar von Boehn
--
P.S. I put the noticed issues in individual tickets for easier tracking. I hope that this is helpful to you.

Comment 1 Richard Biener 2008-05-07 19:33:04 UTC

It would have been nice to check at least gcc 4.3 (or better current trunk).

Comment 2 Gunnar von Boehn 2008-05-28 16:23:19 UTC

(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).
> 

I have verified this for you with the most current GCC source.
Verified with gcc version 4.4.0 20080523 (experimental) (GCC) 

The problem that GCC uses bad addressing modes still persists.

Code generated by GCC 4.4:
copy_32x4:
        link.w %fp,#-12
        movem.l #3076,(%sp)
        move.l 16(%fp),%d2
        lsr.l #4,%d2
        move.l 8(%fp),%a3
        move.l 12(%fp),%a2
        jra .L6
.L7:
        move.l (%a2),%a1
        subq.l #1,%d2
        move.l 4(%a2),%d0
        move.l 8(%a2),%d1
        move.l 12(%a2),%a0
        add.l #16,%a2
        move.l %a1,(%a3)
        move.l %d0,4(%a3)
        move.l %d1,8(%a3)
        move.l %a0,12(%a3)
        add.l #16,%a3
.L6:
        tst.l %d2
        jne .L7
        movem.l (%sp),#3076
        unlk %fp
        rts
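
For readers unfamiliar with the numeric operand of movem.l, here is a small sketch that decodes such a register mask. It assumes the standard 68k layout for non-predecrement addressing (bit 0 to 7 are d0 to d7, bit 8 to 15 are a0 to a7); the function name is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Decode a MOVEM register mask in the non-predecrement layout:
 * bit 0..7 = d0..d7, bit 8..15 = a0..a7.
 * Writes a comma-separated register list to out and returns the
 * number of registers written. */
int decode_movem_mask(unsigned mask, char *out, size_t out_size)
{
    int count = 0;
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (int bit = 0; bit < 16; bit++) {
        if (!(mask & (1u << bit)))
            continue;
        int n = snprintf(out + used, out_size - used, "%s%c%d",
                         count ? "," : "", bit < 8 ? 'd' : 'a', bit % 8);
        if (n < 0 || (size_t)n >= out_size - used)
            break;              /* output buffer too small: stop early */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Applied to the mask 3076 (0xC04) this yields d2, a2 and a3, the three registers the loop uses, and three 4-byte registers account for the 12 bytes reserved by link.w.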

Comment 3 Gunnar von Boehn 2008-06-12 14:34:41 UTC

Andreas,

What is your opinion on this?

GCC 2.9 used to combine the move with increment in the combine step to something like this:
***
(insn 32 30 33 (set (reg/v:SI 32)
        (mem:SI (post_inc:SI (reg/v:SI 34)) 0)) 42 {movsi+1} (nil)
    (expr_list:REG_INC (reg/v:SI 34)
        (nil)))
***


So the problem is that GCC no longer seems able to do this by itself.
With GCC 4.4 the output is:
***
(insn 34 33 35 4 example2.c:11 (set (reg/v:SI 54 [ value ])
        (mem:SI (reg/v/f:SI 52 [ src ]) [2 S4 A16])) 37 {*movsi_cf} (nil))

(insn 35 34 36 4 example2.c:12 (set (reg/v:SI 53 [ value2 ])
        (mem:SI (plus:SI (reg/v/f:SI 52 [ src ])
                (const_int 4 [0x4])) [2 S4 A16])) 37 {*movsi_cf} (nil))

(insn 36 35 38 4 example2.c:5 (set (reg/v/f:SI 52 [ src ])
        (plus:SI (reg/v/f:SI 52 [ src ])
            (const_int 8 [0x8]))) 133 {*addsi3_5200} (nil))

(insn 38 36 40 4 example2.c:10 (set (reg/v:SI 50 [ size.21 ])
        (plus:SI (reg/v:SI 50 [ size.21 ])
            (const_int -1 [0xffffffff]))) 133 {*addsi3_5200} (nil))
***

Any ideas about this?


Kind regards

Gunnar von Boehn

Comment 4 Andrew Pinski 2008-06-12 14:38:08 UTC

This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all.  There is another bug about this somewhere I think for arm.  PowerPC has the same issue too ...
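
As an illustration only, the two loop shapes being discussed look roughly like this at the source level. The first bumps each pointer on every access, which is the pattern that maps onto post-increment addressing; the second uses fixed offsets plus one add per pointer per iteration, which is the form visible in the GCC 4.2 and 4.4 output above. The function names are made up, and this sketch makes no claim that writing the source one way or the other makes a given GCC version choose a particular addressing mode.

```c
#include <assert.h>
#include <stddef.h>

/* Shape A: every access advances its pointer (post-increment style). */
void copy_postinc(int *dest, const int *src, size_t blocks)
{
    assert((dest != NULL && src != NULL) || blocks == 0);
    for (; blocks; blocks--) {
        *dest++ = *src++;
        *dest++ = *src++;
        *dest++ = *src++;
        *dest++ = *src++;
    }
}

/* Shape B: fixed offsets from a base pointer, then one add per pointer. */
void copy_offsets(int *dest, const int *src, size_t blocks)
{
    assert((dest != NULL && src != NULL) || blocks == 0);
    for (; blocks; blocks--) {
        dest[0] = src[0];
        dest[1] = src[1];
        dest[2] = src[2];
        dest[3] = src[3];
        dest += 4;
        src += 4;
    }
}
```

Both functions copy the same data; the difference is only in which form the induction variables take by the time addressing modes are selected.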

Comment 5 Gunnar von Boehn 2008-06-13 09:31:32 UTC

(In reply to comment #4)
> This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all. 
> There is another bug about this somewhere I think for arm.  PowerPC has the
> same issue too ...
> 

If this affects so many platforms, it sounds like an important issue to me.
Maybe someone should increase the priority and severity of the issue in this case?

Andrew, do you plan to fix this issue?

Cheers
Gunnar

Comment 6 Gunnar von Boehn 2008-06-13 13:34:31 UTC

(In reply to comment #4)
> This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all. 
> There is another bug about this somewhere I think for arm.  PowerPC has the
> same issue too ...
> 

Hi Andrew,

I want to make clear that the 68K backend used to be able to do this optimization back in the GCC 2.9 days. Later, with 3.4 or 4.x, this optimization stopped working and the code became worse.
Does this make sense in your opinion?


Cheers

Comment 7 Andrew Pinski 2008-06-13 14:34:11 UTC

>Andrew, do you plan to fix this issue?

Personally no, mostly because IV-opts is hard to understand.

Also, it was not the m68k back-end doing the optimization; rather, loop.c did it.


See PR 31849.

*** This bug has been marked as a duplicate of 31849 ***