This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug inline-asm/40819] New: 32x32->64 - asm code from "longlong.h" is not inlined for 68060's builds
- From: "ami_stuff at o2 dot pl" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 21 Jul 2009 18:22:17 -0000
- Subject: [Bug inline-asm/40819] New: 32x32->64 - asm code from "longlong.h" is not inlined for 68060's builds
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
Hi,
The "longlong.h" file contains inline asm for a 32x32->64 multiplication which
can be used on the 68060 CPU:
#define umul_ppmm(xh, xl, a, b) \
__asm__ ("| Inlined umul_ppmm\n" \
" move%.l %2,%/d0\n" \
" move%.l %3,%/d1\n" \
" move%.l %/d0,%/d2\n" \
" swap %/d0\n" \
" move%.l %/d1,%/d3\n" \
" swap %/d1\n" \
" move%.w %/d2,%/d4\n" \
" mulu %/d3,%/d4\n" \
" mulu %/d1,%/d2\n" \
" mulu %/d0,%/d3\n" \
" mulu %/d0,%/d1\n" \
" move%.l %/d4,%/d0\n" \
" eor%.w %/d0,%/d0\n" \
" swap %/d0\n" \
" add%.l %/d0,%/d2\n" \
" add%.l %/d3,%/d2\n" \
" jcc 1f\n" \
" add%.l %#65536,%/d1\n" \
"1: swap %/d2\n" \
" moveq %#0,%/d0\n" \
" move%.w %/d2,%/d0\n" \
" move%.w %/d4,%/d2\n" \
" move%.l %/d2,%1\n" \
" add%.l %/d1,%/d0\n" \
" move%.l %/d0,%0" \
: "=g" ((USItype) (xh)), \
"=g" ((USItype) (xl)) \
: "g" ((USItype) (a)), \
"g" ((USItype) (b)) \
: "d0", "d1", "d2", "d3", "d4")
However, it looks like this inline asm is not used in some cases(?). As a
result, binaries generated for the 68060 CPU which make heavy use of 32x32->64
multiplications are about 40% slower (FFmpeg's MP3 decoder).
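For reference, the computation done by the umul_ppmm asm above can be sketched in portable C. This is only an illustration of the four 16x16->32 partial products and the carry handling, not code from "longlong.h"; the names umul_ppmm_c and umul_ppmm_c_selfcheck are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable equivalent of umul_ppmm (not from longlong.h):
   stores the high word of a*b in *xh and the low word in *xl, using
   four 16x16->32 multiplies like the mulu sequence in the asm. */
static void umul_ppmm_c(uint32_t *xh, uint32_t *xl, uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xffffu, ah = a >> 16;
    uint32_t bl = b & 0xffffu, bh = b >> 16;

    uint32_t ll = al * bl;   /* low(a)  * low(b)  */
    uint32_t lh = al * bh;   /* low(a)  * high(b) */
    uint32_t hl = ah * bl;   /* high(a) * low(b)  */
    uint32_t hh = ah * bh;   /* high(a) * high(b) */

    /* Add the upper half of ll to the middle terms. The first addition
       cannot overflow; the second one can, and its carry is worth
       65536 in the high word (the "jcc 1f" / "add.l #65536" pair). */
    uint32_t mid = lh + (ll >> 16);
    mid += hl;
    if (mid < hl)
        hh += 0x10000u;

    *xh = hh + (mid >> 16);
    *xl = (mid << 16) | (ll & 0xffffu);
}

/* Compare the sketch against a plain 64-bit multiplication. */
static void umul_ppmm_c_selfcheck(uint32_t a, uint32_t b)
{
    uint32_t hi, lo;
    umul_ppmm_c(&hi, &lo, a, b);
    assert((((uint64_t)hi << 32) | lo) == (uint64_t)a * (uint64_t)b);
}
```

Note that umul_ppmm is an unsigned multiply, while the C code in this report uses signed operands.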
Here is generic C code:
#include <stdint.h>
inline int MULH(int a, int b){
return ((int64_t)(a) * (int64_t)(b))>>32;
}
Here is asm output from GCC 4.4.0 (-m68060 -O3 -fomit-frame-pointer):
#NO_APP
.text
.even
.globl _MULH
_MULH:
move.l d3,-(sp)
move.l d2,-(sp)
move.l 12(sp),d1
smi d0
extb.l d0
move.l 16(sp),d3
smi d2
extb.l d2
move.l d2,a0
move.l d3,a1
move.l a1,-(sp)
move.l a0,-(sp)
move.l d1,-(sp)
move.l d0,-(sp)
jsr ___muldi3
lea (16,sp),sp
move.l d0,d1
smi d0
extb.l d0
move.l d1,d0
move.l (sp)+,d2
move.l (sp)+,d3
rts
The same problem happens with macros defined like this:
#define MULL(a,b,s) (((int64_t)(a) * (int64_t)(b)) >> (s))
#define MUL64(a,b) ((int64_t)(a) * (int64_t)(b))
#define MAC64(d, a, b) ((d) += MUL64(a, b))
#define MLS64(d, a, b) ((d) -= MUL64(a, b))
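A minimal test function using these macros might look like the sketch below. The function name mul_macros_demo is hypothetical and not taken from FFmpeg; the idea is only that every macro expands to a widening 32x32->64 multiply, so building it with the same options (-m68060 -O3 -fomit-frame-pointer) should show whether ___muldi3 is called.

```c
#include <stdint.h>

#define MULL(a,b,s) (((int64_t)(a) * (int64_t)(b)) >> (s))
#define MUL64(a,b) ((int64_t)(a) * (int64_t)(b))
#define MAC64(d, a, b) ((d) += MUL64(a, b))
#define MLS64(d, a, b) ((d) -= MUL64(a, b))

/* Hypothetical test function: exercises each macro once. */
int64_t mul_macros_demo(int32_t a, int32_t b)
{
    int64_t acc = MUL64(a, b);   /* acc = a*b            */
    MAC64(acc, a, b);            /* acc = 2*a*b          */
    MLS64(acc, a, b);            /* acc = a*b again      */
    return acc + MULL(a, b, 16); /* add (a*b) >> 16      */
}
```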
Regards
--
Summary: 32x32->64 - asm code from "longlong.h" is not inlined
for 68060's builds
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ami_stuff at o2 dot pl
GCC host triplet: i686-cygwin
GCC target triplet: m68k-amigaos
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40819