Inline asm for ARM

Wed Jun 16 20:44:00 GMT 2010

> -----Original Message-----
> Behalf Of Andrew Haley
> Sent: Wednesday, June 16, 2010 13:00
> To: gcc-help@gcc.gnu.org
> Subject: Re: Inline asm for ARM
> 
> On 06/16/2010 05:57 PM, Pavel Pavlov wrote:
> >
> >
> >> -----Original Message-----
> >> From: gcc-help-owner@gcc.gnu.org [mailto:gcc-help-owner@gcc.gnu.org]
> >> On Behalf Of Andrew Haley
> >> Sent: Wednesday, June 16, 2010 12:52
> >> To: gcc-help@gcc.gnu.org
> >> Subject: Re: Inline asm for ARM
> >>
> >> On 06/16/2010 05:40 PM, Pavel Pavlov wrote:
> >>>> -----Original Message-----
> >>>> From: Andrew Haley [mailto:aph@redhat.com] On 06/16/2010 05:11 PM,
> >>>> Pavel Pavlov wrote:
> >>>>>> -----Original Message-----
> >>>>>> On 06/16/2010 01:15 PM, Andrew Haley wrote:
> >>>>>>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
> >>>>> ...
> >>>>>> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi) {
> >>>>>>   union
> >>>>>>   {
> >>>>>>     uint64_t ll;
> >>>>>>     struct
> >>>>>>     {
> >>>>>>       unsigned int l;
> >>>>>>       unsigned int h;
> >>>>>>     } s;
> >>>>>>   } retval;
> >>>>>>
> >>>>>>   retval.ll = acc;
> >>>>>>
> >>>>>>   __asm__("smlalbb %0, %1, %2, %3"
> >>>>>> 	  : "+r"(retval.s.l), "+r"(retval.s.h)
> >>>>>> 	  : "r"(lo), "r"(hi));
> >>>>>>
> >>>>>>   return retval.ll;
> >>>>>> }
> >>>>>>
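Andrew's union relies on a little-endian layout, in which `s.l` overlays the low 32 bits of `ll`. On a big-endian ARM target the two halves would be swapped. Below is a minimal sketch of an alternative that does not depend on byte order. It splits and rejoins the accumulator with shifts instead of a union. The helper names `split64` and `join64` are hypothetical, and this is plain C, not inline asm.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit value into its low and high
   32-bit words and join them back, using shifts rather than a union.
   The result does not depend on the target's byte order. */
static inline void split64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)v;          /* bits 0..31  */
    *hi = (uint32_t)(v >> 32);  /* bits 32..63 */
}

static inline uint64_t join64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

The two words can then be passed to the asm as two `"+r"` operands, as in Andrew's version. On a 32-bit target, these shifts typically compile to nothing more than register selection, because the 64-bit value already lives in a register pair.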
> >>>>>
> >>>>> [Pavel Pavlov]
> >>>>> Later on I found out that I had to use the +r constraint, but then,
> >>>>> when I use that function, for example like this:
> >>>>> int64_t rsmlalbb64(int64_t i, int x, int y) {
> >>>>> 	return smlalbb64(i, x, y);
> >>>>> }
> >>>>>
> >>>>> Gcc generates this asm:
> >>>>> <rsmlalbb64>:
> >>>>> push	{r4, r5}
> >>>>> mov	r4, r0
> >>>>> mov	ip, r1
> >>>>> smlalbb	r4, ip, r2, r3
> >>>>> mov	r5, ip
> >>>>> mov	r0, r4
> >>>>> mov	r1, ip
> >>>>> pop	{r4, r5}
> >>>>> bx	lr
> >>>>>
> >>>>> It's bizarre what gcc is doing in that function. I understand if
> >>>>> it can't optimize and use r0 and r1 directly, but from that
> >>>>> listing it looks as if gcc got drunk and decided to touch r5
> >>>>> for absolutely no reason!
> >>>>>
> >>>>> The expected output should have been:
> >>>>> <rsmlalbb64>:
> >>>>> smlalbb	r0, r1, r2, r3
> >>>>> bx	lr
> >>>>>
> >>>>> I'm using cegcc 4.1.0 and I compile with
> >>>>> arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj
> >>>>>
> >>>>> Is there a way to access the individual parts of that 64-bit input
> >>>>> integer, or is there a way to specify that two 32-bit integers
> >>>>> should be treated as the Hi:Lo parts of a 64-bit variable? It's
> >>>>> commonly done with a temporary, but then gcc generates too
> >>>>> much junk.
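Pavel also asked whether two 32-bit registers can be treated as the Hi:Lo halves of one 64-bit variable. GCC's ARM back end has the `%Q` and `%R` operand modifiers for this. They print the registers that hold the least and most significant words of a 64-bit operand. With them, the accumulator can be passed as a single `"+r"` operand without a union.

This is a sketch under a few assumptions:

* The asm path is taken only when the ACLE macro `__ARM_FEATURE_DSP` is defined. Older compilers such as cegcc 4.1.0 may not define it, and they will silently use the portable C fallback.
* The name `smlalbb_q` is hypothetical.
* Whether this avoids the extra moves on cegcc 4.1.0 is untested.

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of v without relying on
   implementation-defined conversions. */
static inline int32_t sext16(uint32_t v)
{
    return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical sketch: acc += SignExtend(x[15:0]) * SignExtend(y[15:0]),
   which is what SMLALBB computes, using one 64-bit "+r" operand.
   %Q0 / %R0 name the low / high register of the pair holding acc. */
static inline uint64_t smlalbb_q(uint64_t acc, uint32_t x, uint32_t y)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    __asm__("smlalbb %Q0, %R0, %1, %2"
            : "+r"(acc)
            : "r"(x), "r"(y));
    return acc;
#else
    /* Portable fallback giving the same modulo-2^64 result. */
    int64_t p = (int64_t)sext16(x) * sext16(y);
    return acc + (uint64_t)p;
#endif
}
```

The top halves of `x` and `y` are ignored, and the accumulation wraps modulo 2^64, matching the instruction's register-pair arithmetic.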
> >>>>
> >>>> Why don't you just use the function I sent above?  It generates
> >>>>
> >>>> smlalbb:
> >>>> 	smlalbb r0, r1, r2, r3
> >>>> 	mov	pc, lr
> >>>>
> >>>> smlalXX64:
> >>>> 	smlalbb r0, r1, r2, r3
> >>>> 	smlalbt r0, r1, r2, r3
> >>>> 	smlaltb r0, r1, r2, r3
> >>>> 	smlaltt r0, r1, r2, r3
> >>>> 	mov	pc, lr
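The `smlalXX64` listing applies each of the BB, BT, TB and TT variants in turn to the same register pair. In SMLAL<x><y>, the first letter selects the bottom or top half of the first source register, and the second letter does the same for the second source register. Andrew's source for `smlalXX64` isn't quoted in this thread, so what follows is only a guess at an equivalent.

Assumptions in this sketch:

* The name `smlalxx64` is hypothetical.
* It uses GCC's ARM `%Q`/`%R` modifiers, which print the low and high register of a 64-bit operand.
* The asm path is guarded by `__ARM_FEATURE_DSP`, with a portable C fallback when that macro is not defined.

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of v portably. */
static inline int32_t sext16(uint32_t v)
{
    return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical: accumulate all four 16x16 products of the halves of
   x and y (bb, bt, tb, tt) into a 64-bit accumulator. */
static inline uint64_t smlalxx64(uint64_t acc, uint32_t x, uint32_t y)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    __asm__("smlalbb %Q0, %R0, %1, %2\n\t"
            "smlalbt %Q0, %R0, %1, %2\n\t"
            "smlaltb %Q0, %R0, %1, %2\n\t"
            "smlaltt %Q0, %R0, %1, %2"
            : "+r"(acc)
            : "r"(x), "r"(y));
    return acc;
#else
    int64_t xb = sext16(x), xt = sext16(x >> 16);  /* halves of x */
    int64_t yb = sext16(y), yt = sext16(y >> 16);  /* halves of y */
    /* bb + bt + tb + tt, added modulo 2^64 */
    return acc + (uint64_t)(xb * yb + xb * yt + xt * yb + xt * yt);
#endif
}
```

For example, with x = 0x00020003 and y = 0x00050007, the four products are 21, 15, 14 and 10. Their sum is 60, which equals (3 + 2) * (7 + 5).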
> >>>>
> >>>
> >>> [Pavel Pavlov]
> >>> What's your gcc -v? The output I posted comes from your function.
> >>
> >> 4.3.0
> >>
> >> Perhaps your compiler options were wrong?  Dunno.
> >>
> >
> >
> >  [Pavel Pavlov]
> > It's kind of difficult to get that part wrong :)
> 
> It's not.  Trust me, I have been on gcc-help for a _long_ while...
> 
> I've even seen complaints about poor code when optimization is disabled.
> 
> Andrew.
> 
> 
> 
> 
> I saw that there are some changes to the ARM code between 4.1.0 and 4.3.0,
> and the optimizer might have been improved between the two versions as well,
> so I'm building 4.4.0 now to see if it fixes the problem.

[Pavel Pavlov] 
Well, of course I enable optimization; -O3, I suppose, is enough for this simple case. That's why I said it's difficult to get that wrong. Without optimization it would generate something quite different (no inlining, etc.).