Inline asm for ARM

Andrew Haley aph@redhat.com
Wed Jun 16 17:14:00 GMT 2010


On 06/16/2010 05:54 PM, Pavel Pavlov wrote:
>> -----Original Message-----
>> Behalf Of Pavel Pavlov
>> Sent: Wednesday, June 16, 2010 12:40
>> To: Andrew Haley
>> Cc: gcc-help@gcc.gnu.org
>> Subject: RE: Inline asm for ARM
>>
>>> -----Original Message-----
>>> From: Andrew Haley [mailto:aph@redhat.com] On 06/16/2010 05:11 PM,
>>> Pavel Pavlov wrote:
>>>>> -----Original Message-----
>>>>> On 06/16/2010 01:15 PM, Andrew Haley wrote:
>>>>>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
>>>> ...
>>>>> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi) {
>>>>>   union
>>>>>   {
>>>>>     uint64_t ll;
>>>>>     struct
>>>>>     {
>>>>>       unsigned int l;
>>>>>       unsigned int h;
>>>>>     } s;
>>>>>   } retval;
>>>>>
>>>>>   retval.ll = acc;
>>>>>
>>>>>   __asm__("smlalbb %0, %1, %2, %3"
>>>>> 	  : "+r"(retval.s.l), "+r"(retval.s.h)
>>>>> 	  : "r"(lo), "r"(hi));
>>>>>
>>>>>   return retval.ll;
>>>>> }
>>>>>
>>>>
>>>> [Pavel Pavlov]
>>>> Later on I found out that I had to use +r constraint, but then, when
>>>> I use that
>>> function for example like that:
>>>> int64_t rsmlalbb64(int64_t i, int x, int y) {
>>>> 	return smlalbb64(i, x, y);
>>>> }
>>>>
>>>> Gcc generates this asm:
>>>> <rsmlalbb64>:
>>>> push	{r4, r5}
>>>> mov	r4, r0
>>>> mov	ip, r1
>>>> smlalbb	r4, ip, r2, r3
>>>> mov	r5, ip
>>>> mov	r0, r4
>>>> mov	r1, ip
>>>> pop	{r4, r5}
>>>> bx	lr
>>>>
>>>> It's bizarre what gcc is doing in that function. I could understand
>>>> if it couldn't optimize and use r0 and r1 directly, but from that
>>>> listing it looks as if gcc got drunk and decided to touch r5 for
>>>> absolutely no reason!
>>>>
>>>> The expected output should have been:
>>>> <rsmlalbb64>:
>>>> smlalbb	r0, r1, r2, r3
>>>> bx	lr
>>>>
>>>> I'm using cegcc 4.1.0 and I compile with
>>>> arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj
>>>>
>>>> Is there a way to access the individual parts of that 64-bit input
>>>> integer, or is there a way to specify that two 32-bit integers
>>>> should be treated as the hi:lo parts of a 64-bit variable? It's
>>>> commonly done with a temporary, but the result is that gcc generates
>>>> too much junk.
>>>
>>> Why don't you just use the function I sent above?  It generates
>>>
>>> smlalbb:
>>> 	smlalbb r0, r1, r2, r3
>>> 	mov	pc, lr
>>>
>>> smlalXX64:
>>> 	smlalbb r0, r1, r2, r3
>>> 	smlalbt r0, r1, r2, r3
>>> 	smlaltb r0, r1, r2, r3
>>> 	smlaltt r0, r1, r2, r3
>>> 	mov	pc, lr
>>>
>>
>> [Pavel Pavlov]
>> What's your gcc -v? The output I posted comes from your function.
> 
> By the way, the version that takes hi:lo for the first int64 works fine:
> 
> static __inline void smlalbb(int * lo, int * hi, int x, int y)
> {
> #if defined(__CC_ARM)
> 	__asm { smlalbb *lo, *hi, x, y; }
> #elif defined(__GNUC__)
> 	__asm__ __volatile__("smlalbb %0, %1, %2, %3" : "+r"(*lo), "+r"(*hi) : "r"(x), "r"(y));
> #endif
> }
> 
>  
> void test_smlalXX(int hi, int lo, int a, int b)
> {
> 	smlalbb(&hi, &lo, a, b);
> 	smlalbt(&hi, &lo, a, b);
> 	smlaltb(&hi, &lo, a, b);
> 	smlaltt(&hi, &lo, a, b);
> }
> 
> That translates directly into four asm opcodes.

Mmmm, but the volatile is wrong.  If you need volatile to stop gcc from deleting your
asm, you have a mistake somewhere.

Andrew.
