Inline asm for ARM
Andrew Haley
aph@redhat.com
Wed Jun 16 22:19:00 GMT 2010
On 06/16/2010 06:12 PM, Pavel Pavlov wrote:
>> From: gcc-help-owner@gcc.gnu.org [mailto:gcc-help-owner@gcc.gnu.org] On
>> Behalf Of Andrew Haley
>>
>>> By the way, the version that takes hi:lo for the first int64 works fine:
>>>
>>> static __inline void smlalbb(int * lo, int * hi, int x, int y) {
>>> #if defined(__CC_ARM)
>>> __asm { smlalbb *lo, *hi, x, y; }
>>> #elif defined(__GNUC__)
>>> __asm__ __volatile__("smlalbb %0, %1, %2, %3" : "+r"(*lo), "+r"(*hi)
>>> : "r"(x), "r"(y));
>>> #endif
>>> }
>>>
>>>
>>> void test_smlalXX(int hi, int lo, int a, int b) {
>>> smlalbb(&hi, &lo, a, b);
>>> smlalbt(&hi, &lo, a, b);
>>> smlaltb(&hi, &lo, a, b);
>>> smlaltt(&hi, &lo, a, b);
>>> }
>>>
>>> Translates directly into four asm opcodes
>>
>> Mmmm, but the volatile is wrong. If you need volatile to stop gcc
>> from deleting your asm, you have a mistake somewhere.
>
> I had to add volatile when I had that mess with "=&r" and "0"; now I
> think it could be removed.
> Just tested, and I still need that. The reason I needed that was
> because my test function was a noop:
> void test_smlalXX(int lo, int hi, int a, int b)
> {
> smlalbb(&lo, &hi, a, b);
> smlalbt(&lo, &hi, a, b);
> smlaltb(&lo, &hi, a, b);
> smlaltt(&lo, &hi, a, b);
> }
> Gcc correctly deduces that the function has no side effects
> if I don't use volatile. So I removed volatile and added a
> return value to that function:
>
> uint64_t test_smlalXX(int lo, int hi, int a, int b)
> {
> smlalbb(&lo, &hi, a, b);
> smlalbt(&lo, &hi, a, b);
> smlaltb(&lo, &hi, a, b);
> smlaltt(&lo, &hi, a, b);
>
> T64 retval;
>
> retval.s.hi = hi;
> retval.s.lo = lo;
> return retval.i64;
> }
>
> The output becomes:
> 000000e4 <_Z12test_smlalXXiiii>:
> e4: e92d0030 push {r4, r5}
> e8: e1410382 smlalbb r0, r1, r2, r3
> ec: e14103c2 smlalbt r0, r1, r2, r3
> f0: e14103a2 smlaltb r0, r1, r2, r3
> f4: e1a05001 mov r5, r1
> f8: e14503e2 smlaltt r0, r5, r2, r3
> fc: e1a04000 mov r4, r0
> 100: e1a01005 mov r1, r5
> 104: e8bd0030 pop {r4, r5}
> 108: e12fff1e bx lr
>
> Basically, gcc gets confused about the return variable and generates
> useless gunk at the end for the last function. I tried commenting out
> smlaltt(&lo, &hi, a, b); in test_smlalXX, and gcc still
> generates the same useless code around smlaltb.
I have seen something similar at higher optimization levels, where
some pass messes things up a bit. Your
    mov r4, r0
is very weird, though. I can't explain that.
-O1 generates perfect code for me.
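
For reference, a minimal sketch of the non-volatile form discussed above, assuming a GCC-compatible compiler. The "+r" constraints tell the compiler that both halves are read and written, so the asm is kept whenever the result is actually used and volatile is not needed. The __ARM_FEATURE_DSP guard, the portable fallback branch and the name test_smlalbb are assumptions added for illustration and are not from the thread. The return value is built with shifts instead of the T64 union, whose definition was not posted.

```c
#include <stdint.h>

/* Accumulate the signed product of the low halfwords of x and y
 * into the 64-bit value held in *hi:*lo. */
static inline void smlalbb(int *lo, int *hi, int x, int y)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
    /* No volatile: the outputs are in-out operands, so the compiler
     * only drops the asm if nobody uses *lo or *hi afterwards. */
    __asm__("smlalbb %0, %1, %2, %3"
            : "+r"(*lo), "+r"(*hi)
            : "r"(x), "r"(y));
#else
    /* Assumed portable fallback so the sketch also builds off-target. */
    uint64_t acc = ((uint64_t)(uint32_t)*hi << 32) | (uint32_t)*lo;
    acc += (uint64_t)((int64_t)(int16_t)x * (int16_t)y);
    *lo = (int)(uint32_t)acc;
    *hi = (int)(uint32_t)(acc >> 32);
#endif
}

/* Hypothetical test wrapper: returning the accumulator gives the asm
 * an observable result, which is what keeps it alive. */
static uint64_t test_smlalbb(int lo, int hi, int a, int b)
{
    smlalbb(&lo, &hi, a, b);
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}
```

If the wrapper returned void and discarded lo and hi, the compiler would be entitled to delete the asm, which is the behaviour seen with the original test function.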
Andrew.