Inline asm for ARM

Andrew Haley
Wed Jun 16 16:11:00 GMT 2010

On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
> I spent hours trying to get this working, but I can't find a way to do it properly.
> In ARMv5TE, there is an instruction SMLALBB
> SMLALBB RdLo, RdHi, Rm, Rs
> Multiplies the bottom 16 bits of Rm by the bottom 16 bits of Rs and adds the 32-bit result to the 64-bit integer held in the register pair RdLo, RdHi.
> I tried everything I could think of and still can't get it working.
> The closest try was:
> static __inline void smlalbb(int * lo, int * hi, int x, int y)
> {
> 	__asm__ __volatile__("smlalbb %0, %1, %2, %3" : "=&r"(lo), "=&r"(hi) : "r"(x), "r"(y), "0"(lo), "1"(hi));
> }
> It seems to produce the correct result, but only in a simple test function; when I chained calls to this smlalbb function the results were no longer correct.
> The correct way would probably be to use (*lo) and (*hi) in the operand lists, but in that case it adds too many useless loads and stores (instead of translating directly to a single asm instruction, it generates something like 8-10 instructions).
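
For reference, the behaviour described above can be modelled in portable C, which is handy for checking any inline-asm version against known values. This is only a sketch: low_half and smlalbb_model are hypothetical names made up for illustration, and the arithmetic is done in unsigned 64-bit so that wrap-around matches the register pair.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the bottom 16 bits of v. */
static int32_t low_half(uint32_t v)
{
    int32_t h = (int32_t)(v & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Hypothetical model of SMLALBB: signed 16x16 multiply of the bottom
   halfwords, added to a 64-bit accumulator (modulo 2^64). */
static uint64_t smlalbb_model(uint64_t acc, uint32_t rm, uint32_t rs)
{
    int64_t product = (int64_t)low_half(rm) * low_half(rs);
    return acc + (uint64_t)product;
}
```

The top halfwords of rm and rs are ignored, and a bottom halfword of 0xFFFF is treated as -1.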

I think it should be

inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi)
{
  /* View the 64-bit accumulator as two 32-bit register operands.  */
  union
  {
    uint64_t ll;
    struct
    {
      unsigned int l;
      unsigned int h;
    };
  } retval;

  retval.ll = acc;

  __asm__("smlalbb %0, %1, %2, %3"
	  : "+r"(retval.l), "+r"(retval.h)
	  : "r"(lo), "r"(hi));

  return retval.ll;
}

uint64_t smlalXX64 (uint64_t i, unsigned int a, unsigned int b)
{
  uint64_t tmp = i;

  tmp = smlalbb(tmp, a, b);
  tmp = smlalbt(tmp, a, b);
  tmp = smlaltb(tmp, a, b);
  tmp = smlaltt(tmp, a, b);

  return tmp;
}
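
The chained example calls smlalbt, smlaltb and smlaltt, which are not shown. A minimal sketch follows, assuming they follow the same pattern as smlalbb. DEFINE_SMLAL and half are hypothetical names, the __ARM_FEATURE_DSP check is my assumption about how to detect the instruction, and the non-ARM branch is a plain C fallback so the code can be tested on any host. The union assumes a little-endian layout (low word first).

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the bottom (top == 0) or top
   (top == 1) halfword of v. */
static inline int32_t half(uint32_t v, int top)
{
    int32_t h = (int32_t)((top ? v >> 16 : v) & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
/* ARM build: a single instruction; "+r" marks both halves of the
   accumulator as read and written. */
#define DEFINE_SMLAL(name, xtop, ytop)                              \
static inline uint64_t name(uint64_t acc, uint32_t x, uint32_t y)   \
{                                                                   \
    union { uint64_t ll; struct { uint32_t l, h; } s; } r;          \
    r.ll = acc;                                                     \
    __asm__(#name " %0, %1, %2, %3"                                 \
            : "+r"(r.s.l), "+r"(r.s.h) : "r"(x), "r"(y));           \
    return r.ll;                                                    \
}
#else
/* Any other target: portable fallback with the same results. */
#define DEFINE_SMLAL(name, xtop, ytop)                              \
static inline uint64_t name(uint64_t acc, uint32_t x, uint32_t y)   \
{                                                                   \
    int64_t p = (int64_t)half(x, xtop) * half(y, ytop);             \
    return acc + (uint64_t)p;                                       \
}
#endif

DEFINE_SMLAL(smlalbb, 0, 0)   /* bottom(x) * bottom(y) */
DEFINE_SMLAL(smlalbt, 0, 1)   /* bottom(x) * top(y)    */
DEFINE_SMLAL(smlaltb, 1, 0)   /* top(x)    * bottom(y) */
DEFINE_SMLAL(smlaltt, 1, 1)   /* top(x)    * top(y)    */

/* Same chaining as in the message above. */
static inline uint64_t smlalXX64(uint64_t i, uint32_t a, uint32_t b)
{
    uint64_t tmp = i;
    tmp = smlalbb(tmp, a, b);
    tmp = smlalbt(tmp, a, b);
    tmp = smlaltb(tmp, a, b);
    tmp = smlaltt(tmp, a, b);
    return tmp;
}
```

Because the accumulator is passed and returned by value, the compiler can keep it in a register pair across the chained calls instead of going through memory.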

