ARM Cortex M0/M0+/M1 - substituting LDR with a MOV rd,#<8 bit immediate> & a logical or arithmetical instruction (LSLS/RORS/ADDS/SUBS/MULS/NEGS)

Sean Dunlevy seandunlevy@hotmail.com
Sat Dec 29 21:29:00 GMT 2018


Hi,
     I have exchanged a number of e-mails with Joseph Yiu (ARM - also wrote 'The Definitive Guide to ARM' series of books). We discovered that tens of thousands of 32-bit values can be set up in the same number of cycles as an LDR, and in 32 bits of code rather than the 48 bits an LDR needs (a 16-bit instruction plus its 32-bit literal), so it's quite important when a small footprint is needed.

MOV Rd,#<imm> - loads an 8-bit value (0 to 255) into the register. Each of the instructions below follows this MOV.



LSLS Rd,#<imm> - this allows 23x256 immediates to be put into a register in 2 cycles & 32 bits.

ADDS Rd,#<imm> - this allows the values 256-510 to be put into a register in 2 cycles & 32 bits.

ADDS/SUBS sd,ip   - this sets up 511 PC-relative values & 511 SP-relative values into a register in 2 cycles & 32 bits.

SUBS Rd,#<imm> - this allows values -1 to -255 to be put into a register in 2 cycles & 32 bits.

MULS Rd,Rd        - this allows 255 square numbers to be put into a register in 2 cycles & 32 bits.

NEGS Rd,Rd        - this allows 256 negative numbers to be put into a register in 2 cycles & 32 bits.

RORS Rd,Rd        - this actually behaves in a slightly unexpected manner. Bits [7:5] of the shift amount should have no effect on the result, but the CPU status after the instruction is different if any of the bits in that field is set.
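
To make the RORS remark concrete, here is a small C model of the register form of RORS, based on my reading of the ARMv6-M description: the shift amount is the bottom byte of the register, the rotation is taken modulo 32, and the carry flag is reloaded from bit 31 of the result whenever that bottom byte is non-zero. The type and function names (rors_result, rors_model) are made up for illustration, and this is a sketch rather than a verified model of the hardware.

```c
#include <stdint.h>

/* Hypothetical model of "RORS Rdn,Rm"; names are illustrative only. */
typedef struct {
    uint32_t value;  /* result written back to Rdn */
    int carry;       /* C flag after the instruction */
} rors_result;

static rors_result rors_model(uint32_t rdn, uint32_t rm, int carry_in)
{
    rors_result r;
    uint32_t amount = rm & 0xFFu;   /* only the bottom byte of Rm is used */
    uint32_t n;

    if (amount == 0) {              /* no rotate: value and C are unchanged */
        r.value = rdn;
        r.carry = carry_in;
        return r;
    }

    n = amount & 31u;               /* the rotation itself is modulo 32 */
    r.value = n ? ((rdn >> n) | (rdn << (32u - n))) : rdn;
    r.carry = (int)(r.value >> 31); /* C is reloaded from bit 31 of the result */
    return r;
}
```

With this model, rors_model(0x80000001, 0, 0) leaves the carry at 0, while rors_model(0x80000001, 32, 0) returns the same value but sets the carry to 1, which is the kind of difference in CPU status described above.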


I'm afraid I don't understand new compiler technology. I rewrote the felide constructors for the GNU SH-2, SH-4 & ARM3 C compilers back in the mid 90s but it appears that this way of using snippets of code is no longer used. I brought it up with Joseph Yiu (ARM senior staff and author of the 'Definitive Guide to Programming the ARM' series of books). ARM have looked into this technique and agree (obviously) that it is an optimization, but it is people in Texas who are looking at it.

I think that the ADDS/SUBS set (-255 to +510) is going to be the most common usage, but I don't know if that messes up the compiler's flag management, since every one of these instructions updates the flags. I suppose the LSLS set may also be of some use given that powers of 2 are common in many routines. The rest are more tricky and not likely to show the same improvement, but it just appears (to me) to be a quick optimization. If I knew how GNU worked, I would have done it for you. As it is, I'm writing in 100% assembly language and I do frequently use the values I know are already in the registers to speed things up.
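
As a rough illustration of the check a compiler would need for the two most likely cases (ADDS/SUBS and LSLS), here is a sketch in C. Everything in it is hypothetical: the names const_seq and classify_constant are mine, it is not based on how GCC actually selects constants, and it deliberately ignores the flag question (it assumes the flags are dead at the point where the constant is loaded).

```c
#include <stdint.h>

/* Hypothetical classification of a 32-bit constant; not taken from GCC. */
typedef enum {
    SEQ_NONE,      /* fall back to a literal-pool LDR */
    SEQ_MOV,       /* MOV #imm8 alone:            0 .. 255    */
    SEQ_MOV_ADDS,  /* MOV #255 then ADDS #arg:    256 .. 510  */
    SEQ_MOV_SUBS,  /* MOV #0 then SUBS #arg:      -255 .. -1  */
    SEQ_MOV_LSLS   /* MOV #imm8 then LSLS #arg:   imm8 << arg */
} const_seq;

/* On success *imm8 is the MOV immediate and *arg is the immediate
   (or shift count) of the second instruction. */
static const_seq classify_constant(int32_t c, uint32_t *imm8, uint32_t *arg)
{
    uint32_t u = (uint32_t)c;
    uint32_t s;

    *imm8 = 0;
    *arg = 0;

    if (c >= 0 && c <= 255) {
        *imm8 = u;
        return SEQ_MOV;
    }
    if (c >= 256 && c <= 510) {
        *imm8 = 255;
        *arg = u - 255u;            /* 1 .. 255 */
        return SEQ_MOV_ADDS;
    }
    if (c >= -255 && c <= -1) {
        *arg = (uint32_t)(-c);      /* 1 .. 255, subtracted from zero */
        return SEQ_MOV_SUBS;
    }
    /* Look for an 8-bit value that reproduces c when shifted left. */
    for (s = 1; s <= 31; s++) {
        uint32_t base = u >> s;
        if (base <= 255u && (base << s) == u) {
            *imm8 = base;
            *arg = s;
            return SEQ_MOV_LSLS;
        }
    }
    return SEQ_NONE;
}
```

For example, 300 comes out as MOV #255 followed by ADDS #45, 0x10000 as MOV #0x80 followed by LSLS #9, and 511 falls through to SEQ_NONE and would still need the LDR.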

Many thanks,
Sean


