The following code:

__thread long tl = 42;

long f(void)
{
  long *l = &tl;
  register long r0 __asm__ ("r0");
  register long *r1 __asm__ ("r1");
  r0 = 23;
  r1 = l;
  __asm__ __volatile__ ("": "+r"(r0) : "r"(r1));
  return r0;
}

when compiled with -O2, gives the following assembly code:

f:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	ldr	r1, .L3
	str	lr, [sp, #-4]!
	bl	__aeabi_read_tp	@ load_tp_soft
	add	r1, r0, r1
	ldr	pc, [sp], #4

Here we see that the r0 register is never loaded with 23, as it should be.
I believe this is a bug in the way we expand local reg vars. The manual says:

  Local register variables in specific registers do not reserve the registers, except at the point where they are used as input or output operands in an @code{asm} statement and the @code{asm} statement itself is not deleted. The compiler's data flow analysis is capable of determining where the specified registers contain live values, and where they are available for other uses.

There are two key points to note in the above:

1) The only point at which a register variable *has* to be in the named register is when an inline ASM appears.
2) Data flow is supposed to know when the value is live.

I thus believe we need to expand local vars as used in this test-case by copying a pseudo reg that contains the real value into the required register immediately before its use in an ASM -- and to leave optimizing this code path to the register allocator -- so that ideally no copy is necessary.

In the test-case cited, the user assigns the variable r0 with a value and then tries to assign another value to the variable r1. The second step requires a libcall sequence that clobbers the value previously stored into r0 -- to avoid this happening the value previously assigned must be copied to a call-saved register (or the assignment deferred until after the libcall).
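To make the intended ordering concrete, here is a source-level sketch of what such an expansion would be equivalent to: the values live in ordinary variables (standing in for pseudos) and are copied into the named registers only immediately before the asm. The HARD_REG macro and the l_val/r0_val names are hypothetical, introduced so the sketch also builds on non-ARM targets; this only illustrates the ordering and is not claimed to work around the bug on an affected compiler.

```c
/* Hypothetical helper: bind to a named register only on ARM, so the
   sketch also compiles on other targets with GCC-style inline asm. */
#if defined(__arm__)
# define HARD_REG(name) __asm__ (name)
#else
# define HARD_REG(name)
#endif

__thread long tl = 42;

long f(void)
{
  /* "Pseudos": ordinary variables the register allocator may place
     anywhere, including call-saved registers. */
  long *l_val = &tl;    /* may need a libcall such as __aeabi_read_tp */
  long r0_val = 23;

  register long r0 HARD_REG ("r0");
  register long *r1 HARD_REG ("r1");

  /* Copy into the named registers immediately before the asm, after
     anything that might need a libcall has already been computed. */
  r1 = l_val;
  r0 = r0_val;
  __asm__ __volatile__ ("" : "+r"(r0) : "r"(r1));

  /* Copy the output back into a pseudo right after the asm. */
  r0_val = r0;
  return r0_val;
}
```

With this ordering nothing that can clobber r0 sits between the assignment and the asm, so f() is expected to return 23.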
Copying a pseudo to the required register before an __asm__ is not so easy, because at expand time the data flow engine doesn't know anything; it's not even initialized.

What you could do, I suppose, in cfgexpand.c:

* before expand_used_vars(), scan the cfun->local_decl list and make a separate list of local register variables, say a VEC(tree,heap) local_reg_var_decls
* expand local register vars as usual, assigning pseudos to the local var
* every time before expanding __asm__, walk the local_reg_var_decls list and shrink-wrap the __asm__ with moves from/to the location where the var lives (must be a pseudo, I suppose) to/from the required register
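The steps above can be sketched as a toy model in plain C. This is not GCC code: struct reg_var, struct insn_list and expand_asm_wrapped() are hypothetical names, instructions are modelled as lines of text, and a real implementation would work on trees and RTL inside cfgexpand.c instead.

```c
#include <stdio.h>
#include <string.h>

#define MAX_INSNS 16
#define INSN_LEN  48

/* Toy stand-in for one entry of the proposed local_reg_var_decls list. */
struct reg_var {
  const char *hard_reg;  /* register named in the declaration, e.g. "r0" */
  int pseudo;            /* pseudo assigned when the var was expanded */
  int is_input;          /* the asm reads the variable */
  int is_output;         /* the asm writes the variable */
};

/* Toy instruction stream: each instruction is one line of text. */
struct insn_list {
  char text[MAX_INSNS][INSN_LEN];
  int count;
};

static void emit_text(struct insn_list *out, const char *s)
{
  if (out->count < MAX_INSNS)
    snprintf(out->text[out->count++], INSN_LEN, "%s", s);
}

/* Move from the pseudo into the required hard register. */
static void emit_to_hard(struct insn_list *out, const struct reg_var *v)
{
  char buf[INSN_LEN];
  snprintf(buf, sizeof buf, "mov %s, p%d", v->hard_reg, v->pseudo);
  emit_text(out, buf);
}

/* Move from the hard register back into the pseudo. */
static void emit_from_hard(struct insn_list *out, const struct reg_var *v)
{
  char buf[INSN_LEN];
  snprintf(buf, sizeof buf, "mov p%d, %s", v->pseudo, v->hard_reg);
  emit_text(out, buf);
}

/* Wrap one asm with moves: inputs are copied in just before it,
   outputs are copied back out just after it. */
void expand_asm_wrapped(struct insn_list *out, const struct reg_var *vars,
                        int nvars, const char *asm_text)
{
  int i;

  for (i = 0; i < nvars; i++)
    if (vars[i].is_input)
      emit_to_hard(out, &vars[i]);

  emit_text(out, asm_text);

  for (i = 0; i < nvars; i++)
    if (vars[i].is_output)
      emit_from_hard(out, &vars[i]);
}
```

For the test-case, r0 is a "+r" operand (input and output) and r1 is an input only, so the asm ends up bracketed by two moves in and one move out. Everything before the first move operates on pseudos only, which leaves it to the register allocator to keep the value 23 out of the way of the libcall.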