The code, by Kent Dickey posted to comp.arch typedef unsigned int u32; typedef unsigned long long u64; u64 do_op(u64 out0, u64 in0, u64 in1, u32 opcode, int size); void calc_loop(u64 *optr, u64 *iptr0, u64 *iptr1, u32 opcode, int size, int len) { u64 o0, i0, i1, val, result; int num, shift, pos; int i, j; // size is 0,1,2,3 representing 8,16,32,64 bytes num = 8 >> size; // 8,4,2,1 shift = 8 << size; // 8,16,32,64 for(i = 0; i < len; i++) { o0 = optr[i]; i0 = iptr0[i]; i1 = iptr1[i]; result = 0; pos = 0; for(j = 0; j < num; j++) { val = do_op(o0, i0, i1, opcode, size); result = result | (val << pos); pos += shift; if(shift < 64) { o0 = o0 >> shift; i0 = i0 >> shift; i1 = i1 >> shift; } } optr[i] = result; } } compiled for aarch64 on godbolt with recent trunk and -O3 -fomit-frame-pointer (see https://godbolt.org/z/5bKPeGWrK ) does not set up the frame pointer, but it also does not use it for aoviding spill/restore pairs: calc_loop: stp x19, x20, [sp, -144]! mov w6, 8 asr w19, w6, w4 stp x27, x28, [sp, 64] lsl w27, w6, w4 str x30, [sp, 80] stp x0, x1, [sp, 112] str x2, [sp, 128] cmp w5, 0 ble .L1 sbfiz x0, x5, 3, 32 stp x21, x22, [sp, 16] mov w20, w4 stp x23, x24, [sp, 32] mov w21, w3 stp x25, x26, [sp, 48] str x0, [sp, 136] cmp w27, 63 ble .L3 mov x25, 0 .L6: ldr x0, [sp, 112] ldr x23, [x0, x25] ldr x0, [sp, 120] ldr x0, [x0, x25] str x0, [sp, 104] ldr x0, [sp, 128] ldr x24, [x0, x25] cbz w19, .L10 mov w22, 0 mov w28, 0 mov x26, 0 .L5: ldr x1, [sp, 104] mov w4, w20 mov w3, w21 mov x2, x24 mov x0, x23 add w22, w22, 1 bl do_op lsl x0, x0, x28 add w28, w28, w27 orr x26, x26, x0 cmp w19, w22 bne .L5 ldr x0, [sp, 112] str x26, [x0, x25] add x25, x25, 8 ldr x0, [sp, 136] cmp x0, x25 bne .L6 .L17: ldp x21, x22, [sp, 16] ldp x23, x24, [sp, 32] ldp x25, x26, [sp, 48] .L1: ldp x27, x28, [sp, 64] ldr x30, [sp, 80] ldp x19, x20, [sp], 144 ret .L3: str xzr, [sp, 104] ldp x0, x1, [sp, 104] ldr x24, [x1, x0] ldr x1, [sp, 120] ldr x25, [x1, x0] ldr x1, [sp, 128] ldr x22, [x1, x0] cbz w19, .L11 .L20: mov w26, 0 mov w28, 0 mov x23, 0 .L8: mov x2, x22 mov x1, x25 mov x0, x24 mov w4, w20 mov w3, w21 add w26, w26, 1 bl do_op lsr x24, x24, x27 lsl x0, x0, x28 add w28, w28, w27 orr x23, x23, x0 lsr x25, x25, x27 lsr x22, x22, x27 cmp w19, w26 bne .L8 ldp x0, x1, [sp, 104] str x23, [x1, x0] add x0, x0, 8 ldr x1, [sp, 136] str x0, [sp, 104] cmp x1, x0 beq .L17 .L19: ldp x0, x1, [sp, 104] ldr x24, [x1, x0] ldr x1, [sp, 120] ldr x25, [x1, x0] ldr x1, [sp, 128] ldr x22, [x1, x0] cbnz w19, .L20 .L11: ldp x0, x1, [sp, 104] mov x23, 0 str x23, [x1, x0] add x0, x0, 8 ldr x1, [sp, 136] str x0, [sp, 104] cmp x1, x0 bne .L19 b .L17 .L10: ldr x0, [sp, 112] mov x26, 0 str x26, [x0, x25] add x25, x25, 8 ldr x0, [sp, 136] cmp x0, x25 bne .L6 b .L17
#define FIXED_REGISTERS \ { \ 0, 0, 0, 0, 0, 0, 0, 0, /* R0 - R7 */ \ 0, 0, 0, 0, 0, 0, 0, 0, /* R8 - R15 */ \ 0, 0, 0, 0, 0, 0, 0, 0, /* R16 - R23 */ \ 0, 0, 0, 0, 0, 1, 0, 1, /* R24 - R30, SP */ \ It is specifically marked as a fixed register ...
See https://gcc.gnu.org/pipermail/gcc-patches/2016-September/456662.html I think this is by design of the ABI ...
(In reply to Andrew Pinski from comment #2) > See https://gcc.gnu.org/pipermail/gcc-patches/2016-September/456662.html > > I think this is by design of the ABI ... The workaround mentioned in the thread does not appear to work, -O3 -fomit-frame-pointer -fcall-used-x29 yields cc1: error: cannot use 'x29' as a call-used register
I think the ABI might have softened on this point at https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#general-purpose-registers. It says the platform can let x29 be used as a general-purpose callee-saved register in section 6.4.6.
This was a deliberate design choice. Although the frame chain is not set up by code that omits the frame pointer, the chain of frames that are set up by other functions is still valid this way. This ensures that any code that does try to walk the frame chain will not crash. If we reused the frame pointer for other purposes, then any code trying to walk the frame chain (eg backtrace()) would encounter an invalid record and likely crash. With 31 main registers, the benefit from one additional one is not especially large.
For completeness. The AArch64 ABI lists 4 alternatives with respect to having a frame chain. When -fomit-frame-pointer is used, GCC implements this one: - It may require the frame pointer to address a valid frame record at all times, except that any subroutine may elect not to create a frame record
(In reply to Richard Earnshaw from comment #5) > This was a deliberate design choice. Although the frame chain is not set up > by code that omits the frame pointer, the chain of frames that are set up by > other functions is still valid this way. This ensures that any code that > does try to walk the frame chain will not crash. If we reused the frame > pointer for other purposes, then any code trying to walk the frame chain (eg > backtrace()) would encounter an invalid record and likely crash. Would it make sense to document this somewhere? Or did I just miss it? :-)
(In reply to Thomas Koenig from comment #7) > Would it make sense to document this somewhere? Or did I just miss it? :-) Possibly, but I've no idea where. It's too target-specific to put under the generic documentation for -fomit-frame-pointer and I don't think there's a section in the manual that really documents the target-specific behaviours of generic options.
(In reply to Richard Earnshaw from comment #8) > (In reply to Thomas Koenig from comment #7) > > Would it make sense to document this somewhere? Or did I just miss it? :-) > > Possibly, but I've no idea where. It's too target-specific to put under the > generic documentation for -fomit-frame-pointer and I don't think there's a > section in the manual that really documents the target-specific behaviours > of generic options. Hm, maybe a chapter "Architecture-specific implementation choices" to document those cases where the ABI gives some leeway could be a place to put it. It could have sections on architecture.