111096 – Frame pointer is not used even when -fomit-frame-pointer is specified

Bug 111096 - Frame pointer is not used even when -fomit-frame-pointer is specified

Summary: Frame pointer is not used even when -fomit-frame-pointer is specified

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	14.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2023-08-22 05:25 UTC by Thomas Koenig
Modified:	2023-08-26 00:28 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Thomas Koenig 2023-08-22 05:25:59 UTC

The code, by Kent Dickey posted to comp.arch

typedef unsigned int u32;
typedef unsigned long long u64;

u64 do_op(u64 out0, u64 in0, u64 in1, u32 opcode, int size);

void
calc_loop(u64 *optr, u64 *iptr0, u64 *iptr1, u32 opcode, int size, int len)
{
        u64     o0, i0, i1, val, result;
        int     num, shift, pos;
        int     i, j;

        // size is 0,1,2,3 representing 8,16,32,64 bytes
        num = 8 >> size;                // 8,4,2,1
        shift = 8 << size;              // 8,16,32,64
        for(i = 0; i < len; i++) {
                o0 = optr[i];
                i0 = iptr0[i];
                i1 = iptr1[i];
                result = 0;
                pos = 0;
                for(j = 0; j < num; j++) {
                        val = do_op(o0, i0, i1, opcode, size);
                        result = result | (val << pos);
                        pos += shift;
                        if(shift < 64) {
                                o0 = o0 >> shift;
                                i0 = i0 >> shift;
                                i1 = i1 >> shift;
                        }
                }
                optr[i] = result;
        }
}

compiled for aarch64 on godbolt with recent trunk and -O3 -fomit-frame-pointer
(see https://godbolt.org/z/5bKPeGWrK ) does not set up the frame pointer,
but it also does not use it for aoviding spill/restore pairs:

calc_loop:
        stp     x19, x20, [sp, -144]!
        mov     w6, 8
        asr     w19, w6, w4
        stp     x27, x28, [sp, 64]
        lsl     w27, w6, w4
        str     x30, [sp, 80]
        stp     x0, x1, [sp, 112]
        str     x2, [sp, 128]
        cmp     w5, 0
        ble     .L1
        sbfiz   x0, x5, 3, 32
        stp     x21, x22, [sp, 16]
        mov     w20, w4
        stp     x23, x24, [sp, 32]
        mov     w21, w3
        stp     x25, x26, [sp, 48]
        str     x0, [sp, 136]
        cmp     w27, 63
        ble     .L3
        mov     x25, 0
.L6:
        ldr     x0, [sp, 112]
        ldr     x23, [x0, x25]
        ldr     x0, [sp, 120]
        ldr     x0, [x0, x25]
        str     x0, [sp, 104]
        ldr     x0, [sp, 128]
        ldr     x24, [x0, x25]
        cbz     w19, .L10
        mov     w22, 0
        mov     w28, 0
        mov     x26, 0
.L5:
        ldr     x1, [sp, 104]
        mov     w4, w20
        mov     w3, w21
        mov     x2, x24
        mov     x0, x23
        add     w22, w22, 1
        bl      do_op
        lsl     x0, x0, x28
        add     w28, w28, w27
        orr     x26, x26, x0
        cmp     w19, w22
        bne     .L5
        ldr     x0, [sp, 112]
        str     x26, [x0, x25]
        add     x25, x25, 8
        ldr     x0, [sp, 136]
        cmp     x0, x25
        bne     .L6
.L17:
        ldp     x21, x22, [sp, 16]
        ldp     x23, x24, [sp, 32]
        ldp     x25, x26, [sp, 48]
.L1:
        ldp     x27, x28, [sp, 64]
        ldr     x30, [sp, 80]
        ldp     x19, x20, [sp], 144
        ret
.L3:
        str     xzr, [sp, 104]
        ldp     x0, x1, [sp, 104]
        ldr     x24, [x1, x0]
        ldr     x1, [sp, 120]
        ldr     x25, [x1, x0]
        ldr     x1, [sp, 128]
        ldr     x22, [x1, x0]
        cbz     w19, .L11
.L20:
        mov     w26, 0
        mov     w28, 0
        mov     x23, 0
.L8:
        mov     x2, x22
        mov     x1, x25
        mov     x0, x24
        mov     w4, w20
        mov     w3, w21
        add     w26, w26, 1
        bl      do_op
        lsr     x24, x24, x27
        lsl     x0, x0, x28
        add     w28, w28, w27
        orr     x23, x23, x0
        lsr     x25, x25, x27
        lsr     x22, x22, x27
        cmp     w19, w26
        bne     .L8
        ldp     x0, x1, [sp, 104]
        str     x23, [x1, x0]
        add     x0, x0, 8
        ldr     x1, [sp, 136]
        str     x0, [sp, 104]
        cmp     x1, x0
        beq     .L17
.L19:
        ldp     x0, x1, [sp, 104]
        ldr     x24, [x1, x0]
        ldr     x1, [sp, 120]
        ldr     x25, [x1, x0]
        ldr     x1, [sp, 128]
        ldr     x22, [x1, x0]
        cbnz    w19, .L20
.L11:
        ldp     x0, x1, [sp, 104]
        mov     x23, 0
        str     x23, [x1, x0]
        add     x0, x0, 8
        ldr     x1, [sp, 136]
        str     x0, [sp, 104]
        cmp     x1, x0
        bne     .L19
        b       .L17
.L10:
        ldr     x0, [sp, 112]
        mov     x26, 0
        str     x26, [x0, x25]
        add     x25, x25, 8
        ldr     x0, [sp, 136]
        cmp     x0, x25
        bne     .L6
        b       .L17

Comment 1 Andrew Pinski 2023-08-22 05:48:10 UTC

#define FIXED_REGISTERS                                 \
  {                                                     \
    0, 0, 0, 0,   0, 0, 0, 0,   /* R0 - R7 */           \
    0, 0, 0, 0,   0, 0, 0, 0,   /* R8 - R15 */          \
    0, 0, 0, 0,   0, 0, 0, 0,   /* R16 - R23 */         \
    0, 0, 0, 0,   0, 1, 0, 1,   /* R24 - R30, SP */     \


It is specifically marked as a fixed register ...

Comment 2 Andrew Pinski 2023-08-22 05:52:08 UTC

See https://gcc.gnu.org/pipermail/gcc-patches/2016-September/456662.html

I think this is by design of the ABI ...

Comment 3 Thomas Koenig 2023-08-22 09:18:28 UTC

(In reply to Andrew Pinski from comment #2)
> See https://gcc.gnu.org/pipermail/gcc-patches/2016-September/456662.html
> 
> I think this is by design of the ABI ...

The workaround mentioned in the thread does not appear to work,
 -O3 -fomit-frame-pointer -fcall-used-x29 yields
cc1: error: cannot use 'x29' as a call-used register

Comment 4 gccbug 2023-08-22 16:49:48 UTC

I think the ABI might have softened on this point at https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#general-purpose-registers.  It says the platform can let x29 be used as a general-purpose callee-saved register in section 6.4.6.

Comment 5 Richard Earnshaw 2023-08-23 10:51:47 UTC

This was a deliberate design choice.  Although the frame chain is not set up by code that omits the frame pointer, the chain of frames that are set up by other functions is still valid this way.  This ensures that any code that does try to walk the frame chain will not crash.  If we reused the frame pointer for other purposes, then any code trying to walk the frame chain (eg backtrace()) would encounter an invalid record and likely crash.

With 31 main registers, the benefit from one additional one is not especially large.

Comment 6 Richard Earnshaw 2023-08-23 15:31:05 UTC

For completeness.

The AArch64 ABI lists 4 alternatives with respect to having a frame chain. When -fomit-frame-pointer is used, GCC implements this one:

- It may require the frame pointer to address a valid frame record at all times, except that any subroutine may elect not to create a frame record

Comment 7 Thomas Koenig 2023-08-23 20:57:31 UTC

(In reply to Richard Earnshaw from comment #5)
> This was a deliberate design choice.  Although the frame chain is not set up
> by code that omits the frame pointer, the chain of frames that are set up by
> other functions is still valid this way.  This ensures that any code that
> does try to walk the frame chain will not crash.  If we reused the frame
> pointer for other purposes, then any code trying to walk the frame chain (eg
> backtrace()) would encounter an invalid record and likely crash.


Would it make sense to document this somewhere?  Or did I just miss it? :-)

Comment 8 Richard Earnshaw 2023-08-24 14:29:03 UTC

(In reply to Thomas Koenig from comment #7)
> Would it make sense to document this somewhere?  Or did I just miss it? :-)

Possibly, but I've no idea where.  It's too target-specific to put under the generic documentation for -fomit-frame-pointer and I don't think there's a section in the manual that really documents the target-specific behaviours of generic options.

Comment 9 Thomas Koenig 2023-08-25 18:15:27 UTC

(In reply to Richard Earnshaw from comment #8)
> (In reply to Thomas Koenig from comment #7)
> > Would it make sense to document this somewhere?  Or did I just miss it? :-)
> 
> Possibly, but I've no idea where.  It's too target-specific to put under the
> generic documentation for -fomit-frame-pointer and I don't think there's a
> section in the manual that really documents the target-specific behaviours
> of generic options.

Hm, maybe a chapter "Architecture-specific implementation choices"
to document those cases where the ABI gives some leeway could be a
place to put it.  It could have sections on architecture.