Bug 81813 - Inefficient stack pointer adjustment
Summary: Inefficient stack pointer adjustment
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.1.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-08-11 02:31 UTC by Josh Poimboeuf
Modified: 2021-08-15 13:37 UTC (History)
3 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail: 8.1.0
Last reconfirmed: 2021-08-15 00:00:00


Attachments
fs_pin.i (129.27 KB, text/plain)
2017-08-11 02:31 UTC, Josh Poimboeuf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Josh Poimboeuf 2017-08-11 02:31:32 UTC
Created attachment 41968 [details]
fs_pin.i

With GCC 7.1, the kernel's object code static analysis tool (objtool) found this unusual method of adjusting the stack pointer in the kernel:

   2cc:   48 8d 4c 24 08          lea    0x8(%rsp),%rcx
   2d1:   48 89 cc                mov    %rcx,%rsp

The value in %rcx was never used afterwards.  It would be faster and more straightforward to just add 8 to %rsp.

To recreate:

  gcc -mno-sse -mpreferred-stack-boundary=3 -O2 -c -o fs_pin.o fs_pin.i
Comment 1 Richard Biener 2017-08-15 10:18:20 UTC
Interesting ;)  A peephole might help.
Comment 2 Andrew Pinski 2021-08-15 09:34:36 UTC
In GCC 8.5 we had:

        pushq   %r12
        .cfi_def_cfa_offset 88
        movl    $2, %ecx
        xorl    %edx, %edx
        xorl    %r9d, %r9d
        xorl    %r8d, %r8d
        xorl    %esi, %esi
        movl    $rcu_lock_map, %edi
        call    lock_acquire
        call    debug_lockdep_rcu_enabled
        movq    32(%rsp), %rax
        leaq    8(%rsp), %rcx
        leaq    32(%rsp), %rdx
        movq    %rcx, %rsp
        .cfi_def_cfa_offset 80
        cmpq    %rax, %rdx
        jne     .L58


In GCC 9.1 (and the trunk) we have:

        pushq   %r13
        .cfi_def_cfa_offset 96
        xorl    %edx, %edx
        xorl    %r9d, %r9d
        xorl    %r8d, %r8d
        movl    $2, %ecx
        xorl    %esi, %esi
        movl    $rcu_lock_map, %edi
        call    lock_acquire
        call    debug_lockdep_rcu_enabled
        movq    32(%rsp), %rax
        popq    %rdx
        .cfi_def_cfa_offset 88
        cmpq    %rax, %rbx
        jne     .L58
Comment 3 Andrew Pinski 2021-08-15 09:47:50 UTC
There are a 3 places where the 
        call    lock_acquire
        call    debug_lockdep_rcu_enabled
        movq    32(%rsp), %rax
        popq    %rdx

Pattern exists and in GCC 7-8, only one of the 3 has the expanded pop for some reason.
Comment 4 Andrew Pinski 2021-08-15 10:07:17 UTC
So what is happening is reload produces:
(insn 136 135 187 11 (set (reg:DI 0 ax [orig:102 _38 ] [102])
        (mem/v/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 32 [0x20])) [2 MEM[(volatile __u64 *)&wait + 24B]+0 S8 A64])) "./include/linux/compiler.h":276 85 {*movdi_internal}
     (nil))
(insn 187 136 138 11 (set (reg:DI 2 cx [124])
        (plus:DI (reg/f:DI 7 sp)
            (const_int 8 [0x8]))) "fs/fs_pin.c":64 218 {*leadi}
     (nil))
(insn 138 187 139 11 (parallel [
            (set (reg/f:DI 1 dx [122])
                (plus:DI (reg:DI 2 cx [124])
                    (const_int 24 [0x18])))
            (clobber (reg:CC 17 flags))
        ]) "fs/fs_pin.c":64 222 {*adddi_1}
     (nil))
(insn 139 138 140 11 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) "fs/fs_pin.c":64 222 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 0 [0])
        (nil)))

Notice how cx is being used.

And then post_reload produces:
(insn 187 136 138 11 (set (reg:DI 2 cx [124])
        (plus:DI (reg/f:DI 7 sp)
            (const_int 8 [0x8]))) "fs/fs_pin.c":64 218 {*leadi}
     (nil))
(insn 138 187 139 11 (parallel [
            (set (reg/f:DI 1 dx [122])
                (plus:DI (reg/f:DI 7 sp)
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "fs/fs_pin.c":64 222 {*adddi_1}
     (nil))
(insn 139 138 140 11 (set (reg/f:DI 7 sp)
        (reg:DI 2 cx [124])) "fs/fs_pin.c":64 85 {*movdi_internal}
     (expr_list:REG_ARGS_SIZE (const_int 0 [0])
        (nil)))

But I don't understand why it did not prop (plus (reg/f:DI 7 sp)            (const_int 8 [0x8])) into insn 139 and remove insn 187.

I think this is an issue for LRA/IRA really in the first place. We did not need to push the variable to the stack in the first place as we are going to Rematerialize the value after the pop anyways.


So Vlad might want to debug this to make sure this is not a latent bug.