[Bug rtl-optimization/79148] New: stack addresses are spilled to stack slots on x86-64 at -Os instead of rematerializing the addresses

froydnj at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Jan 19 15:24:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79148

            Bug ID: 79148
           Summary: stack addresses are spilled to stack slots on x86-64
                    at -Os instead of rematerializing the addresses
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: froydnj at gcc dot gnu.org
  Target Milestone: ---

Noticed this while browsing around Firefox source code compiled with GCC 5.4; a
colleague confirms that this happens with 6.3 as well.  Compiling:

https://people.mozilla.org/~nfroyd/Unified_cpp_widget0.ii.gz

(I tried to get it under the attachment limit with xz, but it was still too
large.)

with options:

-mtune=generic -march=x86-64 -g -Os -std=gnu++11 -fPIC -fno-strict-aliasing
-fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno
-freorder-blocks -fno-omit-frame-pointer -fstack-protector-strong

gives, for the function
_ZN7mozilla6widget11GfxInfoBase20GetFeatureStatusImplEiPiR18nsAString_internalRK8nsTArrayINS0_13GfxDriverInfoEER19nsACString_internalPNS0_15OperatingSystemE
a bit of code that looks like:

.LVL3402:
        leaq    -784(%rbp), %rax      [1a]
.LVL3403:
        movq    %rax, %rdi
.LVL3404:
        movq    %rax, -816(%rbp)      [1b]
        call    _ZN12nsAutoStringC1Ev
.LVL3405:
        .loc 14 887 0
        leaq    -624(%rbp), %rax      [2a]
        movq    %rax, %rdi
        movq    %rax, -824(%rbp)      [2b]
        call    _ZN12nsAutoStringC1Ev
.LVL3406:
        .loc 14 888 0
        leaq    -464(%rbp), %rax      [3a]
        movq    %rax, %rdi
        movq    %rax, -800(%rbp)      [3b]
        call    _ZN12nsAutoStringC1Ev
.LVL3407:
        .loc 14 889 0
        movq    (%r12), %rax
        movq    -816(%rbp), %rsi      [1c]
        movq    %r12, %rdi
        call    *104(%rax)
.LVL3408:
        .loc 14 890 0
        testl   %eax, %eax
        js      .L2479
        movq    (%r12), %rax
        movq    -824(%rbp), %rsi      [2c]
        movq    %r12, %rdi
        call    *120(%rax)
.LVL3409:
        .loc 14 889 0
        testl   %eax, %eax
        js      .L2479
        .loc 14 891 0
        movq    (%r12), %rax
        movq    -800(%rbp), %rsi      [3c]
        movq    %r12, %rdi
        call    *168(%rax)

The problem, for each of the three instruction groups marked [1], [2], and
[3], is that the stores [1b], [2b], and [3b] of the stack addresses are
unnecessary.  Replacing the reloads [1c], [2c], and [3c] with the `lea`
instructions from [1a], [2a], and [3a] (targeting %rsi instead of %rax) is the
same size and needs no stack slot, so the stores [1b], [2b], and [3b] could be
eliminated and the stack frame (possibly) made smaller as well.
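
For group [1], the result would look roughly like the sketch below.  This is
hand-edited from the listing above, not compiler output; labels and .loc
directives are dropped, and the `#` comments are mine.  Since the reload
`movq -816(%rbp), %rsi` and the replacement `leaq -784(%rbp), %rsi` should both
encode to 7 bytes (REX prefix, opcode, ModRM, 32-bit displacement), the saving
would come from the removed store, which should also be 7 bytes.

```
        leaq    -784(%rbp), %rax      # [1a] unchanged
        movq    %rax, %rdi
                                      # [1b] store to -816(%rbp) removed
        call    _ZN12nsAutoStringC1Ev
        # (constructor calls for [2] and [3] omitted)
        movq    (%r12), %rax
        leaq    -784(%rbp), %rsi      # [1c] recompute the address, no reload
        movq    %r12, %rdi
        call    *104(%rax)
```

Groups [2] and [3] would change the same way, using -624(%rbp) and -464(%rbp).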

I think rematerializing stack addresses on x86/x86-64 ought always to be a win
in terms of size (I don't know whether you'd want to make the same choice when
compiling for speed).  I think it would be a similar win on RISC-y chips, at
least so long as the stack frames are reasonably small.
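
A smaller test case would probably make this easier to look at.  The following
is only a guess at the shape of the code involved: three stack objects built
by an out-of-line constructor, then passed by reference to virtual calls.  All
names here (`Str`, `Info`, `GetA`/`GetB`/`GetC`, `status`) are invented stand-ins
rather than the Firefox types, and I have not checked that GCC spills the
addresses for this reduction the way it does for the original function.

```cpp
// Hypothetical reduction; every name below is invented for illustration.
struct Str {
    char buf[152];  // big enough that the object lives in the stack frame
    // Out-of-line constructor, standing in for the nsAutoString constructor.
    __attribute__((noinline)) Str() { buf[0] = '\0'; }
};

struct Info {
    // Virtual calls that take a stack object by reference, like the
    // indirect calls through (%r12) in the listing.
    virtual int GetA(Str& out) { out.buf[0] = 'a'; return 0; }
    virtual int GetB(Str& out) { out.buf[0] = 'b'; return 0; }
    virtual int GetC(Str& out) { out.buf[0] = 'c'; return 0; }
    virtual ~Info() {}
};

__attribute__((noinline)) int status(Info* info) {
    Str a, b, c;  // three constructor calls, each passed a stack address
    if (info->GetA(a) < 0) return -1;  // address of a needed again here
    if (info->GetB(b) < 0) return -1;  // address of b needed again here
    if (info->GetC(c) < 0) return -1;  // address of c needed again here
    return a.buf[0] + b.buf[0] + c.buf[0];
}
```

Compiling this with the options above and -S, then looking for stores of
`lea` results to the frame in `status`, would show whether it reproduces the
issue.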

