[Bug rtl-optimization/79148] New: stack addresses are spilled to stack slots on x86-64 at -Os instead of rematerializing the addresses
froydnj at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jan 19 15:24:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79148
Bug ID: 79148
Summary: stack addresses are spilled to stack slots on x86-64
at -Os instead of rematerializing the addresses
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: froydnj at gcc dot gnu.org
Target Milestone: ---
Noticed this while browsing around Firefox source code compiled with GCC 5.4; a
colleague confirms that this happens with 6.3 as well. Compiling:
https://people.mozilla.org/~nfroyd/Unified_cpp_widget0.ii.gz
(I tried to get it under the attachment limit with xz, but it was still too big)
with options:
-mtune=generic -march=x86-64 -g -Os -std=gnu++11 -fPIC -fno-strict-aliasing
-fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno
-freorder-blocks -fno-omit-frame-pointer -fstack-protector-strong
gives, for the function
_ZN7mozilla6widget11GfxInfoBase20GetFeatureStatusImplEiPiR18nsAString_internalRK8nsTArrayINS0_13GfxDriverInfoEER19nsACString_internalPNS0_15OperatingSystemE
a bit of code that looks like:
.LVL3402:
leaq -784(%rbp), %rax [1a]
.LVL3403:
movq %rax, %rdi
.LVL3404:
movq %rax, -816(%rbp) [1b]
call _ZN12nsAutoStringC1Ev
.LVL3405:
.loc 14 887 0
leaq -624(%rbp), %rax [2a]
movq %rax, %rdi
movq %rax, -824(%rbp) [2b]
call _ZN12nsAutoStringC1Ev
.LVL3406:
.loc 14 888 0
leaq -464(%rbp), %rax [3a]
movq %rax, %rdi
movq %rax, -800(%rbp) [3b]
call _ZN12nsAutoStringC1Ev
.LVL3407:
.loc 14 889 0
movq (%r12), %rax
movq -816(%rbp), %rsi [1c]
movq %r12, %rdi
call *104(%rax)
.LVL3408:
.loc 14 890 0
testl %eax, %eax
js .L2479
movq (%r12), %rax
movq -824(%rbp), %rsi [2c]
movq %r12, %rdi
call *120(%rax)
.LVL3409:
.loc 14 889 0
testl %eax, %eax
js .L2479
.loc 14 891 0
movq (%r12), %rax
movq -800(%rbp), %rsi [3c]
movq %r12, %rdi
call *168(%rax)
The problem here is that, in each of the three groups of instructions marked
[1], [2], and [3], the store of the stack address ([1b], [2b], and [3b]) is
really unnecessary. Replacing the reloads [1c], [2c], and [3c] with the `lea`
instructions from [1a], [2a], and [3a], targeting %rsi instead of %rax, is the
same size and doesn't require the stack slot storage, so we could eliminate the
stores ([1b], [2b], and [3b]) and (possibly) make the stack frame smaller as
well.
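For the first group, the code I'd hope to see looks roughly like the sketch
below. This is hand-edited from the listing above, not actual compiler output;
the offsets and symbols are copied from that listing, and the other two groups
would change in the same way.

```asm
        leaq    -784(%rbp), %rax        # [1a], unchanged
        movq    %rax, %rdi
        call    _ZN12nsAutoStringC1Ev   # [1b], the store to -816(%rbp), is gone
        # (constructor calls for [2] and [3] omitted here)
        movq    (%r12), %rax
        leaq    -784(%rbp), %rsi        # replaces [1c]: recompute instead of reload
        movq    %r12, %rdi
        call    *104(%rax)
```

Both `movq -816(%rbp), %rsi` and `leaq -784(%rbp), %rsi` encode as REX prefix,
opcode, ModRM, and a 32-bit displacement, so the replacement costs no extra
bytes, while each dropped store saves an instruction of that same form.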
I think rematerializing the stack addresses on x86/x86-64 ought always to be a
win in terms of size (I don't know whether you'd want to make the same choices
when compiling for speed); I think it'd be a similar win for RISC-y chips, at
least so long as the stack frame sizes are reasonably small.
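For checking other targets or GCC versions without the full Firefox
translation unit, a reduced test case along the following lines might be a
starting point. This is only a sketch: every name in it (Buf, Source, Query,
FixedSource) is made up, the source shape is guessed from the assembly above,
and I have not verified that it reproduces the spills.

```cpp
// Hypothetical reduction: all names are invented for illustration, and it
// has not been checked that this actually triggers the spilled addresses.
#include <cstring>

// Stand-in for nsAutoString. 160 bytes matches the spacing of the stack
// slots in the listing (-784, -624, -464); the real layout is not modelled.
struct Buf {
    char storage[160];
    // Kept out of line so each local is initialised by a real call.
    __attribute__((noinline)) Buf() { std::memset(storage, 0, sizeof storage); }
};

// Stand-in for the interface whose methods are called through the vtable.
struct Source {
    virtual int GetA(Buf& out) = 0;
    virtual int GetB(Buf& out) = 0;
    virtual int GetC(Buf& out) = 0;
    virtual ~Source() {}
};

// Three locals are constructed up front; their addresses are needed again
// only later, after the intervening constructor calls.
__attribute__((noinline)) int Query(Source* src) {
    Buf a;
    Buf b;
    Buf c;
    if (src->GetA(a) < 0 || src->GetB(b) < 0 || src->GetC(c) < 0)
        return -1;
    return a.storage[0] + b.storage[0] + c.storage[0];
}

// Trivial implementation so the sketch can be linked and run.
struct FixedSource : Source {
    int status;
    explicit FixedSource(int s) : status(s) {}
    int Fill(Buf& out) { out.storage[0] = 1; return status; }
    int GetA(Buf& out) override { return Fill(out); }
    int GetB(Buf& out) override { return Fill(out); }
    int GetC(Buf& out) override { return Fill(out); }
};
```

Compiling this with -S and the options listed above, then looking at the code
generated for Query, should show whether the three addresses are stored to
stack slots or recomputed with `lea`.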