[Bug target/105930] [12/13 Regression] Excessive stack spill generation on 32-bit x86
jakub at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Jun 14 09:42:42 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|needs-bisection |
--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, I've tried:
--- gcc/config/i386/i386.md.jj 2022-06-13 10:53:26.739290704 +0200
+++ gcc/config/i386/i386.md 2022-06-14 11:09:24.467024047 +0200
@@ -13734,14 +13734,13 @@
;; shift instructions and a scratch register.
(define_insn_and_split "ix86_rotl<dwi>3_doubleword"
- [(set (match_operand:<DWI> 0 "register_operand" "=r")
- (rotate:<DWI> (match_operand:<DWI> 1 "register_operand" "0")
- (match_operand:QI 2 "<shift_immediate_operand>" "<S>")))
- (clobber (reg:CC FLAGS_REG))
- (clobber (match_scratch:DWIH 3 "=&r"))]
- ""
+ [(set (match_operand:<DWI> 0 "register_operand")
+ (rotate:<DWI> (match_operand:<DWI> 1 "register_operand")
+ (match_operand:QI 2 "<shift_immediate_operand>")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_pre_reload_split ()"
"#"
- "reload_completed"
+ "&& 1"
[(set (match_dup 3) (match_dup 4))
(parallel
[(set (match_dup 4)
@@ -13764,6 +13763,7 @@
(match_dup 6)))) 0)))
(clobber (reg:CC FLAGS_REG))])]
{
+ operands[3] = gen_reg_rtx (<MODE>mode);
operands[6] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
operands[7] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
@@ -13771,14 +13771,13 @@
})
(define_insn_and_split "ix86_rotr<dwi>3_doubleword"
- [(set (match_operand:<DWI> 0 "register_operand" "=r")
- (rotatert:<DWI> (match_operand:<DWI> 1 "register_operand" "0")
- (match_operand:QI 2 "<shift_immediate_operand>" "<S>")))
- (clobber (reg:CC FLAGS_REG))
- (clobber (match_scratch:DWIH 3 "=&r"))]
- ""
+ [(set (match_operand:<DWI> 0 "register_operand")
+ (rotatert:<DWI> (match_operand:<DWI> 1 "register_operand")
+ (match_operand:QI 2 "<shift_immediate_operand>")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_pre_reload_split ()"
"#"
- "reload_completed"
+ "&& 1"
[(set (match_dup 3) (match_dup 4))
(parallel
[(set (match_dup 4)
@@ -13801,6 +13800,7 @@
(match_dup 6)))) 0)))
(clobber (reg:CC FLAGS_REG))])]
{
+ operands[3] = gen_reg_rtx (<MODE>mode);
operands[6] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
operands[7] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
On the #c0 test with -O2 -m32 -mno-mmx -mno-sse it makes some difference, but
not as much as one would hope for:
Numbers from gcc 11.3.1 20220614, 11.3.1 20220614 with the patch, 13.0.0
20220610, 13.0.0 20220614 with the patch:
                11.3.1  11.3.1+patch  13.0.0  13.0.0+patch
sub on %esp        428          2556    2620          2556
fn size in B     21657         23186   28413         23534
.s lines          6199          3942    7260          4198
So, trunk with the above patch produces significantly fewer instructions,
but they are larger on average (more of them use 32-bit immediates, mostly in
the form of a whatever(%esp) memory source operand), and the stack usage is
still high.
I think the patch is still a good idea, since it gives the RA more options, but
we should investigate why it consumes so much more stack and results in larger
code.
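For illustration only, here is a minimal sketch of the kind of source that would be expected to reach the doubleword rotate patterns touched by the patch when built with -O2 -m32. It is not the #c0 testcase; the function names and the rotate count of 13 are hypothetical, and whether GCC actually selects ix86_rotldi3_doubleword / ix86_rotrdi3_doubleword for them is an assumption that should be checked against the generated RTL or assembly.

```c
#include <stdint.h>

/* Hypothetical example, not taken from the bug report.
   A 64-bit rotate left by a constant count, written in the usual
   shift-and-or idiom that GCC is generally able to recognize as a rotate.
   On 32-bit x86 the value occupies a register pair, so the rotate
   has to be done as a doubleword operation.  */
uint64_t
rotl64_by_13 (uint64_t x)
{
  return (x << 13) | (x >> (64 - 13));
}

/* The matching rotate right by the same (assumed) constant count.  */
uint64_t
rotr64_by_13 (uint64_t x)
{
  return (x >> 13) | (x << (64 - 13));
}
```

Compiling such a file with and without the patch, using the same -O2 -m32 -mno-mmx -mno-sse options as above, and comparing the assembly is one way to see how the split before reload changes register and stack usage in a small case.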