This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/77308] surprisingly large stack usage for sha512 on arm
- From: "bernd.edlinger at hotmail dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 21 Aug 2016 21:32:47 +0000
- Subject: [Bug target/77308] surprisingly large stack usage for sha512 on arm
- References: <bug-77308-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #4 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
Hmm, when I compare the aarch64 and arm versions of sha512.c.260r.reload,
both built with -O3 -fno-schedule-insns,
I see a big difference:
aarch64 has only a few spill regs
subreg regs:
Slot 0 regnos (width = 8): 856
Slot 1 regnos (width = 8): 857
Slot 2 regnos (width = 8): 858
Slot 3 regnos (width = 8): 859
Slot 4 regnos (width = 8): 860
Slot 5 regnos (width = 8): 861
Slot 6 regnos (width = 8): 862
Slot 7 regnos (width = 8): 2117
Slot 8 regnos (width = 8): 1164
Slot 9 regnos (width = 8): 1052
but arm has 415 of them (8 bytes each),
and the "subreg regs:" line before the spill slots lists ~1500 regs.
Also, while aarch64 does not have a single subreg in any pass,
arm has lots of subregs before LRA eliminates all of them,
like this one in sha512.c.217r.expand:
(insn 85 84 86 5 (set (subreg:SI (reg:DI 1670) 4)
        (ashift:SI (subreg:SI (reg:DI 1669) 0)
            (const_int 24 [0x18]))) sha512.c:98 -1
     (nil))
(insn 86 85 87 5 (set (subreg:SI (reg:DI 1670) 0)
        (const_int 0 [0])) sha512.c:98 -1
     (nil))
This funny instruction is generated in arm_emit_coreregs_64bit_shift:
    /* Shifts by a constant greater than 31. */
    rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
    emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
    if (code == ASHIFTRT)
      emit_insn (gen_ashrsi3 (out_up, in_up,
                              GEN_INT (31)));
    else
      emit_insn (SET (out_up, const0_rtx));
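
For readers less familiar with this expansion, here is a rough C model of
what the snippet above computes when a 64-bit value lives in two 32-bit
core registers. It is only a sketch: the function names are made up for
illustration, and it assumes little-endian word order, so that subreg
offset 0 is the low word and offset 4 is the high word. Under that
assumption, insns 85 and 86 above look like a left shift by 56 (24 + 32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit left shift (ASHIFT) by a constant
   in the range 32..63, done on 32-bit halves.  */
static uint64_t shl64_by_halves (uint64_t in, unsigned amount)
{
  assert (amount >= 32 && amount <= 63);
  uint32_t in_lo = (uint32_t) in;
  /* Like insn 85: the high half gets the low input word, shifted
     by (amount - 32).  */
  uint32_t out_hi = in_lo << (amount - 32);
  /* Like insn 86: the low half is simply cleared.  */
  uint32_t out_lo = 0;
  return ((uint64_t) out_hi << 32) | out_lo;
}

/* Hypothetical model of the ASHIFTRT case: the low half takes the
   shifted high input word, the high half is filled with the sign
   (the effect of an arithmetic shift right by 31).  */
static uint64_t asr64_by_halves (uint64_t in, unsigned amount)
{
  assert (amount >= 32 && amount <= 63);
  uint32_t in_hi = (uint32_t) (in >> 32);
  uint32_t sign = (in_hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  unsigned k = amount - 32;
  /* Portable arithmetic shift right of in_hi by k.  */
  uint32_t out_lo = in_hi >> k;
  if (k != 0)
    out_lo |= sign << (32 - k);
  uint32_t out_hi = sign;
  return ((uint64_t) out_hi << 32) | out_lo;
}
```

Each half of the result is written separately, which is what shows up
in the RTL as two SImode sets to subregs of one DImode pseudo.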
From my past experience, I assume that using a subreg to write
one half of the out register causes more problems than it solves.
So I tried this:
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c (revision 239624)
+++ gcc/config/arm/arm.c (working copy)
@@ -29170,12 +29170,11 @@
           /* Shifts by a constant greater than 31. */
           rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
+          emit_insn (SET (out, const0_rtx));
           emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
           if (code == ASHIFTRT)
             emit_insn (gen_ashrsi3 (out_up, in_up,
                                     GEN_INT (31)));
-          else
-            emit_insn (SET (out_up, const0_rtx));
         }
     }
   else
and it reduced the stack usage from 3472 to 2960 bytes.