This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/77308] surprisingly large stack usage for sha512 on arm


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #4 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
Hmm, when I compare the aarch64 and arm versions of sha512.c.260r.reload,
both built with -O3 -fno-schedule-insns,

I see a big difference:

aarch64 has only a few spill slots:

subreg regs:
  Slot 0 regnos (width = 8):     856
  Slot 1 regnos (width = 8):     857
  Slot 2 regnos (width = 8):     858
  Slot 3 regnos (width = 8):     859
  Slot 4 regnos (width = 8):     860
  Slot 5 regnos (width = 8):     861
  Slot 6 regnos (width = 8):     862
  Slot 7 regnos (width = 8):     2117
  Slot 8 regnos (width = 8):     1164
  Slot 9 regnos (width = 8):     1052


but arm has 415 spill slots (8 bytes each, so about 3320 bytes in total),
and the line "subreg regs:" before the spill slots contains ~1500 regs.

Also, while aarch64 does not have a single subreg in any pass,
arm has lots of subregs until lra eliminates all of them.

For example, in sha512.c.217r.expand:

(insn 85 84 86 5 (set (subreg:SI (reg:DI 1670) 4)
        (ashift:SI (subreg:SI (reg:DI 1669) 0)
            (const_int 24 [0x18]))) sha512.c:98 -1
     (nil))
(insn 86 85 87 5 (set (subreg:SI (reg:DI 1670) 0)
        (const_int 0 [0])) sha512.c:98 -1
     (nil))

This odd pair of instructions is generated in arm_emit_coreregs_64bit_shift:

          /* Shifts by a constant greater than 31.  */
          rtx adj_amount = GEN_INT (INTVAL (amount) - 32);

          emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
          if (code == ASHIFTRT)
            emit_insn (gen_ashrsi3 (out_up, in_up,
                                    GEN_INT (31)));
          else
            emit_insn (SET (out_up, const0_rtx));
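
To make the arithmetic behind that branch concrete, here is a minimal
standalone sketch in plain C (not GCC code). The pair64 type and the
function names are made up for illustration. It assumes a little-endian
word layout, in which the two insns above amount to a 64-bit left shift
by 56: the low input word is shifted by 56 - 32 = 24 into the high output
word, and the other output word is simply zeroed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit value kept in two 32-bit core regs. */
typedef struct { uint32_t lo, hi; } pair64;

static pair64 split64(uint64_t v)
{
    pair64 p = { (uint32_t) v, (uint32_t) (v >> 32) };
    return p;
}

static uint64_t join64(pair64 p)
{
    return ((uint64_t) p.hi << 32) | p.lo;
}

/* Left shift by a constant in [32, 63]: only one input word contributes,
   the other output word is a plain zero.  */
static pair64 shl_const_ge32(pair64 in, unsigned amount)
{
    assert(amount >= 32 && amount < 64);
    pair64 out;
    out.hi = in.lo << (amount - 32);
    out.lo = 0;
    return out;
}

/* Arithmetic right shift by a constant in [32, 63]: the low output word is
   the high input word shifted by (amount - 32) with sign fill, the high
   output word is all sign bits (the "shift by 31" case above).  */
static pair64 asr_const_ge32(pair64 in, unsigned amount)
{
    assert(amount >= 32 && amount < 64);
    unsigned n = amount - 32;
    uint32_t sign = (in.hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    pair64 out;
    out.lo = (in.hi >> n) | (n ? sign << (32 - n) : 0u);
    out.hi = sign;
    return out;
}
```

In both cases each output word is written in full by one 32-bit
operation, which is why the expander can describe the result as two
separate SImode sets on the halves of the DImode register.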

From my past experience, I assume that using a subreg to write
one half of the out register causes more problems than it solves.

So I tried this:

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c        (revision 239624)
+++ gcc/config/arm/arm.c        (working copy)
@@ -29170,12 +29170,11 @@
          /* Shifts by a constant greater than 31.  */
          rtx adj_amount = GEN_INT (INTVAL (amount) - 32);

+         emit_insn (SET (out, const0_rtx));
          emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
          if (code == ASHIFTRT)
            emit_insn (gen_ashrsi3 (out_up, in_up,
                                    GEN_INT (31)));
-         else
-           emit_insn (SET (out_up, const0_rtx));
        }
     }
   else

and it reduced the stack usage from 3472 to 2960.
