This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
On 11/29/16 16:06, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
>
> - "TARGET_32BIT && reload_completed
> + "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
> && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
>
> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're
> already excluding NEON.
>
Ahem, no. That would split the addi_neon insn before it is known
whether the reload pass will assign a VFP register.
With your suggested condition the stack usage with -mfpu=neon increases
from 2300 to around 2600 bytes.
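To make the difference concrete, here is a small stand-alone sketch of the
two split conditions as plain C predicates. The struct and function names
are hypothetical stand-ins, not the real GCC macros, and TARGET_32BIT is
assumed to be true throughout. Before reload a pseudo register is not yet
known to be a VFP register, so dest_is_vfp_reg is modelled as false there.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags and the pass state.
   These are ordinary fields, not the real GCC macros.  */
struct split_state {
  bool target_neon;
  bool target_iwmmxt;
  bool reload_completed;
  bool dest_is_vfp_reg;   /* only meaningful once reload has run */
};

/* Split condition as written in the patch.  */
static bool split_cond_patch (struct split_state s)
{
  return ((!s.target_neon && !s.target_iwmmxt) || s.reload_completed)
         && !(s.target_neon && s.dest_is_vfp_reg);
}

/* Split condition with the suggested simplification.  */
static bool split_cond_simplified (struct split_state s)
{
  return (!s.target_iwmmxt || s.reload_completed)
         && !(s.target_neon && s.dest_is_vfp_reg);
}
```

With NEON enabled and reload not yet completed, the first predicate is
false and the second is true, which is exactly the case where the insn
would be split too early.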
> This patch expands ADD and SUB earlier, so shouldn't we do the same obvious
> change for the similar instructions CMP and NEG?
>
Good question. I think the cmp and neg patterns are more complicated
and typically have a more complicated data flow than the other
patterns.
I tried to create a test case which expands cmpdi and negdi patterns
as follows:
--- pr77308-1.c 2016-11-25 17:53:20.379141465 +0100
+++ pr77308-2.c 2016-11-29 20:46:51.266948631 +0100
@@ -68,10 +68,10 @@
#define B(x,j) (((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
#define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
#define ROTR(x,s) (((x)>>s) | (x)<<(64-s))
-#define Sigma0(x) ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
-#define Sigma1(x) ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
-#define sigma0(x) ~(ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7))
-#define sigma1(x) ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))
+#define Sigma0(x) (ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39) == (x) ? -(x) : (x))
+#define Sigma1(x) (ROTR((x),14) ^ ROTR(-(x),18) ^ ROTR((x),41) < (x) ? -(x) : (x))
+#define sigma0(x) (ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7) <= (x) ? ~(x) : (x))
+#define sigma1(x) ((long long)(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6)) < (long long)(x) ? -(x) : (x))
#define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
#define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
This expands *arm_negdi2, *arm_cmpdi_unsigned, *arm_cmpdi_insn.
The stack usage is around 1900 bytes with the previous patch,
and 2300 bytes without.
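Stripped of the SHA-512 context, the constructs that the modified macros
introduce boil down to 64-bit negation and 64-bit compares feeding a
select. The following is only a simplified sketch with hypothetical
function names. Which insn patterns these actually expand to depends on
the compiler version and options, so the mapping in the comments is an
assumption and should be checked in the RTL dumps.

```c
#include <stdint.h>

/* 64-bit negation; on 32-bit ARM this is assumed to go through one of
   the negdi2 patterns.  */
uint64_t neg64 (uint64_t x)
{
  return -x;
}

/* Unsigned 64-bit compare feeding a select (assumed to use the
   unsigned cmpdi pattern).  */
uint64_t sel_ltu (uint64_t a, uint64_t x)
{
  return a < x ? -x : x;
}

/* Signed 64-bit compare feeding a select (assumed to use the signed
   cmpdi pattern).  */
uint64_t sel_lts (uint64_t a, uint64_t x)
{
  return (int64_t) a < (int64_t) x ? -x : x;
}
```

Compiling such functions with -Os and -fdump-rtl-all-verbose for a 32-bit
ARM target is one way to see which patterns are matched.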
I tried to split *arm_negdi2 and *arm_cmpdi_unsigned early, and it
does indeed give smaller stack sizes in the test case above (~400 bytes).
But when I make *arm_cmpdi_insn split early, it ICEs:
--- arm.md.orig 2016-11-27 09:22:41.794790123 +0100
+++ arm.md 2016-11-29 21:51:51.438163078 +0100
@@ -7432,7 +7432,7 @@
(clobber (match_scratch:SI 2 "=r"))]
"TARGET_32BIT"
"#" ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
- "&& reload_completed"
+ "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
[(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 0) (match_dup 1)))
(parallel [(set (reg:CC CC_REGNUM)
on top of the latest patch, I got:
gcc -S -Os pr77308-2.c -fdump-rtl-all-verbose
pr77308-2.c: In function 'sha512_block_data_order':
pr77308-2.c:169:1: error: unrecognizable insn:
}
^
(insn 4870 4869 1636 87 (set (scratch:SI)
(minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
(subreg:SI (reg:DI 473 [ X$14 ]) 4))
(ltu:SI (reg:CC_C 100 cc)
(const_int 0 [0])))) "pr77308-2.c":140 -1
(nil))
pr77308-2.c:169:1: internal compiler error: in extract_insn, at recog.c:2311
0xaf4cd8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../../gcc-trunk/gcc/rtl-error.c:108
0xaf4d09 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc-trunk/gcc/rtl-error.c:116
0xac74ef extract_insn(rtx_insn*)
../../gcc-trunk/gcc/recog.c:2311
0x122427a decompose_multiword_subregs
../../gcc-trunk/gcc/lower-subreg.c:1467
0x122550d execute
../../gcc-trunk/gcc/lower-subreg.c:1734
So it is certainly possible, but not really simple to improve the
stack size even further. But I would prefer to do that in a
separate patch.
BTW: there are also negdi2_compare, *negdi_extendsidi,
*negdi_zero_extendsidi, *thumb2_negdi2.
I think it would be a precondition to have test cases that exercise
each of these patterns before we try to split these instructions.
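As a starting point for such test cases, something along these lines might
work. This is only a sketch: the function names are hypothetical, and the
guess about which pattern each function would hit is inferred from the
pattern names alone, so it has to be confirmed in the RTL dumps.

```c
#include <stdint.h>

/* Negate a sign-extended 32-bit value
   (guess: candidate for *negdi_extendsidi).  */
int64_t neg_sext (int32_t x)
{
  return -(int64_t) x;
}

/* Negate a zero-extended 32-bit value
   (guess: candidate for *negdi_zero_extendsidi).  */
int64_t neg_zext (uint32_t x)
{
  return -(int64_t) x;
}
```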
Bernd.