[Bug target/54236] [SH] Improve addc and subc insn utilization
olegendo at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sat Nov 2 10:20:00 GMT 2013
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236
--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 31141
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31141&action=edit
widening unsigned addition
The addc insn can also be used to implement widening 32 bit -> 64 bit unsigned
addition, and to do some sort of unsigned overflow checks. For example:
unsigned long long
test_1 (unsigned int a, unsigned int b)
{
// 1x addc, 1x clrt, 1x movt, no tst
return (unsigned long long)a + (unsigned long long)b;
}
minimal code:
mov r5,r1
clrt
addc r4,r1
rts
movt r0
unsigned int
test_2 (unsigned int a, unsigned int b)
{
// 1x addc, 1x clrt, no movt, no tst
unsigned long long x = (unsigned long long)a + (unsigned long long)b;
if (x >> 32)
return 0xFFFFFFFF;
else
return (unsigned int)x;
}
minimal code (branchy, generic):
clrt
addc r4,r5
bt/s .L6
mov #-1,r0
mov r5,r0
.L6:
rts
nop
minimal code (branchless, special case for constant 0xFFFFFFFF):
clrt
addc r4,r5
subc r1,r1 // T = 0: r1 = 0x00000000
// T = 1: r1 = 0xFFFFFFFF
or r1,r5
rts
mov r5,r0
The attached patch makes the compiler output the minimal code sequences, except
for the branchless version of test_2. It would require folding basic blocks
which combine won't do.
The patch enables the "zero_extendsidi2" insn on non-SHMedia. Without that
insn, the following is expanded:
(insn 22 4 7 2 (set (reg:DI 167 [ D.1680 ])
(const_int 0 [0])) sh_tmp.cpp:22 -1
(nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 4)
(reg:SI 4 r4 [ a ])) sh_tmp.cpp:22 257 {movsi_ie}
(expr_list:REG_DEAD (reg:SI 4 r4 [ a ])
(nil)))
(insn 8 7 23 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 0)
(const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
(nil))
(insn 23 8 9 2 (set (reg:DI 168 [ D.1680 ])
(const_int 0 [0])) sh_tmp.cpp:22 -1
(nil))
(insn 9 23 10 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 4)
(reg:SI 5 r5 [ b ])) sh_tmp.cpp:22 257 {movsi_ie}
(expr_list:REG_DEAD (reg:SI 5 r5 [ b ])
(nil)))
(insn 10 9 11 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 0)
(const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
(nil))
... and combine doesn't seem to understand this and thus will never try out
things such as
(plus:DI (zero_extend:DI (match_operand:SI 1 "arith_reg_operand"))
(zero_extend:DI (match_operand:SI 2 "arith_reg_operand"))))
The patch has a downside, though. It makes 64 bit arithmetic slightly worse in
some cases (according to CSiBE). As far as I can see it, there are mainly two
issues:
- loads of constant #0 are not hoisted/CSE'd
- some 64 bit highword optimizations are not applied anymore, in particular
comparisons of zero extended unsigned SImode and unsigned DImode.
The constant #0 issue can be addressed by making the "zero_extendsidi2" pattern
'reserve' the constant #0 load, as it is done for some other insns:
-(define_insn "zero_extendsidi2"
+(define_expand "zero_extendsidi2"
+ [(set (match_operand:DI 0 "arith_reg_dest")
+ (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))]
+ "TARGET_SHMEDIA || TARGET_SH1"
+{
+ if (TARGET_SH1)
+ {
+ if (can_create_pseudo_p ())
+ {
+ rtx zero_val = force_reg (SImode, gen_int_mode (0, SImode));
+ emit_insn (gen_zero_extendsidi2_compact (operands[0], operands[1],
+ zero_val));
+ DONE;
+ }
+ else
+ FAIL;
+ }
+})
+
+(define_insn "*zero_extendsidi2"
[(set (match_operand:DI 0 "arith_reg_dest" "=r")
(zero_extend:DI (match_operand:SI 1 "extend_reg_operand" "r")))]
"TARGET_SHMEDIA"
@@ -5905,6 +5943,20 @@
[(set_attr "type" "arith_media")
(set_attr "highpart" "extend")])
+(define_insn_and_split "zero_extendsidi2_compact"
+ [(set (match_operand:DI 0 "arith_reg_dest")
+ (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))
+ (use (match_operand:SI 2 "arith_reg_operand"))]
+ "TARGET_SH1"
+ "#"
+ "&& can_create_pseudo_p ()"
+ [(const_int 0)]
+{
+ emit_move_insn (gen_lowpart (SImode, operands[0]), operands[1]);
+ emit_move_insn (gen_highpart (SImode, operands[0]), operands[2]);
+ DONE;
+})
+
However, maybe it'd be better to have a more generic solution to the constant
load/sharing problem after combine (see PR 51708 comment #3).
As for the comparisons, additional patterns would be required, which could be
part of PR 52898.
More information about the Gcc-bugs
mailing list