This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/54236] [SH] Improve addc and subc insn utilization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 31141
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31141&action=edit
widening unsigned addition

The addc insn can also be used to implement widening 32 bit -> 64 bit unsigned
addition, and to do certain unsigned overflow checks.  For example:

unsigned long long
test_1 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, 1x movt, no tst
  return (unsigned long long)a + (unsigned long long)b;
}

minimal code:
        mov     r5,r1
        clrt
        addc    r4,r1
        rts
        movt    r0


unsigned int
test_2 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, no movt, no tst
  unsigned long long x = (unsigned long long)a + (unsigned long long)b;
  if (x >> 32)
    return 0xFFFFFFFF;
  else
    return (unsigned int)x;
}

minimal code (branchy, generic):
        clrt
        addc    r4,r5
        bt/s    .L6
        mov     #-1,r0
        mov     r5,r0
.L6:
        rts    
        nop

minimal code (branchless, special case for constant 0xFFFFFFFF):
        clrt
        addc    r4,r5
        subc    r1,r1     // T = 0: r1 = 0x00000000
                          // T = 1: r1 = 0xFFFFFFFF
        or      r1,r5
        rts
        mov     r5,r0
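The subc r1,r1 trick materializes an all-ones mask from the T bit; in C the same
branchless saturation can be sketched like this (illustration only, function name
made up):

```c
#include <stdint.h>

/* Sketch of the branchless saturating add: on carry, OR in an all-ones
   mask (the value 'subc r1,r1' builds from the T bit).  */
static uint32_t
sat_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;                      /* clrt; addc                     */
  uint32_t mask = 0u - (uint32_t)(sum < a);  /* subc r1,r1: T ? ~0u : 0        */
  return sum | mask;                         /* or: saturate to 0xFFFFFFFF     */
}
```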


The attached patch makes the compiler output the minimal code sequences, except
for the branchless version of test_2.  That would require folding basic blocks,
which combine won't do.
The patch enables the "zero_extendsidi2" insn on non-SHMedia.  Without that
insn, the following is expanded:

(insn 22 4 7 2 (set (reg:DI 167 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 4)
        (reg:SI 4 r4 [ a ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 4 r4 [ a ])
        (nil)))
(insn 8 7 23 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))
(insn 23 8 9 2 (set (reg:DI 168 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 9 23 10 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 4)
        (reg:SI 5 r5 [ b ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 5 r5 [ b ])
        (nil)))
(insn 10 9 11 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))

... and combine doesn't seem to understand this, and thus will never try
patterns such as

(plus:DI (zero_extend:DI (match_operand:SI 1 "arith_reg_operand"))
         (zero_extend:DI (match_operand:SI 2 "arith_reg_operand"))))

The patch has a downside, though.  It makes 64 bit arithmetic slightly worse in
some cases (according to CSiBE).  As far as I can see, there are two main
issues:
- loads of constant #0 are not hoisted/CSE'd
- some 64 bit highword optimizations are no longer applied, in particular
comparisons of zero-extended unsigned SImode with unsigned DImode.

The constant #0 issue can be addressed by making the "zero_extendsidi2" pattern
'reserve' the constant #0 load, as is done for some other insns:

-(define_insn "zero_extendsidi2"
+(define_expand "zero_extendsidi2"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+        (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))]
+  "TARGET_SHMEDIA || TARGET_SH1"
+{
+  if (TARGET_SH1)
+    {
+      if (can_create_pseudo_p ())
+        {
+          rtx zero_val = force_reg (SImode, gen_int_mode (0, SImode));
+          emit_insn (gen_zero_extendsidi2_compact (operands[0], operands[1],
+                                                   zero_val));
+          DONE;
+        }
+      else
+        FAIL;
+    }
+})
+
+(define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "arith_reg_dest" "=r")
         (zero_extend:DI (match_operand:SI 1 "extend_reg_operand" "r")))]
   "TARGET_SHMEDIA"
@@ -5905,6 +5943,20 @@
   [(set_attr "type" "arith_media")
    (set_attr "highpart" "extend")])

+(define_insn_and_split "zero_extendsidi2_compact"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+        (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))
+   (use (match_operand:SI 2 "arith_reg_operand"))]
+  "TARGET_SH1"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  emit_move_insn (gen_lowpart (SImode, operands[0]), operands[1]);
+  emit_move_insn (gen_highpart (SImode, operands[0]), operands[2]);
+  DONE;
+})
+

However, maybe it'd be better to have a more generic solution to the constant
load/sharing problem after combine (see PR 51708 comment #3).

As for the comparisons, additional patterns would be required, which could be
part of PR 52898.

