There are currently a couple of cases where it would be better if addc or
subc insns were used.  For example:

int test00 (int a, int b)
{
  return a + b + 1;
}

gets compiled to:

        mov     r4,r0   ! MT
        add     r5,r0   ! EX
        rts
        add     #1,r0   ! EX

could be better as:

        mov     r4,r0   ! MT
        sett            ! MT (SH4)
        rts
        addc    r5,r0   ! EX

As a proof of concept, I've applied the following to handle the above case:

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 190326)
+++ gcc/config/sh/sh.md	(working copy)
@@ -1465,7 +1465,7 @@
 (define_insn "addc"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-	(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "0")
+	(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "%0")
 			  (match_operand:SI 2 "arith_reg_operand" "r"))
 		 (reg:SI T_REG)))
    (set (reg:SI T_REG)
@@ -1516,6 +1516,24 @@
   "add	%2,%0"
   [(set_attr "type" "arith")])
 
+(define_insn_and_split "*addsi3_compact"
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+	(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "")
+			  (match_operand:SI 2 "arith_reg_operand" ""))
+		 (const_int 1)))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& 1"
+  [(set (reg:SI T_REG) (const_int 1))
+   (parallel [(set (match_dup 0)
+		   (plus:SI (plus:SI (match_dup 1)
+				     (match_dup 2))
+			    (reg:SI T_REG)))
+	      (set (reg:SI T_REG)
+		   (ltu:SI (plus:SI (match_dup 1) (match_dup 2))
+			   (match_dup 1)))])])
+
 ;; -------------------------------------------------------------------------
 ;; Subtraction instructions
 ;; -------------------------------------------------------------------------

... and observed some code from the CSiBE set for -O2 -m4-single -ml
-mpretend-cmove.  It doesn't affect code size that much (some incs/decs here
and there), but more importantly it does this (libmpeg2/motion_comp.c):

_MC_avg_o_16_c:                  -->
        mov.b   @r5,r1                   mov.b   @r5,r2
.L16:                            .L16:
        mov.b   @r4,r2                   sett
        extu.b  r1,r1                    mov.b   @r4,r1
        extu.b  r2,r2                    extu.b  r2,r2
        add     r2,r1                    extu.b  r1,r1
        add     #1,r1                    addc    r2,r1
        shar    r1                       shar    r1
        mov.b   r1,@r4                   mov.b   r1,@r4
        mov.b   @(1,r5),r0               sett
        extu.b  r0,r1                    mov.b   @(1,r5),r0
        mov.b   @(1,r4),r0               extu.b  r0,r1
        extu.b  r0,r0                    mov.b   @(1,r4),r0
        add     r0,r1                    extu.b  r0,r0
        add     #1,r1                    addc    r1,r0
        shar    r1                       shar    r0
        mov     r1,r0                    mov.b   r0,@(1,r4)
        mov.b   r0,@(1,r4)

In such cases, the sett,addc sequence can be scheduled much better, and in
most cases the sett insn can be executed in parallel with some other insn.
Unfortunately, on SH4A the sett insn has been moved from the MT group to the
EX group, but it still seems beneficial.  I've also seen a couple of places
where sett-subc sequences would be better.
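For reference, addc computes Rn = Rn + Rm + T and leaves the carry-out in T,
while sett simply sets T = 1.  A minimal host-side C sketch of why sett
followed by addc computes a + b + 1 (this is an illustration only, not GCC
code; the t_bit variable and the helper names are made up to model the T bit):

#include <stdint.h>

static unsigned int t_bit;  /* models the SH T bit */

/* addc Rm,Rn:  Rn = Rn + Rm + T;  T = carry-out.  */
static uint32_t
addc_sim (uint32_t rn, uint32_t rm)
{
  uint64_t sum = (uint64_t) rn + rm + t_bit;
  t_bit = (uint32_t) (sum >> 32);  /* carry out of bit 31 */
  return (uint32_t) sum;
}

uint32_t
test00_sim (uint32_t a, uint32_t b)
{
  t_bit = 1;               /* sett */
  return addc_sim (a, b);  /* addc: a + b + T = a + b + 1 */
}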
Author: olegendo
Date: Thu Aug 16 23:19:45 2012
New Revision: 190459

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190459
Log:
	PR target/54236
	* config/sh/sh.md (addc): Add commutative modifier.
	(*addc, *minus_plus_one, *subc, *negc): New insns and splits.

	PR target/54236
	* gcc.target/sh/pr54236-1.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
Author: olegendo
Date: Wed Sep 19 17:45:37 2012
New Revision: 191489

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191489
Log:
	PR target/54236
	* config/sh/sh.md (*addc): Add pattern to handle one bit left shifts.

	PR target/54236
	* gcc.target/sh/pr54236-1.c (test_08): Add one bit left shift case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
Some more addc candidates I ran into:

int test00 (int a, int b, int c, int d)
{
  return (d & 1) + a + b;
}

-O2 -m4:
        mov     r7,r0
        and     #1,r0
        add     r4,r0
        rts
        add     r5,r0

better:
        shlr    r7      // T = r7 & 1
        addc    r4,r5
        rts
        mov     r5,r0

int test01 (int a, int b, int c, int d)
{
  return ((d >> 31) & 1) + a + b;
}

-O2 -m4:
        shll    r7
        movt    r0
        add     r4,r0
        rts
        add     r5,r0

better:
        shll    r7      // T = (r7 >> 31) & 1
        addc    r4,r5
        rts
        mov     r5,r0
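Both candidates rely on the fact that a single-bit shift drops the shifted-out
bit into T: shlr moves bit 0 into T, and shll moves bit 31 into T, so the bit
can be consumed directly by addc without first materializing it in a register.
A host-side C sketch of the intended equivalences (illustrative only; the
function names are made up):

#include <stdint.h>

int32_t
test00_equiv (int32_t a, int32_t b, int32_t c, int32_t d)
{
  uint32_t t = (uint32_t) d & 1;     /* shlr r7: bit 0 -> T */
  /* addc r4,r5: a + b + T, with wraparound semantics */
  return (int32_t) ((uint32_t) a + (uint32_t) b + t);
}

int32_t
test01_equiv (int32_t a, int32_t b, int32_t c, int32_t d)
{
  uint32_t t = (uint32_t) d >> 31;   /* shll r7: bit 31 -> T */
  return (int32_t) ((uint32_t) a + (uint32_t) b + t);
}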
Author: olegendo
Date: Tue Oct 29 20:45:56 2013
New Revision: 204180

URL: http://gcc.gnu.org/viewcvs?rev=204180&root=gcc&view=rev
Log:
	PR target/54236
	* config/sh/sh.md (*addc): Rename existing variations to ...
	(*addc_r_r_1, *addc_2r_1, *addc_r_1): ... these.
	(*addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_2r_lsb,
	*addc_r_msb, *addc_r_r_msb, *addc_2r_msb): New insn_and_split
	patterns.
	* config/sh/sh.c (addsubcosts): Handle some addc special cases.

	PR target/54236
	* gcc.target/sh/pr54236-2.c: New.
	* gcc.target/sh/pr54089-6.c: Add another rotl special case.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54089-6.c
Created attachment 31141 [details]
widening unsigned addition

The addc insn can also be used to implement widening 32 bit -> 64 bit
unsigned addition, and to do some kinds of unsigned overflow checks.
For example:

unsigned long long test_1 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, 1x movt, no tst
  return (unsigned long long)a + (unsigned long long)b;
}

minimal code:
        mov     r5,r1
        clrt
        addc    r4,r1
        rts
        movt    r0

unsigned int test_2 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, no movt, no tst
  unsigned long long x = (unsigned long long)a + (unsigned long long)b;
  if (x >> 32)
    return 0xFFFFFFFF;
  else
    return (unsigned int)x;
}

minimal code (branchy, generic):
        clrt
        addc    r4,r5
        bt/s    .L6
        mov     #-1,r0
        mov     r5,r0
.L6:
        rts
        nop

minimal code (branchless, special case for constant 0xFFFFFFFF):
        clrt
        addc    r4,r5
        subc    r1,r1   // T = 0: r1 = 0x00000000
                        // T = 1: r1 = 0xFFFFFFFF
        or      r1,r5
        rts
        mov     r5,r0

The attached patch makes the compiler output the minimal code sequences,
except for the branchless version of test_2.  That one would require folding
basic blocks, which combine won't do.

The patch enables the "zero_extendsidi2" insn on non-SHMedia.  Without that
insn, the following is expanded:

(insn 22 4 7 2 (set (reg:DI 167 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 4)
        (reg:SI 4 r4 [ a ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 4 r4 [ a ])
        (nil)))
(insn 8 7 23 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))
(insn 23 8 9 2 (set (reg:DI 168 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 9 23 10 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 4)
        (reg:SI 5 r5 [ b ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 5 r5 [ b ])
        (nil)))
(insn 10 9 11 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))

... and combine doesn't seem to understand this and thus will never try out
things such as:

(plus:DI (zero_extend:DI (match_operand:SI 1 "arith_reg_operand"))
         (zero_extend:DI (match_operand:SI 2 "arith_reg_operand")))

The patch has a downside, though.  It makes 64 bit arithmetic slightly worse
in some cases (according to CSiBE).  As far as I can see, there are mainly
two issues:
- loads of constant #0 are not hoisted/CSE'd
- some 64 bit highword optimizations are not applied anymore, in particular
  comparisons of zero extended unsigned SImode and unsigned DImode.
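As a side note, the branchless test_2 sequence works because subc r1,r1
computes r1 = r1 - r1 - T = -T, i.e. an all-zeros or all-ones mask from the
T bit.  A host-side C sketch of that saturating-add idiom (illustrative only,
assuming 32-bit unsigned int; not part of the patch):

#include <stdint.h>

uint32_t
sat_add (uint32_t a, uint32_t b)
{
  uint64_t x = (uint64_t) a + b;          /* clrt; addc r4,r5 */
  uint32_t carry = (uint32_t) (x >> 32);  /* the T bit after addc */
  uint32_t mask = 0u - carry;             /* subc r1,r1: 0x0 or 0xFFFFFFFF */
  return (uint32_t) x | mask;             /* or r1,r5 */
}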
The constant #0 issue can be addressed by making the "zero_extendsidi2"
pattern 'reserve' the constant #0 load, as it is done for some other insns:

-(define_insn "zero_extendsidi2"
+(define_expand "zero_extendsidi2"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+	(zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))]
+  "TARGET_SHMEDIA || TARGET_SH1"
+{
+  if (TARGET_SH1)
+    {
+      if (can_create_pseudo_p ())
+	{
+	  rtx zero_val = force_reg (SImode, gen_int_mode (0, SImode));
+	  emit_insn (gen_zero_extendsidi2_compact (operands[0], operands[1],
+						   zero_val));
+	  DONE;
+	}
+      else
+	FAIL;
+    }
+})
+
+(define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "arith_reg_dest" "=r")
 	(zero_extend:DI (match_operand:SI 1 "extend_reg_operand" "r")))]
   "TARGET_SHMEDIA"
@@ -5905,6 +5943,20 @@
   [(set_attr "type" "arith_media")
    (set_attr "highpart" "extend")])
 
+(define_insn_and_split "zero_extendsidi2_compact"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+	(zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))
+   (use (match_operand:SI 2 "arith_reg_operand"))]
+  "TARGET_SH1"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  emit_move_insn (gen_lowpart (SImode, operands[0]), operands[1]);
+  emit_move_insn (gen_highpart (SImode, operands[0]), operands[2]);
+  DONE;
+})
+

However, maybe it'd be better to have a more generic solution to the
constant load/sharing problem after combine (see PR 51708 comment #3).
As for the comparisons, additional patterns would be required, which could
be part of PR 52898.
Created attachment 31144 [details]
stitching addc insns

The attached patch is an example that shows how widening additions can be
stitched together.  One application would be arithmetic on user defined
integer types with an arbitrary number of bits.  For example (requires
C++11):

template <unsigned int Bits> class integer
{
public:
  typedef unsigned int word_type;
  typedef unsigned long long ext_word_type;

  static constexpr unsigned int bit_count = Bits;
  static constexpr unsigned int word_bit_count = sizeof (word_type) * 8;
  static constexpr unsigned int word_count =
    (bit_count + word_bit_count - 1) / word_bit_count;

private:
  word_type word[word_count];

public:
  friend integer operator + (const integer& a, const integer& b)
  {
    integer result;

    word_type carry = 0;
    for (unsigned int i = 0; i < word_count; ++i)
      {
        auto sum = (ext_word_type)a.word[i] + (ext_word_type)b.word[i] + carry;
        result.word[i] = (word_type)sum;
        carry = (sum >> word_bit_count) == 0 ? 0 : 1;
      }

    return result;
  }
};

With this patch, the following examples, compiled with -funroll-all-loops
-O2, demonstrate the effect (see the plain-C sketch after this comment):

--------------------------
integer<64> test_02 (const integer<64>& a, const integer<64>& b)
{
  return a + b;
}

        mov.l   @r5,r1
        mov.l   @r4,r0
        clrt
        mov.l   @(4,r5),r2
        addc    r1,r0
        mov.l   @(4,r4),r1
        rts
        addc    r2,r1

This is the same as a 'native' 64 bit addition.

--------------------------
integer<80> test_03 (const integer<80>& a, const integer<80>& b)
{
  return a + b;
}

        mov.l   @r5,r3
        mov.l   @r4,r1
        clrt
        mov.l   @(4,r5),r0
        mov.l   @(4,r4),r6
        addc    r3,r1
        mov.l   @(8,r5),r5
        mov.l   @(8,r4),r4
        addc    r0,r6
        mov.l   r1,@r2
        mov     r2,r0
        addc    r5,r4
        mov.l   r6,@(4,r2)
        rts
        mov.l   r4,@(8,r2)

80 bits are rounded up to 96 in the template 'integer', thus 3 addc insns
are required to do the 96 bit addition.

However, when compiling without loop unrolling, it doesn't work, because
there is no mechanism to feed back the carry variable inside a loop.  The
test_03 function becomes:

        mov.l   r8,@-r15
        mov.l   r9,@-r15
        mov.l   r10,@-r15
        mov.l   r11,@-r15
        mov     #0,r0
        mov     #0,r10
.L4:
        mov.l   @(r0,r4),r3
        clrt
        mov.l   @(r0,r5),r1
        mov     r10,r7    // r7 = carry from previous iteration
        mov     #0,r6
        mov     r1,r9
        addc    r3,r9     // r9 = a[i] + b[i] (lsw of 64 bit result)
        movt    r10       // r10 = carry (msw of 64 bit result)
        clrt
        mov     r9,r11
        addc    r7,r11    // r11 = r9 + previous carry (lsw of 64 bit result)
        addc    r6,r10    // r10 = carry for next iteration
        mov.l   r11,@(r0,r2)
        add     #4,r0
        cmp/eq  #12,r0
        bf      .L4
        mov.l   @r15+,r11
        mov     r2,r0
        mov.l   @r15+,r10
        mov.l   @r15+,r9
        rts
        mov.l   @r15+,r8

It could be something like this:

        mov     #0,r0
        mov     #0,r6
.L4:
        mov.l   @(r0,r4),r3
        mov.l   @(r0,r5),r1
        cmp/pl  r6        // T = r6 > 0 (get carry into T bit)
        addc    r3,r1     // r1 = a[i] + b[i] + T (lsw of 64 bit result)
        movt    r6        // r6 = new carry (msw of 64 bit result)
        mov.l   r1,@(r0,r2)
        add     #4,r0
        cmp/eq  #12,r0
        bf      .L4
        rts
        mov     r2,r0

However, that would require some loop analysis in order to discover the
T bit feedback opportunity.
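Stripped of the template machinery, the loop that wants the T-bit feedback is
just a word-wise carry chain; each iteration is one addc candidate if the
carry can be kept live in T across iterations.  A plain-C version of the same
loop (illustrative; the function name is made up and 32-bit unsigned int is
assumed):

/* r[0..n-1] = a[0..n-1] + b[0..n-1]; returns the final carry.  */
unsigned int
add_words (const unsigned int *a, const unsigned int *b,
           unsigned int *r, unsigned int n)
{
  unsigned int carry = 0;
  for (unsigned int i = 0; i < n; ++i)
    {
      unsigned long long sum = (unsigned long long) a[i] + b[i] + carry;
      r[i] = (unsigned int) sum;             /* addc candidate */
      carry = (unsigned int) (sum >> 32);    /* would live in T */
    }
  return carry;
}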
(In reply to Oleg Endo from comment #6)
> Created attachment 31144 [details]
> stitching addc insns
>
> The attached patch is an example that shows how widening additions can be
> stitched together.  One application would be arithmetic on user defined
> integer types with an arbitrary number of bits.
> For example (requires c++11):
>
> template <unsigned int Bits> class integer
> {
> public:
>   typedef unsigned int word_type;
>   typedef unsigned long long ext_word_type;
>
>   static constexpr unsigned int bit_count = Bits;
>   static constexpr unsigned int word_bit_count = sizeof (word_type) * 8;
>   static constexpr unsigned int word_count =
>     (bit_count + word_bit_count - 1) / word_bit_count;
>
> private:
>   word_type word[word_count];
>
> public:
>   friend integer operator + (const integer& a, const integer& b)
>   {
>     integer result;
>
>     word_type carry = 0;
>     for (unsigned int i = 0; i < word_count; ++i)
>       {
>         auto sum = (ext_word_type)a.word[i] + (ext_word_type)b.word[i]
>                    + carry;
>         result.word[i] = (word_type)sum;
>         carry = (sum >> word_bit_count) == 0 ? 0 : 1;
>       }
>
>     return result;
>   }
> };

I forgot to mention that the patch works only if the type of the carry
variable is 'unsigned int' or 'unsigned long long'.  If a signed or boolean
type is used, some weird things seem to happen and there are some sign
extensions somewhere around the T bit.
Author: olegendo
Date: Wed May 21 08:06:06 2014
New Revision: 210682

URL: http://gcc.gnu.org/viewcvs?rev=210682&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*addc_r_1): Rename to addc_t_r.  Remove empty
	constraints.
	(*addc_r_t): Add new insn_and_split.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-3.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
The following function compiled with -O2

unsigned int check (unsigned int x)
{
  return x == 0 ? 1 : x;
}

results in:
        tst     r4,r4
        bt/s    .L7
        mov     #1,r0
        mov     r4,r0
.L7:
        rts
        nop

Writing it as:

unsigned int check (unsigned int x)
{
  return x + (x == 0);
}

results in:
        tst     r4,r4
        mov     #0,r0
        rts
        addc    r4,r0

It seems that ifcvt is trying to utilize the 'add<mode>cc' standard name
pattern.  If the 2nd operand of the conditional addition is a constant 1 or
-1, the addcc insn can be implemented via addc or subc without a branch.

If the comparison can't be done in one insn (e.g. x != 0) and requires T bit
negation/inversion:

unsigned int check (unsigned int x)
{
  return x + (x != 0);
}

results in (non-SH2A):
        tst     r4,r4
        mov     #-1,r0
        negc    r0,r0
        rts
        add     r4,r0

slightly better (one arith insn less):
        tst     r4,r4
        mov     #-1,r0
        subc    r0,r4   // r4 = r4 - (-1) - T = r4 + 1 - T
        rts
        mov     r4,r0

and on SH2A (already minimal):
        tst     r4,r4
        movrt   r0
        rts
        add     r4,r0

subc case:

unsigned int check (unsigned int x)
{
  return x - (x != 0);
}

non-SH2A:
        tst     r4,r4
        mov     #-1,r1
        mov     r4,r0
        negc    r1,r1
        rts
        sub     r1,r0   // r0 = r4 - (1 - T) = r4 - 1 + T

SH2A:
        tst     r4,r4
        movrt   r1
        mov     r4,r0
        rts
        sub     r1,r0

better (non-SH2A and SH2A):
        tst     r4,r4
        mov     #-1,r0
        rts
        addc    r4,r0   // r0 = -1 + r4 + T = r4 - 1 + T

The equivalent conditional version should result in the same code
(currently it doesn't):

unsigned int check (unsigned int x)
{
  return x != 0 ? (x - 1) : x;
}

Since ifcvt is also run after combine, combine won't get the chance to
combine patterns such as:

(set (reg:SI 165 [ D.1635 ])
     (plus:SI (xor:SI (reg:SI 147 t)
                      (const_int 1 [0x1]))
              (reg/v:SI 164 [ x ])))

which would occur after emitting a comparison, negating the result (T bit)
and then adding/subtracting the T bit via addc/subc.  If that happens, the
combination must be done manually in the split pass, as is done with some
of the other insns.
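To double-check the subc identity used above: tst r4,r4 sets T = (x == 0),
and subc r0,r4 with r0 = -1 computes r4 - (-1) - T = x + 1 - T, which is
exactly x + (x != 0).  In host-side C (illustrative only; the function name
is made up):

unsigned int
check_equiv (unsigned int x)
{
  unsigned int t = (x == 0);  /* tst r4,r4: T = (x == 0) */
  /* subc r0,r4 with r0 = -1:  x - (-1) - T = x + 1 - T  */
  return x + 1 - t;           /* == x + (x != 0) */
}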
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081

URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
	PR target/49263
	PR target/53987
	PR target/64345
	PR target/59533
	PR target/52933
	PR target/54236
	PR target/51244
	* config/sh/sh-protos.h
	(sh_extending_set_of_reg::can_use_as_unextended_reg,
	sh_extending_set_of_reg::use_as_unextended_reg, sh_is_nott_insn,
	sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
	sh_is_movrt_insn, sh_insn_operands_modified_between_p,
	sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
	sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
	(sh_treg_insns): New class.
	* config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target
	hook.
	(scope_counter): New class.
	(sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
	sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
	sh_extending_set_of_reg::can_use_as_unextended_reg,
	sh_extending_set_of_reg::use_as_unextended_reg,
	sh_recog_treg_set_expr, sh_in_recog_treg_set_expr,
	sh_try_split_insn_simple, sh_split_treg_set_expr): New functions.
	(addsubcosts): Handle treg_set_expr.
	(sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
	(sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and
	ZERO_EXTEND.
	(sh_rtx_costs): Handle additional bit test patterns in EQ and AND
	cases.
	(sh_insn_operands_modified_between_p): Make non-static.
	* config/sh/predicates.md (zero_extend_movu_operand): Allow
	simple_mem_operand in addition to displacement_mem_operand.
	(zero_extend_operand): Don't allow zero_extend_movu_operand.
	(treg_set_expr, treg_set_expr_not_const01,
	arith_reg_or_treg_set_expr): New predicates.
	* config/sh/sh.md (tstsi_t): Use arith_reg_operand and
	arith_or_int_operand instead of logical_operand.  Convert to
	insn_and_split.  Try to optimize constant operand in splitter.
	(tsthi_t, tstqi_t): Fold into *tst<mode>_t.  Convert to
	insn_and_split.
	(*tstqi_t_zero): Delete.
	(*tst<mode>_t_subregs): Add !sh_in_recog_treg_set_expr split
	condition.
	(tstsi_t_and_not): Delete.
	(tst<mode>_t_zero_extract_eq): Rename to *tst<mode>_t_zero_extract.
	Convert to insn_and_split.
	(unnamed split, tstsi_t_zero_extract_xor,
	tstsi_t_zero_extract_subreg_xor_little,
	tstsi_t_zero_extract_subreg_xor_big): Delete.
	(*tstsi_t_shift_mask): New insn_and_split.
	(cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and
	try to recombine with surrounding insns when splitting.
	(*negtstsi): Add !sh_in_recog_treg_set_expr condition.
	(cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite
	as ...
	(cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
	*cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
	(*cbranch_div0s): Delete.
	(*addc): Convert to insn_and_split.  Use treg_set_expr as 3rd
	operand.  Try to recombine with surrounding insns when splitting.
	Add operand order variants.
	(*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
	(*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r,
	*addc_r_msb, *addc_r_r_msb, *addc_2r_msb): Delete.
	(*addc_2r_lsb): Rename to *addc_2r_t.  Use treg_set_expr.  Add
	operand order variant.
	(*addc_negreg_t): New insn_and_split.
	(*subc): Convert to insn_and_split.  Use treg_set_expr as 3rd
	operand.  Try to recombine with surrounding insns when splitting.
	Add operand order variants.
	(*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
	insn_and_split patterns.
	(*rotcr): Use arith_reg_or_treg_set_expr.  Try to recombine with
	surrounding insns when splitting.
	(unnamed rotcr split): Use arith_reg_or_treg_set_expr.
	(*rotcl): Likewise.
	Add zero_extract variant.
	(*ashrsi2_31): New insn_and_split.
	(*negc): Convert to insn_and_split.  Use treg_set_expr.
	(*zero_extend<mode>si2_disp_mem): Update comment.
	(movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr
	split condition.
	(*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr.  Try to
	recombine with surrounding insns when splitting.
	(any_treg_expr_to_reg): New insn_and_split.
	(*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
	*neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
	*neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
	*zero_extract_2): New single bit zero extract patterns.
	(bld_reg, *bld_regqi): Fold into bld<mode>_reg.
	(*get_thread_pointersi, store_gbr, *mov<mode>_gbr_load,
	*mov<mode>_gbr_load, *mov<mode>_gbr_load, *mov<mode>_gbr_load,
	*movdi_gbr_load): Use arith_reg_dest instead of register_operand
	for set destination.
	(set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
	register_operand for set source.

gcc/testsuite/
	PR target/49263
	PR target/53987
	PR target/64345
	PR target/59533
	PR target/52933
	PR target/54236
	PR target/51244
	* gcc.target/sh/pr64345-1.c: New.
	* gcc.target/sh/pr64345-2.c: New.
	* gcc.target/sh/pr59533-1.c: New.
	* gcc.target/sh/pr49263.c: Adjust matching of expected insns.
	* gcc.target/sh/pr52933-2.c: Likewise.
	* gcc.target/sh/pr54089-1.c: Likewise.
	* gcc.target/sh/pr54236-1.c: Likewise.
	* gcc.target/sh/pr51244-20-sh2a.c: Likewise.
	* gcc.target/sh/pr49263-1.c: Remove xfails.
	* gcc.target/sh/pr49263-2.c: Likewise.
	* gcc.target/sh/pr49263-3.c: Likewise.
	* gcc.target/sh/pr53987-1.c: Likewise.
	* gcc.target/sh/pr52933-1.c: Adjust matching of expected insns.
	(test_24, test_25, test_26, test_27, test_28, test_29, test_30):
	New.
	* gcc.target/sh/pr51244-12.c: Adjust matching of expected insns.
	(test05, test06, test07, test08, test09, test10, test11, test12):
	New.
	* gcc.target/sh/pr54236-3.c: Adjust matching of expected insns.
	(test_002, test_003, test_004, test_005, test_006, test_007,
	test_008, test_009): New.
	* gcc.target/sh/pr51244-4.c: Adjust matching of expected insns.
	(test_02): New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr49263-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr53987-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c
Author: olegendo
Date: Sun Jan 25 16:41:25 2015
New Revision: 220093

URL: https://gcc.gnu.org/viewcvs?rev=220093&root=gcc&view=rev
Log:
gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-4.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-4.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Author: olegendo
Date: Sun May 17 21:01:29 2015
New Revision: 223274

URL: https://gcc.gnu.org/viewcvs?rev=223274&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*addc_2r_t): Use ashift instead of mult.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
Author: olegendo
Date: Tue May 19 08:00:41 2015
New Revision: 223346

URL: https://gcc.gnu.org/viewcvs?rev=223346&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*round_int_even): New insn_and_split and
	accompanying new unnamed split.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-2.c: Adjust expected insn counts.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
Author: olegendo
Date: Thu May 21 12:36:35 2015
New Revision: 223479

URL: https://gcc.gnu.org/viewcvs?rev=223479&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*round_int_even): Reject pattern if operands[0]
	and operands[1] are the same.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-2.c: Fix typo in comment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
The following shows missed subc cases when there are constants involved.
Addc cases can be constructed in the same way.

int fun (int x)
{
  return x - 1 - (x > 100);
}

-O2 -m4:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        movt    r1
        sett
        rts
        subc    r1,r0

better:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        mov     #1,r1
        rts
        subc    r1,r0

Combine is looking for a pattern like:

(parallel [
    (set (reg:SI 168 [ D.1659 ])
         (plus:SI (not:SI (gt:SI (reg/v:SI 167 [ x ])
                                 (reg:SI 172)))
                  (reg/v:SI 167 [ x ])))
    (clobber (reg:SI 147 t))
  ])

The constants -1 and -2 are represented as not (0) and not (1) in this case.

int fun (int x)
{
  return x - 10 - (x > 100);
}

-O2 -m4:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        add     #-10,r0
        mov     #0,r1
        rts
        subc    r1,r0

better:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        mov     #10,r1
        rts
        subc    r1,r0

Here combine fails to match this instruction:

(parallel [
    (set (reg:SI 168 [ D.1659 ])
         (plus:SI (minus:SI (reg/v:SI 167 [ x ])
                            (gt:SI (reg/v:SI 167 [ x ])
                                   (reg:SI 172)))
                  (const_int -10 [0xfffffffffffffff6])))
    (clobber (reg:SI 147 t))
  ])

There is already a similar pattern, *subc_negreg_t.
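The improvement in both cases is to fold the constant into the subtrahend so
that subc Rm,Rn = Rn - Rm - T does all the work in one insn: x - 1 - T
becomes subc with Rm = 1, and x - 10 - T becomes subc with Rm = 10.  In
host-side C (illustrative only; the function name is made up):

int
fun_equiv (int x)
{
  int t = (x > 100);  /* cmp/gt r1,r4 sets T */
  return x - 10 - t;  /* one subc with r1 = 10 */
}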
Created attachment 36012 [details]
addsicc pattern

(In reply to Oleg Endo from comment #9)
> The following function compiled with -O2
>
> unsigned int check (unsigned int x)
> {
>   return x == 0 ? 1 : x;
> }
>
> results in:
>         tst     r4,r4
>         bt/s    .L7
>         mov     #1,r0
>         mov     r4,r0
> .L7:
>         rts
>         nop
>
> Writing it as:
>
> unsigned int check (unsigned int x)
> {
>   return x + (x == 0);
> }
>
> results in:
>         tst     r4,r4
>         mov     #0,r0
>         rts
>         addc    r4,r0
>
> It seems that ifcvt is trying to utilize the 'add<mode>cc' standard name
> pattern.  If the 2nd operand of the conditional addition is a constant 1
> or -1, the addcc insn can be implemented via addc or subc without a branch.

The attached patch adds support for the addsicc pattern and a few other
improvements.  However, the first case above doesn't see any improvement.
It seems to be a missed ifcvt optimization.
Author: olegendo
Date: Mon Sep 28 14:00:44 2015
New Revision: 228202

URL: https://gcc.gnu.org/viewcvs?rev=228202&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/predicates.md (t_reg_operand, negt_reg_operand): Allow
	and handle ne and eq codes.
	* config/sh/sh.c (sh_rtx_costs): Adjust matching of tst #imm,r0
	insn.
	(sh_recog_treg_set_expr): Early accept negt_reg_operand.  Early
	reject CONST_INT_P.  Use reverse_condition.
	(sh_split_treg_set_expr): Likewise.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-1.c (test_09, test_10, test_11): New.
	* gcc.target/sh/pr59533-1.c (test_23, test_24, test_25, test_26,
	test_27): New.
	* gcc.target/sh/pr54236-5.c: New.
	* gcc.target/sh/pr54236-6.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-5.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
Author: olegendo
Date: Thu Oct 1 12:36:15 2015
New Revision: 228332

URL: https://gcc.gnu.org/viewcvs?rev=228332&root=gcc&view=rev
Log:
gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-6.c: Fix assembler-not string.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-6.c
Can the bug be marked as resolved?
Thanks for the reminder. The issue hasn't been fully resolved. Please leave this PR open. This will be helpful when getting back to the issue.