There are currently a couple of cases where it would be better if addc or
subc insns were used.  For example:

int test00 (int a, int b)
{
  return a + b + 1;
}

gets compiled to:

        mov     r4,r0   ! MT
        add     r5,r0   ! EX
        rts
        add     #1,r0   ! EX

could be better as:

        mov     r4,r0   ! MT
        sett            ! MT (SH4)
        rts
        addc    r5,r0   ! EX

As a proof of concept, I've applied the following to handle the above case:

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 190326)
+++ gcc/config/sh/sh.md	(working copy)
@@ -1465,7 +1465,7 @@
 (define_insn "addc"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-	(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "0")
+	(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "%0")
 			  (match_operand:SI 2 "arith_reg_operand" "r"))
 		 (reg:SI T_REG)))
    (set (reg:SI T_REG)
@@ -1516,6 +1516,24 @@
   "add	%2,%0"
   [(set_attr "type" "arith")])
 
+(define_insn_and_split "*addsi3_compact"
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+	(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "")
+			  (match_operand:SI 2 "arith_reg_operand" ""))
+		 (const_int 1)))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& 1"
+  [(set (reg:SI T_REG) (const_int 1))
+   (parallel [(set (match_dup 0)
+		   (plus:SI (plus:SI (match_dup 1)
+				     (match_dup 2))
+			    (reg:SI T_REG)))
+	      (set (reg:SI T_REG)
+		   (ltu:SI (plus:SI (match_dup 1) (match_dup 2))
+			   (match_dup 1)))])])
+
 ;; -------------------------------------------------------------------------
 ;; Subtraction instructions
 ;; -------------------------------------------------------------------------

... and observed some code from the CSiBE set for -O2 -m4-single -ml
-mpretend-cmove.  It doesn't affect code size that much (some incs/decs here
and there), but more importantly it does this (libmpeg2/motion_comp.c):

_MC_avg_o_16_c:                  -->
        mov.b   @r5,r1                   mov.b   @r5,r2
.L16:                            .L16:
        mov.b   @r4,r2                   sett
        extu.b  r1,r1                    mov.b   @r4,r1
        extu.b  r2,r2                    extu.b  r2,r2
        add     r2,r1                    extu.b  r1,r1
        add     #1,r1                    addc    r2,r1
        shar    r1                       shar    r1
        mov.b   r1,@r4                   mov.b   r1,@r4
        mov.b   @(1,r5),r0               sett
        extu.b  r0,r1                    mov.b   @(1,r5),r0
        mov.b   @(1,r4),r0               extu.b  r0,r1
        extu.b  r0,r0                    mov.b   @(1,r4),r0
        add     r0,r1                    extu.b  r0,r0
        add     #1,r1                    addc    r1,r0
        shar    r1                       shar    r0
        mov     r1,r0                    mov.b   r0,@(1,r4)
        mov.b   r0,@(1,r4)

In such cases, the sett,addc sequence can be scheduled much better, and in
most cases the sett insn can be executed in parallel with some other insn.
Unfortunately, on SH4A the sett insn has been moved from the MT group to the
EX group, but it still seems beneficial.  I've also seen a couple of places
where sett-subc sequences would be better.
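For reference, addc computes Rn = Rn + Rm + T and leaves the carry-out in T,
while sett simply sets T = 1.  A minimal host-side C sketch of why sett
followed by addc computes a + b + 1 (this is an illustration only, not GCC
code; the t_bit variable and the helper names are made up to model the T bit):

#include <stdint.h>

static unsigned int t_bit;  /* models the SH T bit */

/* addc Rm,Rn:  Rn = Rn + Rm + T;  T = carry-out.  */
static uint32_t
addc_sim (uint32_t rn, uint32_t rm)
{
  uint64_t sum = (uint64_t) rn + rm + t_bit;
  t_bit = (uint32_t) (sum >> 32);  /* carry out of bit 31 */
  return (uint32_t) sum;
}

uint32_t
test00_sim (uint32_t a, uint32_t b)
{
  t_bit = 1;               /* sett */
  return addc_sim (a, b);  /* addc: a + b + T = a + b + 1 */
}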
Author: olegendo
Date: Thu Aug 16 23:19:45 2012
New Revision: 190459

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190459
Log:
	PR target/54236
	* config/sh/sh.md (addc): Add commutative modifier.
	(*addc, *minus_plus_one, *subc, *negc): New insns and splits.

	PR target/54236
	* gcc.target/sh/pr54236-1.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
Author: olegendo
Date: Wed Sep 19 17:45:37 2012
New Revision: 191489

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191489
Log:
	PR target/54236
	* config/sh/sh.md (*addc): Add pattern to handle one bit left shifts.

	PR target/54236
	* gcc.target/sh/pr54236-1.c (test_08): Add one bit left shift case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
Some more addc candidates I ran into:

int test00 (int a, int b, int c, int d)
{
  return (d & 1) + a + b;
}

-O2 -m4:
        mov     r7,r0
        and     #1,r0
        add     r4,r0
        rts
        add     r5,r0

better:
        shlr    r7      // T = r7 & 1
        addc    r4,r5
        rts
        mov     r5,r0

int test01 (int a, int b, int c, int d)
{
  return ((d >> 31) & 1) + a + b;
}

-O2 -m4:
        shll    r7
        movt    r0
        add     r4,r0
        rts
        add     r5,r0

better:
        shll    r7      // T = (r7 >> 31) & 1
        addc    r4,r5
        rts
        mov     r5,r0
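Both candidates rely on the fact that a single-bit shift drops the shifted-out
bit into T: shlr moves bit 0 into T, and shll moves bit 31 into T, so the bit
can be consumed directly by addc without first materializing it in a register.
A host-side C sketch of the intended equivalences (illustrative only; the
function names are made up):

#include <stdint.h>

int32_t
test00_equiv (int32_t a, int32_t b, int32_t c, int32_t d)
{
  uint32_t t = (uint32_t) d & 1;     /* shlr r7: bit 0 -> T */
  /* addc r4,r5: a + b + T, with wraparound semantics */
  return (int32_t) ((uint32_t) a + (uint32_t) b + t);
}

int32_t
test01_equiv (int32_t a, int32_t b, int32_t c, int32_t d)
{
  uint32_t t = (uint32_t) d >> 31;   /* shll r7: bit 31 -> T */
  return (int32_t) ((uint32_t) a + (uint32_t) b + t);
}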
Author: olegendo
Date: Tue Oct 29 20:45:56 2013
New Revision: 204180

URL: http://gcc.gnu.org/viewcvs?rev=204180&root=gcc&view=rev
Log:
	PR target/54236
	* config/sh/sh.md (*addc): Rename existing variations to ...
	(*addc_r_r_1, *addc_2r_1, *addc_r_1): ... these.
	(*addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_2r_lsb,
	*addc_r_msb, *addc_r_r_msb, *addc_2r_msb): New insn_and_split
	patterns.
	* config/sh/sh.c (addsubcosts): Handle some addc special cases.

	PR target/54236
	* gcc.target/sh/pr54236-2.c: New.
	* gcc.target/sh/pr54089-6.c: Add another rotl special case.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54089-6.c
Created attachment 31141 [details]
widening unsigned addition

The addc insn can also be used to implement widening 32 bit -> 64 bit
unsigned addition, and to do some kinds of unsigned overflow checks.
For example:

unsigned long long test_1 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, 1x movt, no tst
  return (unsigned long long)a + (unsigned long long)b;
}

minimal code:
        mov     r5,r1
        clrt
        addc    r4,r1
        rts
        movt    r0

unsigned int test_2 (unsigned int a, unsigned int b)
{
  // 1x addc, 1x clrt, no movt, no tst
  unsigned long long x = (unsigned long long)a + (unsigned long long)b;
  if (x >> 32)
    return 0xFFFFFFFF;
  else
    return (unsigned int)x;
}

minimal code (branchy, generic):
        clrt
        addc    r4,r5
        bt/s    .L6
        mov     #-1,r0
        mov     r5,r0
.L6:
        rts
        nop

minimal code (branchless, special case for constant 0xFFFFFFFF):
        clrt
        addc    r4,r5
        subc    r1,r1   // T = 0: r1 = 0x00000000
                        // T = 1: r1 = 0xFFFFFFFF
        or      r1,r5
        rts
        mov     r5,r0

The attached patch makes the compiler output the minimal code sequences,
except for the branchless version of test_2.  That one would require folding
basic blocks, which combine won't do.

The patch enables the "zero_extendsidi2" insn on non-SHMedia.  Without that
insn, the following is expanded:

(insn 22 4 7 2 (set (reg:DI 167 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 4)
        (reg:SI 4 r4 [ a ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 4 r4 [ a ])
        (nil)))
(insn 8 7 23 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))
(insn 23 8 9 2 (set (reg:DI 168 [ D.1680 ])
        (const_int 0 [0])) sh_tmp.cpp:22 -1
     (nil))
(insn 9 23 10 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 4)
        (reg:SI 5 r5 [ b ])) sh_tmp.cpp:22 257 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 5 r5 [ b ])
        (nil)))
(insn 10 9 11 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 0)
        (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie}
     (nil))

... and combine doesn't seem to understand this and thus will never try out
things such as:

(plus:DI (zero_extend:DI (match_operand:SI 1 "arith_reg_operand"))
         (zero_extend:DI (match_operand:SI 2 "arith_reg_operand")))

The patch has a downside, though.  It makes 64 bit arithmetic slightly worse
in some cases (according to CSiBE).  As far as I can see, there are mainly
two issues:
- loads of constant #0 are not hoisted/CSE'd
- some 64 bit highword optimizations are not applied anymore, in particular
  comparisons of zero extended unsigned SImode and unsigned DImode.
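As a side note, the branchless test_2 sequence works because subc r1,r1
computes r1 = r1 - r1 - T = -T, i.e. an all-zeros or all-ones mask from the
T bit.  A host-side C sketch of that saturating-add idiom (illustrative only,
assuming 32-bit unsigned int; not part of the patch):

#include <stdint.h>

uint32_t
sat_add (uint32_t a, uint32_t b)
{
  uint64_t x = (uint64_t) a + b;          /* clrt; addc r4,r5 */
  uint32_t carry = (uint32_t) (x >> 32);  /* the T bit after addc */
  uint32_t mask = 0u - carry;             /* subc r1,r1: 0x0 or 0xFFFFFFFF */
  return (uint32_t) x | mask;             /* or r1,r5 */
}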
The constant #0 issue can be addressed by making the "zero_extendsidi2"
pattern 'reserve' the constant #0 load, as it is done for some other insns:

-(define_insn "zero_extendsidi2"
+(define_expand "zero_extendsidi2"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+	(zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))]
+  "TARGET_SHMEDIA || TARGET_SH1"
+{
+  if (TARGET_SH1)
+    {
+      if (can_create_pseudo_p ())
+	{
+	  rtx zero_val = force_reg (SImode, gen_int_mode (0, SImode));
+	  emit_insn (gen_zero_extendsidi2_compact (operands[0], operands[1],
+						   zero_val));
+	  DONE;
+	}
+      else
+	FAIL;
+    }
+})
+
+(define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "arith_reg_dest" "=r")
 	(zero_extend:DI (match_operand:SI 1 "extend_reg_operand" "r")))]
   "TARGET_SHMEDIA"
@@ -5905,6 +5943,20 @@
   [(set_attr "type" "arith_media")
    (set_attr "highpart" "extend")])
 
+(define_insn_and_split "zero_extendsidi2_compact"
+  [(set (match_operand:DI 0 "arith_reg_dest")
+	(zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))
+   (use (match_operand:SI 2 "arith_reg_operand"))]
+  "TARGET_SH1"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  emit_move_insn (gen_lowpart (SImode, operands[0]), operands[1]);
+  emit_move_insn (gen_highpart (SImode, operands[0]), operands[2]);
+  DONE;
+})
+

However, maybe it'd be better to have a more generic solution to the
constant load/sharing problem after combine (see PR 51708 comment #3).
As for the comparisons, additional patterns would be required, which could
be part of PR 52898.
Created attachment 31144 [details]
stitching addc insns

The attached patch is an example that shows how widening additions can be
stitched together.  One application would be arithmetic on user defined
integer types with an arbitrary number of bits.  For example (requires
C++11):

template <unsigned int Bits> class integer
{
public:
  typedef unsigned int word_type;
  typedef unsigned long long ext_word_type;

  static constexpr unsigned int bit_count = Bits;
  static constexpr unsigned int word_bit_count = sizeof (word_type) * 8;
  static constexpr unsigned int word_count =
    (bit_count + word_bit_count - 1) / word_bit_count;

private:
  word_type word[word_count];

public:
  friend integer operator + (const integer& a, const integer& b)
  {
    integer result;

    word_type carry = 0;
    for (unsigned int i = 0; i < word_count; ++i)
      {
        auto sum = (ext_word_type)a.word[i] + (ext_word_type)b.word[i] + carry;
        result.word[i] = (word_type)sum;
        carry = (sum >> word_bit_count) == 0 ? 0 : 1;
      }

    return result;
  }
};

With this patch, the following examples, compiled with -funroll-all-loops
-O2, demonstrate the effect (see the plain-C sketch after this comment):

--------------------------
integer<64> test_02 (const integer<64>& a, const integer<64>& b)
{
  return a + b;
}

        mov.l   @r5,r1
        mov.l   @r4,r0
        clrt
        mov.l   @(4,r5),r2
        addc    r1,r0
        mov.l   @(4,r4),r1
        rts
        addc    r2,r1

This is the same as a 'native' 64 bit addition.

--------------------------
integer<80> test_03 (const integer<80>& a, const integer<80>& b)
{
  return a + b;
}

        mov.l   @r5,r3
        mov.l   @r4,r1
        clrt
        mov.l   @(4,r5),r0
        mov.l   @(4,r4),r6
        addc    r3,r1
        mov.l   @(8,r5),r5
        mov.l   @(8,r4),r4
        addc    r0,r6
        mov.l   r1,@r2
        mov     r2,r0
        addc    r5,r4
        mov.l   r6,@(4,r2)
        rts
        mov.l   r4,@(8,r2)

80 bits are rounded up to 96 in the template 'integer', thus 3 addc insns
are required to do the 96 bit addition.

However, when compiling without loop unrolling, it doesn't work, because
there is no mechanism to feed back the carry variable inside a loop.  The
test_03 function becomes:

        mov.l   r8,@-r15
        mov.l   r9,@-r15
        mov.l   r10,@-r15
        mov.l   r11,@-r15
        mov     #0,r0
        mov     #0,r10
.L4:
        mov.l   @(r0,r4),r3
        clrt
        mov.l   @(r0,r5),r1
        mov     r10,r7    // r7 = carry from previous iteration
        mov     #0,r6
        mov     r1,r9
        addc    r3,r9     // r9 = a[i] + b[i] (lsw of 64 bit result)
        movt    r10       // r10 = carry (msw of 64 bit result)
        clrt
        mov     r9,r11
        addc    r7,r11    // r11 = r9 + previous carry (lsw of 64 bit result)
        addc    r6,r10    // r10 = carry for next iteration
        mov.l   r11,@(r0,r2)
        add     #4,r0
        cmp/eq  #12,r0
        bf      .L4
        mov.l   @r15+,r11
        mov     r2,r0
        mov.l   @r15+,r10
        mov.l   @r15+,r9
        rts
        mov.l   @r15+,r8

It could be something like this:

        mov     #0,r0
        mov     #0,r6
.L4:
        mov.l   @(r0,r4),r3
        mov.l   @(r0,r5),r1
        cmp/pl  r6        // T = r6 > 0 (get carry into T bit)
        addc    r3,r1     // r1 = a[i] + b[i] + T (lsw of 64 bit result)
        movt    r6        // r6 = new carry (msw of 64 bit result)
        mov.l   r1,@(r0,r2)
        add     #4,r0
        cmp/eq  #12,r0
        bf      .L4
        rts
        mov     r2,r0

However, that would require some loop analysis in order to discover the
T bit feedback opportunity.
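Stripped of the template machinery, the loop that wants the T-bit feedback is
just a word-wise carry chain; each iteration is one addc candidate if the
carry can be kept live in T across iterations.  A plain-C version of the same
loop (illustrative; the function name is made up and 32-bit unsigned int is
assumed):

/* r[0..n-1] = a[0..n-1] + b[0..n-1]; returns the final carry.  */
unsigned int
add_words (const unsigned int *a, const unsigned int *b,
           unsigned int *r, unsigned int n)
{
  unsigned int carry = 0;
  for (unsigned int i = 0; i < n; ++i)
    {
      unsigned long long sum = (unsigned long long) a[i] + b[i] + carry;
      r[i] = (unsigned int) sum;             /* addc candidate */
      carry = (unsigned int) (sum >> 32);    /* would live in T */
    }
  return carry;
}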
(In reply to Oleg Endo from comment #6)
> Created attachment 31144 [details]
> stitching addc insns
>
> The attached patch is an example that shows how widening additions can be
> stitched together.  One application would be arithmetic on user defined
> integer types with an arbitrary number of bits.
> For example (requires c++11):
>
> template <unsigned int Bits> class integer
> {
> public:
>   typedef unsigned int word_type;
>   typedef unsigned long long ext_word_type;
>
>   static constexpr unsigned int bit_count = Bits;
>   static constexpr unsigned int word_bit_count = sizeof (word_type) * 8;
>   static constexpr unsigned int word_count =
>     (bit_count + word_bit_count - 1) / word_bit_count;
>
> private:
>   word_type word[word_count];
>
> public:
>   friend integer operator + (const integer& a, const integer& b)
>   {
>     integer result;
>
>     word_type carry = 0;
>     for (unsigned int i = 0; i < word_count; ++i)
>       {
>         auto sum = (ext_word_type)a.word[i] + (ext_word_type)b.word[i]
>                    + carry;
>         result.word[i] = (word_type)sum;
>         carry = (sum >> word_bit_count) == 0 ? 0 : 1;
>       }
>
>     return result;
>   }
> };

I forgot to mention that the patch works only if the type of the carry
variable is 'unsigned int' or 'unsigned long long'.  If a signed or boolean
type is used, some weird things seem to happen and there are some sign
extensions somewhere around the T bit.
Author: olegendo
Date: Wed May 21 08:06:06 2014
New Revision: 210682

URL: http://gcc.gnu.org/viewcvs?rev=210682&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*addc_r_1): Rename to addc_t_r.  Remove empty
	constraints.
	(*addc_r_t): Add new insn_and_split.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-3.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
The following function compiled with -O2

unsigned int check (unsigned int x)
{
  return x == 0 ? 1 : x;
}

results in:
        tst     r4,r4
        bt/s    .L7
        mov     #1,r0
        mov     r4,r0
.L7:
        rts
        nop

Writing it as:

unsigned int check (unsigned int x)
{
  return x + (x == 0);
}

results in:
        tst     r4,r4
        mov     #0,r0
        rts
        addc    r4,r0

It seems that ifcvt is trying to utilize the 'add<mode>cc' standard name
pattern.  If the 2nd operand of the conditional addition is a constant 1 or
-1, the addcc insn can be implemented via addc or subc without a branch.

If the comparison can't be done in one insn (e.g. x != 0) and requires T bit
negation/inversion:

unsigned int check (unsigned int x)
{
  return x + (x != 0);
}

results in (non-SH2A):
        tst     r4,r4
        mov     #-1,r0
        negc    r0,r0
        rts
        add     r4,r0

slightly better (one arith insn less):
        tst     r4,r4
        mov     #-1,r0
        subc    r0,r4   // r4 = r4 - (-1) - T = r4 + 1 - T
        rts
        mov     r4,r0

and on SH2A (already minimal):
        tst     r4,r4
        movrt   r0
        rts
        add     r4,r0

subc case:

unsigned int check (unsigned int x)
{
  return x - (x != 0);
}

non-SH2A:
        tst     r4,r4
        mov     #-1,r1
        mov     r4,r0
        negc    r1,r1
        rts
        sub     r1,r0   // r0 = r4 - (1 - T) = r4 - 1 + T

SH2A:
        tst     r4,r4
        movrt   r1
        mov     r4,r0
        rts
        sub     r1,r0

better (non-SH2A and SH2A):
        tst     r4,r4
        mov     #-1,r0
        rts
        addc    r4,r0   // r0 = -1 + r4 + T = r4 - 1 + T

The equivalent conditional version should result in the same code
(currently it doesn't):

unsigned int check (unsigned int x)
{
  return x != 0 ? (x - 1) : x;
}

Since ifcvt is also run after combine, combine won't get the chance to
combine patterns such as:

(set (reg:SI 165 [ D.1635 ])
     (plus:SI (xor:SI (reg:SI 147 t)
                      (const_int 1 [0x1]))
              (reg/v:SI 164 [ x ])))

which would occur after emitting a comparison, negating the result (T bit)
and then adding/subtracting the T bit via addc/subc.  If that happens, the
combination must be done manually in the split pass, as is done with some
of the other insns.
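To double-check the subc identity used above: tst r4,r4 sets T = (x == 0),
and subc r0,r4 with r0 = -1 computes r4 - (-1) - T = x + 1 - T, which is
exactly x + (x != 0).  In host-side C (illustrative only; the function name
is made up):

unsigned int
check_equiv (unsigned int x)
{
  unsigned int t = (x == 0);  /* tst r4,r4: T = (x == 0) */
  /* subc r0,r4 with r0 = -1:  x - (-1) - T = x + 1 - T  */
  return x + 1 - t;           /* == x + (x != 0) */
}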
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081

URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
	PR target/49263
	PR target/53987
	PR target/64345
	PR target/59533
	PR target/52933
	PR target/54236
	PR target/51244
	* config/sh/sh-protos.h
	(sh_extending_set_of_reg::can_use_as_unextended_reg,
	sh_extending_set_of_reg::use_as_unextended_reg, sh_is_nott_insn,
	sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
	sh_is_movrt_insn, sh_insn_operands_modified_between_p,
	sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
	sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
	(sh_treg_insns): New class.
	* config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target
	hook.
	(scope_counter): New class.
	(sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
	sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
	sh_extending_set_of_reg::can_use_as_unextended_reg,
	sh_extending_set_of_reg::use_as_unextended_reg,
	sh_recog_treg_set_expr, sh_in_recog_treg_set_expr,
	sh_try_split_insn_simple, sh_split_treg_set_expr): New functions.
	(addsubcosts): Handle treg_set_expr.
	(sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
	(sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and
	ZERO_EXTEND.
	(sh_rtx_costs): Handle additional bit test patterns in EQ and AND
	cases.
	(sh_insn_operands_modified_between_p): Make non-static.
	* config/sh/predicates.md (zero_extend_movu_operand): Allow
	simple_mem_operand in addition to displacement_mem_operand.
	(zero_extend_operand): Don't allow zero_extend_movu_operand.
	(treg_set_expr, treg_set_expr_not_const01,
	arith_reg_or_treg_set_expr): New predicates.
	* config/sh/sh.md (tstsi_t): Use arith_reg_operand and
	arith_or_int_operand instead of logical_operand.  Convert to
	insn_and_split.  Try to optimize constant operand in splitter.
	(tsthi_t, tstqi_t): Fold into *tst<mode>_t.  Convert to
	insn_and_split.
	(*tstqi_t_zero): Delete.
	(*tst<mode>_t_subregs): Add !sh_in_recog_treg_set_expr split
	condition.
	(tstsi_t_and_not): Delete.
	(tst<mode>_t_zero_extract_eq): Rename to *tst<mode>_t_zero_extract.
	Convert to insn_and_split.
	(unnamed split, tstsi_t_zero_extract_xor,
	tstsi_t_zero_extract_subreg_xor_little,
	tstsi_t_zero_extract_subreg_xor_big): Delete.
	(*tstsi_t_shift_mask): New insn_and_split.
	(cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and
	try to recombine with surrounding insns when splitting.
	(*negtstsi): Add !sh_in_recog_treg_set_expr condition.
	(cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite
	as ...
	(cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
	*cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
	(*cbranch_div0s): Delete.
	(*addc): Convert to insn_and_split.  Use treg_set_expr as 3rd
	operand.  Try to recombine with surrounding insns when splitting.
	Add operand order variants.
	(*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
	(*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r,
	*addc_r_msb, *addc_r_r_msb, *addc_2r_msb): Delete.
	(*addc_2r_lsb): Rename to *addc_2r_t.  Use treg_set_expr.  Add
	operand order variant.
	(*addc_negreg_t): New insn_and_split.
	(*subc): Convert to insn_and_split.  Use treg_set_expr as 3rd
	operand.  Try to recombine with surrounding insns when splitting.
	Add operand order variants.
	(*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
	insn_and_split patterns.
	(*rotcr): Use arith_reg_or_treg_set_expr.  Try to recombine with
	surrounding insns when splitting.
	(unnamed rotcr split): Use arith_reg_or_treg_set_expr.
	(*rotcl): Likewise.
	Add zero_extract variant.
	(*ashrsi2_31): New insn_and_split.
	(*negc): Convert to insn_and_split.  Use treg_set_expr.
	(*zero_extend<mode>si2_disp_mem): Update comment.
	(movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr
	split condition.
	(*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr.  Try to
	recombine with surrounding insns when splitting.
	(any_treg_expr_to_reg): New insn_and_split.
	(*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
	*neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
	*neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
	*zero_extract_2): New single bit zero extract patterns.
	(bld_reg, *bld_regqi): Fold into bld<mode>_reg.
	(*get_thread_pointersi, store_gbr, *mov<mode>_gbr_load,
	*mov<mode>_gbr_load, *mov<mode>_gbr_load, *mov<mode>_gbr_load,
	*movdi_gbr_load): Use arith_reg_dest instead of register_operand
	for set destination.
	(set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
	register_operand for set source.

gcc/testsuite/
	PR target/49263
	PR target/53987
	PR target/64345
	PR target/59533
	PR target/52933
	PR target/54236
	PR target/51244
	* gcc.target/sh/pr64345-1.c: New.
	* gcc.target/sh/pr64345-2.c: New.
	* gcc.target/sh/pr59533-1.c: New.
	* gcc.target/sh/pr49263.c: Adjust matching of expected insns.
	* gcc.target/sh/pr52933-2.c: Likewise.
	* gcc.target/sh/pr54089-1.c: Likewise.
	* gcc.target/sh/pr54236-1.c: Likewise.
	* gcc.target/sh/pr51244-20-sh2a.c: Likewise.
	* gcc.target/sh/pr49263-1.c: Remove xfails.
	* gcc.target/sh/pr49263-2.c: Likewise.
	* gcc.target/sh/pr49263-3.c: Likewise.
	* gcc.target/sh/pr53987-1.c: Likewise.
	* gcc.target/sh/pr52933-1.c: Adjust matching of expected insns.
	(test_24, test_25, test_26, test_27, test_28, test_29, test_30):
	New.
	* gcc.target/sh/pr51244-12.c: Adjust matching of expected insns.
	(test05, test06, test07, test08, test09, test10, test11, test12):
	New.
	* gcc.target/sh/pr54236-3.c: Adjust matching of expected insns.
	(test_002, test_003, test_004, test_005, test_006, test_007,
	test_008, test_009): New.
	* gcc.target/sh/pr51244-4.c: Adjust matching of expected insns.
	(test_02): New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr49263-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr53987-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c
Author: olegendo
Date: Sun Jan 25 16:41:25 2015
New Revision: 220093

URL: https://gcc.gnu.org/viewcvs?rev=220093&root=gcc&view=rev
Log:
gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-4.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-4.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Author: olegendo
Date: Sun May 17 21:01:29 2015
New Revision: 223274

URL: https://gcc.gnu.org/viewcvs?rev=223274&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*addc_2r_t): Use ashift instead of mult.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
Author: olegendo
Date: Tue May 19 08:00:41 2015
New Revision: 223346

URL: https://gcc.gnu.org/viewcvs?rev=223346&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*round_int_even): New insn_and_split and
	accompanying new unnamed split.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-2.c: Adjust expected insn counts.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
Author: olegendo
Date: Thu May 21 12:36:35 2015
New Revision: 223479

URL: https://gcc.gnu.org/viewcvs?rev=223479&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/sh.md (*round_int_even): Reject pattern if operands[0]
	and operands[1] are the same.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-2.c: Fix typo in comment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-2.c
The following shows missed subc cases when there are constants involved.
Addc cases can be constructed in the same way.

int fun (int x)
{
  return x - 1 - (x > 100);
}

-O2 -m4:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        movt    r1
        sett
        rts
        subc    r1,r0

better:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        mov     #1,r1
        rts
        subc    r1,r0

Combine is looking for a pattern like:

(parallel [
    (set (reg:SI 168 [ D.1659 ])
         (plus:SI (not:SI (gt:SI (reg/v:SI 167 [ x ])
                                 (reg:SI 172)))
                  (reg/v:SI 167 [ x ])))
    (clobber (reg:SI 147 t))
  ])

The constants -1 and -2 are represented as not (0) and not (1) in this case.

int fun (int x)
{
  return x - 10 - (x > 100);
}

-O2 -m4:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        add     #-10,r0
        mov     #0,r1
        rts
        subc    r1,r0

better:
        mov     #100,r1
        mov     r4,r0
        cmp/gt  r1,r4
        mov     #10,r1
        rts
        subc    r1,r0

Here combine fails to match this instruction:

(parallel [
    (set (reg:SI 168 [ D.1659 ])
         (plus:SI (minus:SI (reg/v:SI 167 [ x ])
                            (gt:SI (reg/v:SI 167 [ x ])
                                   (reg:SI 172)))
                  (const_int -10 [0xfffffffffffffff6])))
    (clobber (reg:SI 147 t))
  ])

There is already a similar pattern, *subc_negreg_t.
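The improvement in both cases is to fold the constant into the subtrahend so
that subc Rm,Rn = Rn - Rm - T does all the work in one insn: x - 1 - T
becomes subc with Rm = 1, and x - 10 - T becomes subc with Rm = 10.  In
host-side C (illustrative only; the function name is made up):

int
fun_equiv (int x)
{
  int t = (x > 100);  /* cmp/gt r1,r4 sets T */
  return x - 10 - t;  /* one subc with r1 = 10 */
}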
Created attachment 36012 [details]
addsicc pattern

(In reply to Oleg Endo from comment #9)
> The following function compiled with -O2
>
> unsigned int check (unsigned int x)
> {
>   return x == 0 ? 1 : x;
> }
>
> results in:
>         tst     r4,r4
>         bt/s    .L7
>         mov     #1,r0
>         mov     r4,r0
> .L7:
>         rts
>         nop
>
> Writing it as:
>
> unsigned int check (unsigned int x)
> {
>   return x + (x == 0);
> }
>
> results in:
>         tst     r4,r4
>         mov     #0,r0
>         rts
>         addc    r4,r0
>
> It seems that ifcvt is trying to utilize the 'add<mode>cc' standard name
> pattern.  If the 2nd operand of the conditional addition is a constant 1
> or -1, the addcc insn can be implemented via addc or subc without a branch.

The attached patch adds support for the addsicc pattern and a few other
improvements.  However, the first case above doesn't see any improvement.
It seems to be a missed ifcvt optimization.
Author: olegendo
Date: Mon Sep 28 14:00:44 2015
New Revision: 228202

URL: https://gcc.gnu.org/viewcvs?rev=228202&root=gcc&view=rev
Log:
gcc/
	PR target/54236
	* config/sh/predicates.md (t_reg_operand, negt_reg_operand): Allow
	and handle ne and eq codes.
	* config/sh/sh.c (sh_rtx_costs): Adjust matching of tst #imm,r0
	insn.
	(sh_recog_treg_set_expr): Early accept negt_reg_operand.  Early
	reject CONST_INT_P.  Use reverse_condition.
	(sh_split_treg_set_expr): Likewise.

gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-1.c (test_09, test_10, test_11): New.
	* gcc.target/sh/pr59533-1.c (test_23, test_24, test_25, test_26,
	test_27): New.
	* gcc.target/sh/pr54236-5.c: New.
	* gcc.target/sh/pr54236-6.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54236-5.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
Author: olegendo
Date: Thu Oct 1 12:36:15 2015
New Revision: 228332

URL: https://gcc.gnu.org/viewcvs?rev=228332&root=gcc&view=rev
Log:
gcc/testsuite/
	PR target/54236
	* gcc.target/sh/pr54236-6.c: Fix assembler-not string.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr54236-6.c
Can the bug be marked as resolved?
Thanks for the reminder. The issue hasn't been fully resolved. Please leave this PR open. This will be helpful when getting back to the issue.