This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH, i386]: Fix PR67317, silly code generation for _addcarry/_subborrow intrinsics


Hello!

Attached patch improves code generation for _addcarry_* and
_subborrow_* intrinsics considerably. For the following testcase:

--cut here--
typedef unsigned long long u64;

u64 testcarry_u64 (u64 a, u64 b, u64 c, u64 d)
{
 u64 result0, result1;

 __builtin_ia32_addcarryx_u64
   (__builtin_ia32_addcarryx_u64 (0, a, c, &result0), b, d, &result1);

 return result0 ^ result1;
}
--cut here--

unpatched compiler generates:

       xorl    %r8d, %r8d
       addb    $-1, %r8b
       adcq    %rdi, %rdx
       setc    %r8b
       movq    %rdx, %rax
       addb    $-1, %r8b
       adcq    %rsi, %rcx
       xorq    %rcx, %rax
       ret

and patched compiler improves the compiled code to:

       addq    %rdi, %rdx
       adcq    %rsi, %rcx
       movq    %rdx, %rax
       xorq    %rcx, %rax
       ret

To achieve this improvement, patterns involving UNSPECs were removed
and equivalent add<mode>3_cconly_overflow patterns are used instead.
Also, to enable desired combine simplifications, several patterns were
rewritten to their canonical representation, similar to:

(define_insn "addcarry<mode>"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC
 (plus:SWI48
   (plus:SWI48
     (match_operator:SWI48 4 "ix86_carry_flag_operator"
      [(match_operand 3 "flags_reg_operand") (const_int 0)])
     (match_operand:SWI48 1 "nonimmediate_operand" "%0"))
   (match_operand:SWI48 2 "nonimmediate_operand" "rm"))
 (match_dup 1)))
   (set (match_operand:SWI48 0 "register_operand" "=r")
(plus:SWI48 (plus:SWI48 (match_op_dup 4
[(match_dup 3) (const_int 0)])
(match_dup 1))
   (match_dup 2)))]
  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "use_carry" "1")
   (set_attr "pent_pair" "pu")
   (set_attr "mode" "<MODE>")])

and

(define_insn "subborrow<mode>"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC
 (match_operand:SWI48 1 "nonimmediate_operand" "0")
 (plus:SWI48
   (match_operator:SWI48 4 "ix86_carry_flag_operator"
    [(match_operand 3 "flags_reg_operand") (const_int 0)])
   (match_operand:SWI48 2 "nonimmediate_operand" "rm"))))
   (set (match_operand:SWI48 0 "register_operand" "=r")
(minus:SWI48 (minus:SWI48 (match_dup 1)
 (match_op_dup 4
  [(match_dup 3) (const_int 0)]))
    (match_dup 2)))]
  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "use_carry" "1")
   (set_attr "pent_pair" "pu")
   (set_attr "mode" "<MODE>")])

Also, the patch removes generation of adcx instructions. I'd argue
that there is no point to generate larger and probably slower adcx
instead of equivalent shorter and well optimized adc[lq] instructions.
(If this is not the case, the follow-up patch to generate correct insn
mnemonic depending on TARGET_ADX would be trivial).

Please also note that unpatched expander breaks carry flag chains. The
rewritten expander takes care not to expand arguments (expansion can
result in carry flag clobbering add insns) in the middle of carry flag
chain. Due to this problem, the patch will also be backported to gcc-5
branch.

2015-08-27  Uros Bizjak  <ubizjak@gmail.com>

    PR target/67317
    * config/i386/i386.md (*add<mode>3_cc): Remove insn pattern.
    (addqi3_cc): Ditto.
    (UNSPEC_ADD_CARRY): Remove.
    (addqi3_cconly_overflow): New expander.
    (*add<dwi>3_doubleword): Split to add<mode>3_cconly_overflow.
    Adjust for changed add<mode>3_carry.
    (*neg<dwi>2_doubleword): Adjust for changed add<mode>3_carry.
    (*sub<dwi>3_doubleword): Adjust for changed sub<mode>3_carry.
    (<plusminus_insn><mode>3_carry): Remove expander.
    (*<plusminus_insn><mode>3_carry): Split insn pattern to
    add<mode>3_carry and sub<mode>3_carry.
    (plusminus_carry_mnemonic): Remove code attribute.
    (add<mode>3_carry): Canonicalize insn pattern.
    (*addsi3_carry_zext): Ditto.
    (sub<mode>3_carry): Ditto.
    (*subsi3_carry_zext): Ditto.
    (adcx<mode>3): Remove insn pattern.
    (addcarry<mode>): New insn pattern.
    (subborrow<mode>): Ditto.
    * config/i386/i386.c (ix86_expand_strlensi_unroll_1): Use
    gen_addqi3_cconly_overflow instead of gen_addqi3_cc.
    (ix86_expand_builtin) <case IX86_BUILTIN_SBB32,
    case IX86_BUILTIN_SBB64, case IX86_BUILTIN_ADDCARRY32,
    case IX86_BUILTIN_ADDCARRY64>: Use CODE_FOR_subborrowsi,
    CODE_FOR_subborrowdi, CODE_FOR_addcarrysi and CODE_FOR_addcarrydi.
    Rewrite expander to not clobber carry flag chains.

testsuite/ChangeLog:

2015-08-27  Uros Bizjak  <ubizjak@gmail.com>

    PR target/67317
    * gcc.target/i386/pr67317-1.c: New test.
    * gcc.target/i386/pr67317-2.c: Ditto.
    * gcc.target/i386/pr67317-3.c: Ditto.
    * gcc.target/i386/pr67317-4.c: Ditto.
    * gcc.target/i386/adx-addcarryx32-1.c: Also scan for adcl.
    * gcc.target/i386/adx-addcarryx32-2.c: Also scan for adcq.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Patch was committed to mainline, and will be backported to gcc-5
branch after a week without problems in mainline.

Uros.

Attachment: p.diff.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]