This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
| Other format: | [Raw text] | |
Hello!
Attached patch improves code generation for _addcarry_* and
_subborrow_* intrinsics considerably. For the following testcase:
--cut here--
typedef unsigned long long u64;
u64 testcarry_u64 (u64 a, u64 b, u64 c, u64 d)
{
u64 result0, result1;
__builtin_ia32_addcarryx_u64
(__builtin_ia32_addcarryx_u64 (0, a, c, &result0), b, d, &result1);
return result0 ^ result1;
}
--cut here--
unpatched compiler generates:
xorl %r8d, %r8d
addb $-1, %r8b
adcq %rdi, %rdx
setc %r8b
movq %rdx, %rax
addb $-1, %r8b
adcq %rsi, %rcx
xorq %rcx, %rax
ret
and patched compiler improves the compiled code to:
addq %rdi, %rdx
adcq %rsi, %rcx
movq %rdx, %rax
xorq %rcx, %rax
ret
To achieve this improvement, patterns involving UNSPECs were removed
and equivalent add<mode>3_cconly_overflow patterns are used instead.
Also, to enable desired combine simplifications, several patterns were
rewritten to their canonical representation, similar to:
(define_insn "addcarry<mode>"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(plus:SWI48
(plus:SWI48
(match_operator:SWI48 4 "ix86_carry_flag_operator"
[(match_operand 3 "flags_reg_operand") (const_int 0)])
(match_operand:SWI48 1 "nonimmediate_operand" "%0"))
(match_operand:SWI48 2 "nonimmediate_operand" "rm"))
(match_dup 1)))
(set (match_operand:SWI48 0 "register_operand" "=r")
(plus:SWI48 (plus:SWI48 (match_op_dup 4
[(match_dup 3) (const_int 0)])
(match_dup 1))
(match_dup 2)))]
"ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
"adc{<imodesuffix>}\t{%2, %0|%0, %2}"
[(set_attr "type" "alu")
(set_attr "use_carry" "1")
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
and
(define_insn "subborrow<mode>"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(match_operand:SWI48 1 "nonimmediate_operand" "0")
(plus:SWI48
(match_operator:SWI48 4 "ix86_carry_flag_operator"
[(match_operand 3 "flags_reg_operand") (const_int 0)])
(match_operand:SWI48 2 "nonimmediate_operand" "rm"))))
(set (match_operand:SWI48 0 "register_operand" "=r")
(minus:SWI48 (minus:SWI48 (match_dup 1)
(match_op_dup 4
[(match_dup 3) (const_int 0)]))
(match_dup 2)))]
"ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
"sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
[(set_attr "type" "alu")
(set_attr "use_carry" "1")
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
Also, the patch removes generation of adcx instructions. I'd argue
that there is no point to generate larger and probably slower adcx
instead of equivalent shorter and well optimized adc[lq] instructions.
(If this is not the case, the follow-up patch to generate correct insn
mnemonic depending on TARGET_ADX would be trivial).
Please also note that unpatched expander breaks carry flag chains. The
rewritten expander takes care not to expand arguments (expansion can
result in carry flag clobbering add insns) in the middle of carry flag
chain. Due to this problem, the patch will also be backported to gcc-5
branch.
2015-08-27 Uros Bizjak <ubizjak@gmail.com>
PR target/67317
* config/i386/i386.md (*add<mode>3_cc): Remove insn pattern.
(addqi3_cc): Ditto.
(UNSPEC_ADD_CARRY): Remove.
(addqi3_cconly_overflow): New expander.
(*add<dwi>3_doubleword): Split to add<mode>3_cconly_overflow.
Adjust for changed add<mode>3_carry.
(*neg<dwi>2_doubleword): Adjust for changed add<mode>3_carry.
(*sub<dwi>3_doubleword): Adjust for changed sub<mode>3_carry.
(<plusminus_insn><mode>3_carry): Remove expander.
(*<plusminus_insn><mode>3_carry): Split insn pattern to
add<mode>3_carry and sub<mode>3_carry.
(plusminus_carry_mnemonic): Remove code attribute.
(add<mode>3_carry): Canonicalize insn pattern.
(*addsi3_carry_zext): Ditto.
(sub<mode>3_carry): Ditto.
(*subsi3_carry_zext): Ditto.
(adcx<mode>3): Remove insn pattern.
(addcarry<mode>): New insn pattern.
(subborrow<mode>): Ditto.
* config/i386/i386.c (ix86_expand_strlensi_unroll_1): Use
gen_addqi3_cconly_overflow instead of gen_addqi3_cc.
(ix86_expand_builtin) <case IX86_BUILTIN_SBB32,
case IX86_BUILTIN_SBB64, case IX86_BUILTIN_ADDCARRY32,
case IX86_BUILTIN_ADDCARRY64>: Use CODE_FOR_subborrowsi,
CODE_FOR_subborrowdi, CODE_FOR_addcarrysi and CODE_FOR_addcarrydi.
Rewrite expander to not clobber carry flag chains.
testsuite/ChangeLog:
2015-08-27 Uros Bizjak <ubizjak@gmail.com>
PR target/67317
* gcc.target/i386/pr67317-1.c: New test.
* gcc.target/i386/pr67317-2.c: Ditto.
* gcc.target/i386/pr67317-3.c: Ditto.
* gcc.target/i386/pr67317-4.c: Ditto.
* gcc.target/i386/adx-addcarryx32-1.c: Also scan for adcl.
* gcc.target/i386/adx-addcarryx32-2.c: Also scan for adcq.
Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Patch was committed to mainline, and will be backported to gcc-5
branch after a week without problems in mainline.
Uros.
Attachment:
p.diff.txt
Description: Text document
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |