Bug 67317 - [x86] Silly code generation for _addcarry_u32/_addcarry_u64
Summary: [x86] Silly code generation for _addcarry_u32/_addcarry_u64
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.2.0
: P3 minor
Target Milestone: 5.3
Assignee: Uroš Bizjak
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2015-08-22 04:59 UTC by Melissa
Modified: 2018-03-24 19:25 UTC (History)
5 users (show)

See Also:
Host:
Target: x86
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-08-25 00:00:00


Attachments
Proposed patch (2.91 KB, patch)
2015-08-27 08:53 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Melissa 2015-08-22 04:59:21 UTC
x86 intrinsics _addcarry_u32 and _addcarry_u64 generate silly code.  For example, the following function to get the result of a 64-bit addition (the XOR is to the output clearer):

	u64 testcarry(u64 a, u64 b, u64 c, u64 d)
	{
		u64 result0, result1;
		_addcarry_u64(_addcarry_u64(0, a, c, &result0), b, d, &result1);
		return result0 ^ result1;
	}

This is the code generated with -O1, -O2 and -O3:

	xor	r8d, r8d
	add	r8b, -1
	adc	rdx, rdi
	setc	r8b
	mov	rax, rdx
	add	r8b, -1
	adc	rcx, rsi
	xor	rax, rcx
	ret

The first sillyness is that _addcarry_u64 does not optimize a compile-time constant 0 being the first carry parameter.  Instead of "adc", it should just use "add".

The second sillyness is with the use of r8b to store the carry flag, then using "add r8b, -1" to put the result back into carry.

Instead, the code should be something like this:

	add	rdx, rdi
	mov	rax, rdx
	adc	rcx, rsi
	xor	rax, rcx
	ret

Naturally, for something this simple, I'd use unsigned __int128, but this came up in large number math.
Comment 1 Andrew Pinski 2015-08-22 06:14:14 UTC
Can you provide the full preprocessed source?
Comment 2 Marc Glisse 2015-08-25 11:10:31 UTC
Self-contained:

typedef unsigned long long u64;
u64 testcarry(u64 a, u64 b, u64 c, u64 d)
{
  u64 result0, result1;
  __builtin_ia32_addcarryx_u64(__builtin_ia32_addcarryx_u64(0, a, c, &result0), b, d, &result1);
  return result0 ^ result1;
}
Comment 3 Segher Boessenkool 2015-08-25 12:58:50 UTC
These things would normally be taken care of by combine, but
combine of course does not know what the UNSPEC_ADD_CARRY's mean.
Does this need to be an unspec at all?
Comment 4 Uroš Bizjak 2015-08-25 16:48:06 UTC
(In reply to Segher Boessenkool from comment #3)

> Does this need to be an unspec at all?

Of course not. We are looking to replace unspecs with standard RTXes. Do you have any recommendation on how we can represent this carry-setting insn to satisfy combine?
Comment 5 Segher Boessenkool 2015-08-25 19:25:57 UTC
Combine can handle most RTL expressions, although it sometimes
simplifies (or "simplifies") more than you want.

I think in this case what is already done in *add<mode>3_cc_overflow
will work well, but I do not know x86 CC modes terribly well.  You'll
need to experiment a bit to find something that works well in all
interesting cases.
Comment 6 Uroš Bizjak 2015-08-27 08:53:06 UTC
Created attachment 36258 [details]
Proposed patch

This patch changes expanders of carry-handling builtins to use addqi3_cconly_overflow and canonicalizes carry hanling insn to what combine can process:

(define_insn "addcarry<mode>"
  [(set (reg:CCC FLAGS_REG)
	(compare:CCC
	  (plus:SWI48
	    (plus:SWI48
	      (match_operator:SWI48 4 "ix86_carry_flag_operator"
	       [(match_operand 3 "flags_reg_operand") (const_int 0)])
	      (match_operand:SWI48 1 "nonimmediate_operand" "%0"))
	    (match_operand:SWI48 2 "nonimmediate_operand" "rm"))
	  (match_dup 1)))
   (set (match_operand:SWI48 0 "register_operand" "=r")
	(plus:SWI48 (plus:SWI48 (match_op_dup 4
				 [(match_dup 3) (const_int 0)])
				(match_dup 1))
		    (match_dup 2)))]
  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "use_carry" "1")
   (set_attr "pent_pair" "pu")
   (set_attr "mode" "<MODE>")])

and

(define_insn "subborrow<mode>"
  [(set (reg:CCC FLAGS_REG)
	(compare:CCC
	  (match_operand:SWI48 1 "nonimmediate_operand" "0")
	  (plus:SWI48
	    (match_operator:SWI48 4 "ix86_carry_flag_operator"
	     [(match_operand 3 "flags_reg_operand") (const_int 0)])
	    (match_operand:SWI48 2 "nonimmediate_operand" "rm"))))
   (set (match_operand:SWI48 0 "register_operand" "=r")
	(minus:SWI48 (minus:SWI48 (match_dup 1)
				  (match_op_dup 4
				   [(match_dup 3) (const_int 0)]))
		     (match_dup 2)))]
  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "use_carry" "1")
   (set_attr "pent_pair" "pu")
   (set_attr "mode" "<MODE>")])

The patch also rewrites expander to fix a bug, where carry-clobbering insns were emitted inside carry-flag def-use chain.

For the testcase, patched gcc generates:

testcarry_u64:
        addq    %rdi, %rdx
        adcq    %rsi, %rcx
        movq    %rdx, %rax
        xorq    %rcx, %rax
        ret
Comment 7 uros 2015-08-27 18:30:08 UTC
Author: uros
Date: Thu Aug 27 18:29:37 2015
New Revision: 227271

URL: https://gcc.gnu.org/viewcvs?rev=227271&root=gcc&view=rev
Log:
	PR target/67317
	* config/i386/i386.md (*add<mode>3_cc): Remove insn pattern.
	(addqi3_cc): Ditto.
	(UNSPEC_ADD_CARRY): Remove.
	(addqi3_cconly_overflow): New expander.
	(*add<dwi>3_doubleword): Split to add<mode>3_cconly_overflow.
	Adjust for changed add<mode>3_carry.
	(*neg<dwi>2_doubleword): Adjust for changed add<mode>3_carry.
	(*sub<dwi>3_doubleword): Adjust for changed sub<mode>3_carry.
	(<plusminus_insn><mode>3_carry): Remove expander.
	(*<plusminus_insn><mode>3_carry): Split insn pattern to
	add<mode>3_carry and sub<mode>3_carry.
	(plusminus_carry_mnemonic): Remove code attribute.
	(add<mode>3_carry): Canonicalize insn pattern.
	(*addsi3_carry_zext): Ditto.
	(sub<mode>3_carry): Ditto.
	(*subsi3_carry_zext): Ditto.
	(adcx<mode>3): Remove insn pattern.
	(addcarry<mode>): New insn pattern.
	(subborrow<mode>): Ditto.
	* config/i386/i386.c (ix86_expand_strlensi_unroll_1): Use
	gen_addqi3_cconly_overflow instead of gen_addqi3_cc.
	(ix86_expand_builtin) <case IX86_BUILTIN_SBB32,
	case IX86_BUILTIN_SBB64, case IX86_BUILTIN_ADDCARRY32,
	case IX86_BUILTIN_ADDCARRY64>: Use CODE_FOR_subborrowsi,
	CODE_FOR_subborrowdi, CODE_FOR_addcarrysi and CODE_FOR_addcarrydi.
	Rewrite expander to not clobber carry flag chains.

testsuite/ChangeLog:

	PR target/67317
	* gcc.target/i386/pr67317-1.c: New test.
	* gcc.target/i386/pr67317-2.c: Ditto.
	* gcc.target/i386/pr67317-3.c: Ditto.
	* gcc.target/i386/pr67317-4.c: Ditto.
	* gcc.target/i386/adx-addcarryx32-1.c: Also scan for adcl.
	* gcc.target/i386/adx-addcarryx32-2.c: Also scan for adcq.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr67317-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr67317-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr67317-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr67317-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/adx-addcarryx32-1.c
    trunk/gcc/testsuite/gcc.target/i386/adx-addcarryx64-1.c
Comment 8 uros 2015-09-02 15:07:28 UTC
Author: uros
Date: Wed Sep  2 15:06:56 2015
New Revision: 227405

URL: https://gcc.gnu.org/viewcvs?rev=227405&root=gcc&view=rev
Log:
	Backport from mainline:
	2015-08-27  Uros Bizjak  <ubizjak@gmail.com>

	PR target/67317
	* config/i386/i386.md (*add<mode>3_cc): Remove insn pattern.
	(addqi3_cc): Ditto.
	(UNSPEC_ADD_CARRY): Remove.
	(addqi3_cconly_overflow): New expander.
	(*add<dwi>3_doubleword): Split to add<mode>3_cconly_overflow.
	Adjust for changed add<mode>3_carry.
	(*neg<dwi>2_doubleword): Adjust for changed add<mode>3_carry.
	(*sub<dwi>3_doubleword): Adjust for changed sub<mode>3_carry.
	(<plusminus_insn><mode>3_carry): Remove expander.
	(*<plusminus_insn><mode>3_carry): Split insn pattern to
	add<mode>3_carry and sub<mode>3_carry.
	(plusminus_carry_mnemonic): Remove code attribute.
	(add<mode>3_carry): Canonicalize insn pattern.
	(*addsi3_carry_zext): Ditto.
	(sub<mode>3_carry): Ditto.
	(*subsi3_carry_zext): Ditto.
	(adcx<mode>3): Remove insn pattern.
	(addcarry<mode>): New insn pattern.
	(subborrow<mode>): Ditto.
	* config/i386/i386.c (ix86_expand_strlensi_unroll_1): Use
	gen_addqi3_cconly_overflow instead of gen_addqi3_cc.
	(ix86_expand_builtin) <case IX86_BUILTIN_SBB32,
	case IX86_BUILTIN_SBB64, case IX86_BUILTIN_ADDCARRY32,
	case IX86_BUILTIN_ADDCARRY64>: Use CODE_FOR_subborrowsi,
	CODE_FOR_subborrowdi, CODE_FOR_addcarrysi and CODE_FOR_addcarrydi.
	Rewrite expander to not clobber carry flag chains.

testsuite/ChangeLog:

	Backport from mainline:
	2015-08-27  Uros Bizjak  <ubizjak@gmail.com>

	PR target/67317
	* gcc.target/i386/pr67317-1.c: New test.
	* gcc.target/i386/pr67317-2.c: Ditto.
	* gcc.target/i386/pr67317-3.c: Ditto.
	* gcc.target/i386/pr67317-4.c: Ditto.
	* gcc.target/i386/adx-addcarryx32-1.c: Also scan for adcl.
	* gcc.target/i386/adx-addcarryx32-2.c: Also scan for adcq.


Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr67317-1.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr67317-2.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr67317-3.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/pr67317-4.c
Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/i386/i386.c
    branches/gcc-5-branch/gcc/config/i386/i386.md
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
    branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/adx-addcarryx32-1.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/i386/adx-addcarryx64-1.c
Comment 9 Uroš Bizjak 2015-09-02 15:08:35 UTC
Fixed for 5.3+.
Comment 10 Jeffrey Walton 2017-08-21 11:47:19 UTC
I'm not sure why Bugzilla did not return this result when I searched for the instructions earlier. Adding search terms that might help others find this:

    ADCX
    adcx
    ADOX
    adox