This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, ARM] correctly encode the CC reg data flow


On 01/13/17 19:28, Bernd Edlinger wrote:
> On 01/13/17 17:10, Bernd Edlinger wrote:
>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this is related to PR77308, the follow-up patch will depend on this
>>>> one.
>>>>
>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>>> was discovered, see [1] for more details.
>>>>
>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>> up into this:
>>>>
>>>>    [(set (reg:CC CC_REGNUM)
>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>                (set (match_dup 2)
>>>>                     (minus:SI (match_dup 5)
>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>> 0))))])]
>>>>
>>>>    [(set (reg:CC CC_REGNUM)
>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>                (set (reg:CC CC_REGNUM)
>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>
>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>> redundant and thus got removed, because the data values are identical.
>>>>
>>>> I think that applies to a number of similar pattern where data
>>>> flow is happening through the CC reg.
>>>>
>>>> So this is a kind of correctness issue, and should be fixed
>>>> independently from the optimization issue PR77308.
>>>>
>>>> Therefore I think the patterns need to specify the true
>>>> value that will be in the CC reg, in order for cse to
>>>> know what the instructions are really doing.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>> Is it OK for trunk?
>>>>
>>>
>>> I agree you've found a valid problem here, but I have some issues with
>>> the patch itself.
>>>
>>>
>>> (define_insn_and_split "subdi3_compare1"
>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>     (compare:CC_NCV
>>>       (match_operand:DI 1 "register_operand" "r")
>>>       (match_operand:DI 2 "register_operand" "r")))
>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>   "TARGET_32BIT"
>>>   "#"
>>>   "&& reload_completed"
>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>            (compare:CC_C
>>>              (zero_extend:DI (match_dup 4))
>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>           (set (match_dup 3)
>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>
>>>
>>> This pattern is now no-longer self consistent in that before the split
>>> the overall result for the condition register is in mode CC_NCV, but
>>> afterwards it is just CC_C.
>>>
>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>> reflect the result of the 64-bit comparison), but that then implies that
>>> the cc mode of subsi3_carryin_compare is incorrect as well and should in
>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
>>> that CC_NCV is the correct mode for this operation
>>>
>>> I'm not sure if there are other consequences that will fall out from
>>> fixing this (it's possible that we might need a change to select_cc_mode
>>> as well).
>>>
>>
>> Yes, this is still a bit awkward...
>>
>> The N and V bit will be the correct result for the subdi3_compare1
>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>> only gets the C bit correct, the expression for N and V is a different
>> one.
>>
>> It probably works, because the subsi3_carryin_compare instruction sets
>> more CC bits than the pattern does explicitly specify the value.
>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>> hard to write down the correct rtl expression for it.
>>
>> In theory the pattern should describe everything correctly,
>> maybe, like:
>>
>> set (reg:CC_C CC_REGNUM)
>>     (compare:CC_C
>>       (zero_extend:DI (match_dup 4))
>>       (plus:DI (zero_extend:DI (match_dup 5))
>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>> set (reg:CC_NV CC_REGNUM)
>>     (compare:CC_NV
>>      (match_dup 4))
>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>> set (match_dup 3)
>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>
>>
>> But I doubt that will work to set CC_REGNUM with two different modes
>> in parallel?
>>
>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>> defines C from the DImode result, and NV from the SImode result,
>> similar to the CC_NOOVmode, that also leaves something open what
>> bits it really defines?
>>
>>
>> What do you think?
>>
>>
>> Thanks
>> Bernd.
>
> I think maybe the right solution is to invent a new CCmode
> that defines C as if the comparison is done in DImode
> but N and V as if the comparison is done in SImode.
>
> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
> only N and Z are set correctly), but in a different patch of course.
>
> Attached is a new version that implements the new CCmode.
>
> How do you like this new version?
>
> It seems to be able to build a cross-compiler at least.
>
> I will start a new bootstrap with this new patch, but that can take some
> time until I have definitive results.
>
> Is there still a chance that it can go into gcc-7 or should it wait
> for the next stage1?
>
> Thanks
> Bernd.


I thought I should also look at where the subdi_compare1 amd the
negdi2_compare patterns are used, and look if the caller is fine with
not having all CC bits available.

And indeed usubv<mode>4 turns out to be questionabe, because it
emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
CCmode) which is inconsistent when subdi3_compare1 no longer uses
CCmode.

To correct this, the branch should use CC_Cmode which is always defined.

So I tried to test this pattern, with the following test programs,
and found that the code actually improves when the branch uses CC_Cmode
instead of CCmode, both for SImode and DImode data, which was a bit
surprising.

I used this test program to see how the usubv<mode>4 pattern works:

cat test.c (DImode)
unsigned long long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}


unpatched code used 8 byte more stack than patched,
because the DImode subtraction is effectively done twice.

cat test1.c (SImode)
unsigned long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}

which generates (unpatched):
         cmp     r3, r0
         sub     ip, r3, r0

instead of expected (patched):
	subs	r3, r3, r2


The condition is extracted by ifconversion and/or combine
and complicates the resulting code instead of simplifying.

I think this happens only when the branch and the subsi/di3_compare1
is using the same CC mode.

That does not happen when the CC modes disagree, as with the
proposed patch.  All other uses of the pattern are already using
CC_Cmode or CC_Vmode in the branch, and these do not change.

Attached is an updated version of the patch, that happens to
improve the code generation of the usubsi4 and usubdi4 pattern,
as a side effect.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.
2016-01-13  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* config/arm/arm-modes.def (CC_NCV_CIC): New mode.
	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
	subsi3_carryin_compare, subsi3_carryin_compare_const,
	negdi2_compare, *negsi2_carryin_compare,
	*arm_cmpdi_insn): Fix the CC reg dataflow.
	(usubv<mode>4): Use CC_Cmode for the branch.

Index: gcc/config/arm/arm-modes.def
===================================================================
--- gcc/config/arm/arm-modes.def	(revision 244439)
+++ gcc/config/arm/arm-modes.def	(working copy)
@@ -38,6 +38,8 @@
    (used for DImode unsigned comparisons).
    CC_NCVmode should be used if only the N, C, and V flags are correct
    (used for DImode signed comparisons).
+   CC_NCV_CICmode defines N and V in SImode and C in DImode
+   (used for carryin_compare patterns).
    CCmode should be used otherwise.  */
 
 CC_MODE (CC_NOOV);
@@ -44,6 +46,7 @@
 CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
+CC_MODE (CC_NCV_CIC);
 CC_MODE (CC_SWP);
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 244439)
+++ gcc/config/arm/arm.md	(working copy)
@@ -669,17 +669,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_V CC_REGNUM)
 		   (ne:CC_V
-		    (plus:DI (plus:DI
-			      (sign_extend:DI (match_dup 4))
-			      (sign_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (sign_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI (plus:SI
-					  (match_dup 4) (match_dup 5))
-					 (ltu:SI (reg:CC_C CC_REGNUM)
-						 (const_int 0))))])]
+		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
+				       (sign_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (sign_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -713,13 +711,13 @@
   [(set (reg:CC_V CC_REGNUM)
 	(ne:CC_V
 	  (plus:DI
-	   (plus:DI
-	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (sign_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+				   (ltu:SI (reg:CC_C CC_REGNUM)
+					   (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
 	(plus:SI
 	 (plus:SI (match_dup 1) (match_dup 2))
@@ -748,17 +746,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_C CC_REGNUM)
 		   (ne:CC_C
-		    (plus:DI (plus:DI
-			      (zero_extend:DI (match_dup 4))
-			      (zero_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (zero_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI
-				 (plus:SI (match_dup 4) (match_dup 5))
-				 (ltu:SI (reg:CC_C CC_REGNUM)
-					 (const_int 0))))])]
+		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
+				       (zero_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (zero_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -777,17 +773,16 @@
   [(set (reg:CC_C CC_REGNUM)
 	(ne:CC_C
 	  (plus:DI
-	   (plus:DI
-	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (zero_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (zero_extend:DI
+	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
-	(plus:SI
-	 (plus:SI (match_dup 1) (match_dup 2))
-	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "adcs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1080,14 +1075,14 @@
   "TARGET_32BIT"
 {
   emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  arm_gen_unlikely_cbranch (EQ, CC_Cmode, operands[3]);
 
   DONE;
 })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1098,10 +1093,14 @@
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (match_dup 1) (match_dup 2)))
 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 4) (match_dup 5)))
-	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 4))
+		     (plus:DI (zero_extend:DI (match_dup 5))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 3)
+		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1157,13 +1156,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1171,13 +1172,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
   [(set_attr "conds" "set")
@@ -4634,8 +4637,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (const_int 0)
 	  (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4647,8 +4650,12 @@
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0)
 					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
 	     (set (match_dup 2)
 		  (minus:SI
 		   (minus:SI (const_int 0) (match_dup 3))
@@ -4707,12 +4714,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (const_int 0)
-		    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (const_int 0)
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-	(minus:SI (minus:SI (const_int 0)
-			    (match_dup 1))
+	(minus:SI (minus:SI (const_int 0) (match_dup 1))
 		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7361,12 +7370,15 @@
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+	(compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 3))
+		     (plus:DI (zero_extend:DI (match_dup 4))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 2)
+		   (minus:SI (match_dup 5)
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]