This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PING] [PATCH, ARM] correctly encode the CC reg data flow
- From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- To: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>, Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, Wilco Dijkstra <wilco dot dijkstra at arm dot com>
- Date: Thu, 20 Apr 2017 18:12:07 +0000
- Subject: [PING] [PATCH, ARM] correctly encode the CC reg data flow
- References: <AM4PR0701MB2162CC520827145E96A75FE2E49E0@AM4PR0701MB2162.eurprd07.prod.outlook.com> <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com> <HE1PR0701MB21691E50EE349B95B4ADC4DBE4780@HE1PR0701MB2169.eurprd07.prod.outlook.com> <AM4PR0701MB216294C83C28945778C84345E4780@AM4PR0701MB2162.eurprd07.prod.outlook.com> <AM4PR0701MB2162B85CF846EC1E42D44F43E47F0@AM4PR0701MB2162.eurprd07.prod.outlook.com>
Ping...
for this patch:
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
On 01/18/17 16:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>> one.
>>>>>
>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>>>> was discovered, see [1] for more details.
>>>>>
>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>> up into this:
>>>>>
>>>>> [(set (reg:CC CC_REGNUM)
>>>>>       (compare:CC (match_dup 0) (match_dup 1)))
>>>>>  (parallel [(set (reg:CC CC_REGNUM)
>>>>>                  (compare:CC (match_dup 3) (match_dup 4)))
>>>>>             (set (match_dup 2)
>>>>>                  (minus:SI (match_dup 5)
>>>>>                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>
>>>>> [(set (reg:CC CC_REGNUM)
>>>>>       (compare:CC (match_dup 2) (match_dup 3)))
>>>>>  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>             (set (reg:CC CC_REGNUM)
>>>>>                  (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>
>>>>> The problem is that the reg:CC set by *subsi3_carryin_compare
>>>>> does not mention that it also depends on the reg:CC value
>>>>> from before. Therefore the *arm_cmpsi_insn appears to be
>>>>> redundant and thus got removed, because the data values are identical.
>>>>>
>>>>> I think that applies to a number of similar patterns where data
>>>>> flow is happening through the CC reg.
>>>>>
>>>>> So this is a kind of correctness issue, and should be fixed
>>>>> independently from the optimization issue PR77308.
>>>>>
>>>>> Therefore I think the patterns need to specify the true
>>>>> value that will be in the CC reg, in order for cse to
>>>>> know what the instructions are really doing.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>>
>>>> I agree you've found a valid problem here, but I have some issues with
>>>> the patch itself.
>>>>
>>>>
>>>> (define_insn_and_split "subdi3_compare1"
>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>         (compare:CC_NCV
>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>   "TARGET_32BIT"
>>>>   "#"
>>>>   "&& reload_completed"
>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>                    (compare:CC_C
>>>>                      (zero_extend:DI (match_dup 4))
>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>               (set (match_dup 3)
>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>
>>>>
>>>> This pattern is now no longer self-consistent, in that before the split
>>>> the overall result for the condition register is in mode CC_NCV, but
>>>> afterwards it is just CC_C.
>>>>
>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>> reflect the result of the 64-bit comparison), but that then implies
>>>> that the cc mode of subsi3_carryin_compare is incorrect as well and
>>>> should in fact also be CC_NCV. Thinking about this pattern, I'm
>>>> inclined to agree that CC_NCV is the correct mode for this operation.
>>>>
>>>> I'm not sure if there are other consequences that will fall out from
>>>> fixing this (it's possible that we might need a change to
>>>> select_cc_mode as well).
>>>>
>>>
>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bits will be correct for subdi3_compare1 as a whole,
>>> since it is a 64-bit comparison, but the expression
>>> (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI ...))
>>> only gets the C bit correct; the expression for N and V is a
>>> different one.
>>>
>>> It probably works, because the subsi3_carryin_compare instruction sets
>>> more CC bits than the pattern explicitly specifies.
>>> We know the subsi3_carryin_compare also computes the N and V bits, but
>>> it is hard to write down the correct rtl expression for them.
>>>
>>> In theory the pattern should describe everything correctly,
>>> maybe, like:
>>>
>>> (set (reg:CC_C CC_REGNUM)
>>>      (compare:CC_C
>>>        (zero_extend:DI (match_dup 4))
>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>> (set (reg:CC_NV CC_REGNUM)
>>>      (compare:CC_NV
>>>        (match_dup 4)
>>>        (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>> (set (match_dup 3)
>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>
>>>
>>> But I doubt that it will work to set CC_REGNUM with two different
>>> modes in parallel?
>>>
>>> Another idea would be to invent a CC_CNV_NOOV mode that implicitly
>>> defines C from the DImode result and N and V from the SImode result,
>>> similar to CC_NOOVmode, which also leaves open which bits it
>>> really defines?
>>>
>>>
>>> What do you think?
>>>
>>>
>>> Thanks
>>> Bernd.
>>
>> I think maybe the right solution is to invent a new CCmode
>> that defines C as if the comparison is done in DImode
>> but N and V as if the comparison is done in SImode.
>>
>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>> only N and Z are set correctly), but in a different patch of course.
>>
>> Attached is a new version that implements the new CCmode.
>>
>> How do you like this new version?
>>
>> It seems to be able to build a cross-compiler at least.
>>
>> I will start a new bootstrap with this new patch, but that can take some
>> time until I have definitive results.
>>
>> Is there still a chance that it can go into gcc-7 or should it wait
>> for the next stage1?
>>
>> Thanks
>> Bernd.
>
>
> I thought I should also look at where the subdi3_compare1 and the
> negdi2_compare patterns are used, and check whether the callers are
> fine with not having all CC bits available.
>
> And indeed usubv<mode>4 turns out to be questionable, because it
> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
> CCmode), which is inconsistent when subdi3_compare1 no longer uses
> CCmode.
>
> To correct this, the branch should use CC_Cmode which is always defined.
>
> So I tried to test this pattern, with the following test programs,
> and found that the code actually improves when the branch uses CC_Cmode
> instead of CCmode, both for SImode and DImode data, which was a bit
> surprising.
>
> I used this test program to see how the usubv<mode>4 pattern works:
>
> cat test.c (DImode)
> unsigned long long x, y, z;
> int b;
> void test()
> {
>   b = __builtin_sub_overflow (y, z, &x);
> }
>
>
> The unpatched code used 8 bytes more stack than the patched code,
> because the DImode subtraction is effectively done twice.
>
> cat test1.c (SImode)
> unsigned long x, y, z;
> int b;
> void test()
> {
>   b = __builtin_sub_overflow (y, z, &x);
> }
>
> which generates (unpatched):
>         cmp     r3, r0
>         sub     ip, r3, r0
>
> instead of the expected (patched):
>         subs    r3, r3, r2
>
>
> The condition is extracted by if-conversion and/or combine,
> which complicates the resulting code instead of simplifying it.
>
> I think this happens only when the branch and the subsi/di3_compare1
> pattern use the same CC mode.
>
> That does not happen when the CC modes disagree, as with the
> proposed patch. All other uses of the pattern are already using
> CC_Cmode or CC_Vmode in the branch, and these do not change.
>
> Attached is an updated version of the patch, which happens to
> improve the code generation of the usubvsi4 and usubvdi4 patterns
> as a side effect.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
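
The flag discussion in the quoted thread can be checked with a small
host-side model. The sketch below is not GCC code and not part of the
patch: the names struct flags, sub_with_carry, sub64_via_pair and
check_pair are hypothetical, and the flag rules assume the usual ARM
definition of SUBS/SBCS as an addition of the inverted operand plus
carry. It splits a 64-bit subtraction into a low SUBS and a high SBCS
and compares the final flags against the GCC/Clang builtin
__builtin_sub_overflow used in the test programs above. Under these
assumptions C, N and V after the SBCS agree with the 64-bit operation,
while Z only reflects the high word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM condition flags; not taken from GCC. */
struct flags { bool n, z, c, v; };

/* Models SUBS (carry_in = true) and SBCS: computes a - b - !carry_in
   as a + ~b + carry_in and derives N, Z, C, V from that 32-bit add. */
static uint32_t sub_with_carry(uint32_t a, uint32_t b, bool carry_in,
                               struct flags *f)
{
    uint64_t wide = (uint64_t)a + (uint64_t)(uint32_t)~b
                    + (carry_in ? 1u : 0u);
    uint32_t res = (uint32_t)wide;

    f->n = (res >> 31) != 0;
    f->z = res == 0;
    f->c = (wide >> 32) != 0;                   /* C set means "no borrow" */
    f->v = (((a ^ b) & (a ^ res)) >> 31) != 0;  /* signed overflow */
    return res;
}

/* 64-bit subtraction as a SUBS on the low words followed by an SBCS on
   the high words; *f receives the flags left by the SBCS. */
static uint64_t sub64_via_pair(uint64_t a, uint64_t b, struct flags *f)
{
    struct flags lo_flags;
    uint32_t lo = sub_with_carry((uint32_t)a, (uint32_t)b, true, &lo_flags);
    uint32_t hi = sub_with_carry((uint32_t)(a >> 32), (uint32_t)(b >> 32),
                                 lo_flags.c, f);
    return ((uint64_t)hi << 32) | lo;
}

/* Asserts that the model agrees with the builtins on the result and on
   the C, N and V flags. Z is deliberately not checked. */
static void check_pair(uint64_t a, uint64_t b)
{
    struct flags f;
    uint64_t model = sub64_via_pair(a, b, &f);

    unsigned long long ures;
    bool borrow = __builtin_sub_overflow((unsigned long long)a,
                                         (unsigned long long)b, &ures);
    long long sres;
    bool signed_ovf = __builtin_sub_overflow((long long)a, (long long)b,
                                             &sres);

    assert(model == ures);
    assert(f.c == !borrow);              /* LTU in CC_C terms is !C */
    assert(f.n == ((model >> 63) != 0));
    assert(f.v == signed_ovf);
}
```

Calling check_pair (0, 1) or check_pair (0x8000000000000000ull, 1)
exercises the unsigned borrow and the signed overflow case. For
sub64_via_pair (1ull << 32, 1, &f) the high-word result is zero, so the
modeled Z flag is set although the 64-bit difference is nonzero, which
illustrates why a branch on this flag setting cannot assume plain
CCmode.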