This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PING] [PATCH, ARM] correctly encode the CC reg data flow
- From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>, Wilco Dijkstra <wilco dot dijkstra at arm dot com>
- Date: Sun, 17 Sep 2017 08:38:13 +0000
- Subject: [PING] [PATCH, ARM] correctly encode the CC reg data flow
- Authentication-results: sourceware.org; auth=none
- Authentication-results: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=hotmail.de;
- References: <AM4PR0701MB2162CC520827145E96A75FE2E49E0@AM4PR0701MB2162.eurprd07.prod.outlook.com> <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com> <HE1PR0701MB21691E50EE349B95B4ADC4DBE4780@HE1PR0701MB2169.eurprd07.prod.outlook.com> <AM4PR0701MB216294C83C28945778C84345E4780@AM4PR0701MB2162.eurprd07.prod.outlook.com> <AM4PR0701MB2162B85CF846EC1E42D44F43E47F0@AM4PR0701MB2162.eurprd07.prod.outlook.com> <59AD5B44.2080509@foss.arm.com> <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de> <AM5PR0701MB26572C99CDF0198F38A7DB9BE4970@AM5PR0701MB2657.eurprd07.prod.outlook.com> <6f24b217-9131-6aef-0cdb-f9f26a538fe9@arm.com> <AM5PR0701MB26579A51C8FEC1232A3F607EE4970@AM5PR0701MB2657.eurprd07.prod.outlook.com> <59B014E3.2000900@foss.arm.com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:99
Hi,
ping... for the documentation part.
Thanks,
Bernd.
On 09/06/17 17:31, Kyrill Tkachov wrote:
>
> On 06/09/17 14:17, Bernd Edlinger wrote:
>> On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
>>> On 06/09/17 13:44, Bernd Edlinger wrote:
>>>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>>>> Hi Kyrill,
>>>>>
>>>>> Thanks for your review!
>>>>>
>>>>>
>>>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>>>> Hi Bernd,
>>>>>>
>>>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> this is related to PR77308, the follow-up patch will depend
>>>>>>>>>>> on this
>>>>>>>>>>> one.
>>>>>>>>>>>
>>>>>>>>>>> When trying the split the *arm_cmpdi_insn and
>>>>>>>>>>> *arm_cmpdi_unsigned
>>>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>>>> __gnu_satfractdasq
>>>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>>>
>>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>>> up into this:
>>>>>>>>>>>
>>>>>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>>>>> (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>> (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>> (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>> (set (match_dup 2)
>>>>>>>>>>> (minus:SI (match_dup 5)
>>>>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>>>> (const_int
>>>>>>>>>>> 0))))])]
>>>>>>>>>>>
>>>>>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>>>>> (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>> (set (reg:CC CC_REGNUM)
>>>>>>>>>>> (compare:CC (match_dup 0) (match_dup
>>>>>>>>>>> 1))))]
>>>>>>>>>>>
>>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the
>>>>>>>>>>> reg:CC
>>>>>>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>>>> identical.
>>>>>>>>>>>
>>>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>>>
>>>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>>>
>>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>
>>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>>> with
>>>>>>>>>> the patch itself.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>> [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>> (compare:CC_NCV
>>>>>>>>>> (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>> (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>> (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>> (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>> "TARGET_32BIT"
>>>>>>>>>> "#"
>>>>>>>>>> "&& reload_completed"
>>>>>>>>>> [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>> (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>> (set (match_dup 0) (minus:SI (match_dup 1)
>>>>>>>>>> (match_dup
>>>>>>>>>> 2)))])
>>>>>>>>>> (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>> (compare:CC_C
>>>>>>>>>> (zero_extend:DI (match_dup 4))
>>>>>>>>>> (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int
>>>>>>>>>> 0)))))
>>>>>>>>>> (set (match_dup 3)
>>>>>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>>>>>> 0))))])]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>>>> split
>>>>>>>>>> the overall result for the condition register is in mode
>>>>>>>>>> CC_NCV, but
>>>>>>>>>> afterwards it is just CC_C.
>>>>>>>>>>
>>>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>>>> implies that
>>>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>>>> should in
>>>>>>>>>> fact also be CC_NCV. Thinking about this pattern, I'm
>>>>>>>>>> inclined to
>>>>>>>>>> agree
>>>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>>>
>>>>>>>>>> I'm not sure if there are other consequences that will fall
>>>>>>>>>> out from
>>>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>>>> select_cc_mode
>>>>>>>>>> as well).
>>>>>>>>>>
>>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>>
>>>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI
>>>>>>>>> ...)
>>>>>>>>> only gets the C bit correct, the expression for N and V is a
>>>>>>>>> different
>>>>>>>>> one.
>>>>>>>>>
>>>>>>>>> It probably works, because the subsi3_carryin_compare
>>>>>>>>> instruction sets
>>>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>>>> it is
>>>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>>>
>>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>>> maybe, like:
>>>>>>>>>
>>>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>>> (compare:CC_C
>>>>>>>>> (zero_extend:DI (match_dup 4))
>>>>>>>>> (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>>> (compare:CC_NV
>>>>>>>>> (match_dup 4))
>>>>>>>>> (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>> (const_int 0)))
>>>>>>>>> set (match_dup 3)
>>>>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But I doubt that will work to set CC_REGNUM with two different
>>>>>>>>> modes
>>>>>>>>> in parallel?
>>>>>>>>>
>>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that
>>>>>>>>> implicitly
>>>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>>>> bits it really defines?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Bernd.
>>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>>
>>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC =
>>>>>>>> Carry-In-Compare),
>>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>>>> only N and Z are set correctly), but in a different patch of
>>>>>>>> course.
>>>>>>>>
>>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>>
>>>>>>>> How do you like this new version?
>>>>>>>>
>>>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>>>
>>>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>>>> some
>>>>>>>> time until I have definitive results.
>>>>>>>>
>>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>>> for the next stage1?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>>>> negdi2_compare patterns are used, and look if the caller is fine
>>>>>>> with
>>>>>>> not having all CC bits available.
>>>>>>>
>>>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch
>>>>>>> (LTU,
>>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>>>> CCmode.
>>>>>>>
>>>>>>> To correct this, the branch should use CC_Cmode which is always
>>>>>>> defined.
>>>>>>>
>>>>>>> So I tried to test this pattern, with the following test programs,
>>>>>>> and found that the code actually improves when the branch uses
>>>>>>> CC_Cmode
>>>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>>>> surprising.
>>>>>>>
>>>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>>>
>>>>>>> cat test.c (DImode)
>>>>>>> unsigned long long x, y, z;
>>>>>>> int b;
>>>>>>> void test()
>>>>>>> {
>>>>>>> b = __builtin_sub_overflow (y,z, &x);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> unpatched code used 8 byte more stack than patched,
>>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>>
>>>>>>> cat test1.c (SImode)
>>>>>>> unsigned long x, y, z;
>>>>>>> int b;
>>>>>>> void test()
>>>>>>> {
>>>>>>> b = __builtin_sub_overflow (y,z, &x);
>>>>>>> }
>>>>>>>
>>>>>>> which generates (unpatched):
>>>>>>> cmp r3, r0
>>>>>>> sub ip, r3, r0
>>>>>>>
>>>>>>> instead of expected (patched):
>>>>>>> subs r3, r3, r2
>>>>>>>
>>>>>>>
>>>>>>> The condition is extracted by ifconversion and/or combine
>>>>>>> and complicates the resulting code instead of simplifying.
>>>>>>>
>>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>>> is using the same CC mode.
>>>>>>>
>>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>>> proposed patch. All other uses of the pattern are already using
>>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>>
>>>>>>> Attached is an updated version of the patch, that happens to
>>>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>>>> as a side effect.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>> I'm very sorry it has taken so long to review.
>>>>>> I've been ramping up on the context recently now so I'll try to move
>>>>>> this along...
>>>>>>
>>>>>> This patch looks mostly ok to me from reading the patterns and the
>>>>>> discussion around it.
>>>>>> I have one concern:
>>>>>>
>>>>>>
>>>>>> (define_insn_and_split "negdi2_compare"
>>>>>> - [(set (reg:CC CC_REGNUM)
>>>>>> - (compare:CC
>>>>>> + [(set (reg:CC_NCV CC_REGNUM)
>>>>>> + (compare:CC_NCV
>>>>>> (const_int 0)
>>>>>> (match_operand:DI 1 "register_operand" "0,r")))
>>>>>> (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>>>> @@ -4647,8 +4650,12 @@
>>>>>> (compare:CC (const_int 0) (match_dup 1)))
>>>>>> (set (match_dup 0) (minus:SI (const_int 0)
>>>>>> (match_dup 1)))])
>>>>>> - (parallel [(set (reg:CC CC_REGNUM)
>>>>>> - (compare:CC (const_int 0) (match_dup 3)))
>>>>>> + (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>>>> + (compare:CC_NCV_CIC
>>>>>> + (const_int 0)
>>>>>> + (plus:DI
>>>>>> + (zero_extend:DI (match_dup 3))
>>>>>> + (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>> (set (match_dup 2)
>>>>>> (minus:SI
>>>>>> (minus:SI (const_int 0) (match_dup 3))
>>>>>>
>>>>>>
>>>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>>>> being a const_int 0 and the second being
>>>>>> a complex expression as the RTL canonicalization rules usually
>>>>>> require
>>>>>> the complex operand going first if possible.
>>>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>>>> "If one of the operands is a constant, it should be placed in the
>>>>>> second operand and the comparison code adjusted as appropriate."
>>>>>> So it seems that the pre-existing pattern that puts const_int 0 as
>>>>>> the
>>>>>> first operand already breaks that rule.
>>>>>> I think we should fix that and update the use of condition code to a
>>>>>> GEU rather than LTU as well.
>>>>>>
>>>> Well, the sentence before that one is even more explicit:
>>>>
>>>> "Normally, @var{x} and @var{y} must have the same mode. Otherwise,
>>>> @code{compare} is valid only if the mode of @var{x} is in class
>>>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>>>> @code{const_double} with mode @code{VOIDmode}."
>>>>
>>>> So because the const_int 0 has VOIDmode the comparison is done
>>>> in y-mode not x-mode.
>>>>
>>>> But unfortunately I see no way how to accomplish this,
>>>> because this assumes that the compare can be easily swapped
>>>> if the conditional instruction just uses one of GT/GE/LE/LT
>>>> or GTU/GEU/LEU/LTU. But that is only the case for plain CCmode.
>>>>
>>>> And in this example we ask for "overflow", but while 0-X can
>>>> overflow X-0 simply can't. And moreover there are non-symmetric
>>>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>>>> the swapped conditions GT/LE/GTU/LEU.
>>>>
>>>> I think the only solution would be to adjust the spec to
>>>> reflect the implementation:
>>>>
>>>> Index: rtl.texi
>>>> ===================================================================
>>>> --- rtl.texi (revision 251752)
>>>> +++ rtl.texi (working copy)
>>>> @@ -2252,6 +2252,13 @@
>>>> If one of the operands is a constant, it should be placed in the
>>>> second operand and the comparison code adjusted as appropriate.
>>>>
>>>> +There may be exceptions from this rule if the mode @var{m} carries
>>>> +not enough information for the swapped comparison operator, or
>>> There may be exceptions _to_ ... if mode @var{m} does not carry
>>> enough...
>>>
>>>> +if we ask for overflow from the subtraction.
>>> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
>>>
>>>> That means, while
>>>> +0-X may overfow X-0 can never overflow. Under these conditions
>>>> +a compare may have the constant expression at the left side.
>>> In these circumstances the constant will be in the first operand .
>>>
>>> (left and right don't really make sense for RTL).
>>>> +Examples are the ARM negdi2_compare pattern and similar.
>>>> +
>>>> A @code{compare} specifying two @code{VOIDmode} constants is not
>>>> valid
>>>> since there is no way to know in what mode the comparison is to be
>>>> performed; the comparison must either be folded during the
>>>> compilation
>>>>
>>>>
>>>>
>>>> Please advise.
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>>> Hmmm...
>>>>>
>>>>> I think the compare is not a commutative operation, and swapping
>>>>> the arguments will imply a different value in the flags.
>>>>>
>>>>> So if I write
>>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>> (compare:CC_NCV
>>>>> (const_int 0)
>>>>> (reg:DI 123)))
>>>>>
>>>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or
>>>>> GEU,
>>>>> N,V = usable for LT, GE
>>>>>
>>>>> But if I write
>>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>> (compare:CC_NCV
>>>>> (reg:DI 123)
>>>>> (const_int 0)))
>>>>>
>>>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>>>> the same and the actual value in the flags is defined by the
>>>>> expansion.
>>>>> Of course there exists probably no matching expansion for that.
>>>>>
>>>>> Note that both LTU in the above hunk are in a parallel-stmt and
>>>>> operate
>>>>> on the flags from the previous pattern, so changing these to GEU
>>>>> will probably be wrong.
>>>>>
>>>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>>>> (const_int 0) (match_dup 1)).
>>>>>
>>>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>>>> have:
>>>>>
>>>>> emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>>> arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>>>
>>>>> I think only 0-x can overflow while x-0 can never overflow.
>>>>>
>>>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>>>> a lot and I guess if this pattern is created by a splitter, this can
>>>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>>>> no way how combine could mess with this pattern due to the exotic
>>>>> CCmode. So while I think it would work to swap only the notation of
>>>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and
>>>>> the
>>>>> consuming statements, that would make it quite hard to follow for the
>>>>> human reader at least.
>>>>>
>>>>> What do you think?
>
> I agree on the readability point. It's already very hard to follow the
> CCmode stuff as it is :(
> My concern was on whether some RTL pass (combine in particular) would
> trip up on this
> nominally non-canonical RTL. As long as nothing manipulates these RTXes
> and they match
> the precise single pattern they're expected to match we should be ok I
> think. One wonders
> whether we would be better off using unspecs if the compare operator is
> not exactly what we need,
> but I think we can get away with COMPARE for now.
>
>
>>>>>
>>>>> Bernd.
>> Attached is the patch with an update to the rtl.texi documentation.
>> The code does not change, so I did no new bootstrap.
>>
>>
>> Is it OK for trunk?
>
> This looks ok to me, but I'd feel more comfortable if someone more
> knowledgeable on RTL could
> ok the documentation part (I guess it's outside my remit anyway) as the
> overflow case you pointed out
> does indeed make the compare operator not commutative.
>
> So the arm parts are ok assuming the documentation change is ok'd.
> Richard, would you be okay with the suggested documentation change?
>
> Thanks,
> Kyrill
>
>>
>> Thanks
>> Bernd.
>