This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: extend fwprop optimization

From: Richard Biener <richard dot guenther at gmail dot com>
To: Wei Mi <wmi at google dot com>
Cc: Steven Bosscher <stevenb dot gcc at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, David Li <davidxl at google dot com>, Uros Bizjak <ubizjak at gmail dot com>
Date: Mon, 25 Mar 2013 10:35:27 +0100
Subject: Re: extend fwprop optimization
References: <CA+4CFy4N5+g0XVJ4vDtGyqHOgj15t3vDiiF0A8_1H9hKNZ+LuQ at mail dot gmail dot com> <CABu31nMvJnvWoj-xA-YkOeH5XLgj+8OLjALyxE+pMaA=s91_KA at mail dot gmail dot com> <CA+4CFy6q-HXCJ3jPQwGzV_beysTdEV-GVsY=0oFLbL-iDVaSbQ at mail dot gmail dot com> <CABu31nOAuZ16eQS+8fq=bntXXfzfzHT7aCZh_vnJG4xh45Yccg at mail dot gmail dot com> <CA+4CFy5++86xq8q7_4x-YuOCHSrpsZ4QST1J6KFK_X_EjxRh+A at mail dot gmail dot com> <CABu31nPB3t+b61fSndgbFD-Ps4SXo62872f2cOyRBOi0nW22ig at mail dot gmail dot com> <CA+4CFy6thk60J=4O626nFKC9t9ivYyg9jPtMHcBWGdM=t49Ubg at mail dot gmail dot com> <CA+4CFy5GkeRqe=9Q8tc688UiFXpe2vFS-QjKUpbuWLCRHV8coA at mail dot gmail dot com> <CABu31nN64RUCC8TBjUEunUMDcu0BAuV7+_JT2aogbJ4uybiRFw at mail dot gmail dot com> <CA+4CFy5fYWHAHWgzL+SNBm3H+LophqSEo3LMwqKgW2FCej2A9Q at mail dot gmail dot com> <CABu31nOtV85auTQa8mGFZ6Lnz81VQPbgoeXGLsu58PB0nLJ9JQ at mail dot gmail dot com> <CA+4CFy5fYFpUK3FmmWoVTrSxBamGhBy=+_dQEa24TSVLY62SiQ at mail dot gmail dot com> <CA+4CFy5qgj_vxOxouauPvwAWA77sWe6wmrN9MW90fOZVhS7KnQ at mail dot gmail dot com>

On Sun, Mar 24, 2013 at 5:18 AM, Wei Mi <wmi@google.com> wrote:
> This is the patch to add the shift truncation in
> simplify_binary_operation_1. I add a new hook
> TARGET_SHIFT_COUNT_TRUNCATED which uses enum rtx_code to decide
> whether we can do shift truncation. I didn't use
> TARGET_SHIFT_TRUNCATION_MASK in simplify_binary_operation_1 because it
> uses the macro SHIFT_COUNT_TRUNCATED. If I change
> SHIFT_COUNT_TRUNCATED to targetm.shift_count_truncated in
> TARGET_SHIFT_TRUNCATION_MASK, I need to give
> TARGET_SHIFT_TRUNCATION_MASK a enum rtx_code param, which wasn't
> trivial to get at many places in existing code.
>
> patch.1 ~ patch.4 pass regression and bootstrap on x86_64-unknown-linux-gnu.

Doing this might prove dangerous in case some pass may later decide
to use an instruction that behaves in different ways.  Consider

   tem = 1<< (n & 255);   // count truncated
   x = y & tem;  // bittest instruction bit nr _not_ truncated

so if tem is expanded to use a shift instruction which truncates the shift
count the explicit and is dropped.  If later combine comes around and
combines the bit-test to use the bittest instruction which does not
implicitely truncate the cound you have generated wrong-code.

So we need to make sure any explicit truncation originally in place
is kept in the RTL - which means SHIFT_COUNT_TRUNCATED should
not exist at all, but instead there would be two patterns for shifts
with implicit truncation - one involving the truncation (canonicalized to
bitwise and) and one not involving the truncation.

Richard.

> Thanks,
> Wei.
>
> On Sun, Mar 17, 2013 at 12:15 AM, Wei Mi <wmi@google.com> wrote:
>> Hi,
>>
>> On Sat, Mar 16, 2013 at 3:48 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>> On Tue, Mar 12, 2013 at 8:18 AM, Wei Mi wrote:
>>>> For the motivational case, I need insn splitting to get the cost
>>>> right. insn splitting is not very intrusive. All I need is to call
>>>> split_insns func.
>>>
>>> It may not look very intrusive, but there's a lot happening in the
>>> back ground. You're creating a lot of new RTL, and then just throw it
>>> away again. You fake the compiler into thinking you're much deeper in
>>> the pipeline than you really are. You're assuming there are no
>>> side-effects other than that some insn gets split, but there are back
>>> ends where splitters may have side-effects.
>>
>> Ok, then I will remove the split_insns call.
>>
>>>
>>> Even though I've asked twice now, you still have not explained this
>>> motivational case, except to say that there is one. *What* are you
>>> trying to do, *what* is not happening without the splits, and *what*
>>> happens if you split. Only if you explain that in a lot more detail
>>> than "I have a motivational case" then we can look into what is a
>>> proper solution.
>>
>> :-). Sorry, I didn't say it clearly. The motivational case is the one
>> mentioned in the following posts (split_insns changes a << (b & 63) to
>> a << b).
>> http://gcc.gnu.org/ml/gcc/2013-01/msg00181.html
>> http://gcc.gnu.org/ml/gcc-patches/2013-02/msg01144.html
>>
>> If I remove the split_insns call and related cost estimation
>> adjustment, the fwprop 18-->22 and 18-->23 will punt, because fwprop
>> here looks like a reverse process of cse, the total cost after fwprop
>> change is increased.
>>
>> Def insn 18:
>>         Use insn 23
>>         Use insn 22
>>
>> If we include the split_insns cost estimation adjustment.
>>   extra benefit by removing def insn 18 = 5
>>   change[0]: benefit = 0, verified - ok  // The cost of insn 22 will
>> not change after fwprop + insn splitting.
>>   change[1]: benefit = 0, verified - ok  // The insn 23 is the same with insn 22
>> Total benefit is 5, fwprop will go on.
>>
>> If we remove the split_insns cost estimation adjustment.
>>   extra benefit by removing def insn 18 = 5
>>   change[0]: benefit = -4, verified - ok   // The costs of insn 22 and
>> insn 23 will increase after fwprop.
>>   change[1]: benefit = -4, verified - ok   // The insn 23 is the same
>> with insn 22
>> Total benefit is -3, fwprop will punt.
>>
>> How about adding the (a << (b&63) ==> a << b) transformation in
>> simplify_binary_operation_1, becuase (a << (b&63) ==> a << b) is a
>> kind of architecture specific expr simplification? Then fwprop could
>> do the propagation as I expect.
>>
>>>
>>> The problem with some of the splitters is that they exist to break up
>>> RTL from 'expand' to initially keep some pattern together to allow the
>>> code transformation passes to handle the pattern as one instruction.
>>> This made sense when RTL was the only intermediate representation and
>>> splitting too early would inhibit some optimizations. But I would
>>> expect most (if not all) such cases to be less relevant because of the
>>> GIMPLE middle-end work. The only splitters you can trigger are the
>>> pre-reload splitters (all the reload_completed conditions obviously
>>> can't trigger if you're splitting from fwprop). Perhaps those
>>> splitters can/should run earlier, or be made obsolete by expanding
>>> directly to the post-splitting insns.
>>>
>>> Unfortunately, it's not possible to tell for your case, because you
>>> haven't explained it yet...
>>>
>>>
>>>> So how about keep split_insns and remove peephole in the cost estimation func?
>>>
>>> I'd strongly oppose this. I do not believe this is necessary, and I
>>> think it's conceptually wrong.
>>>
>>>
>>>>> What happens if you propagate into an insn that uses the same register
>>>>> twice? Will the DU chains still be valid (I don't think that's
>>>>> guaranteed)?
>>>>
>>>> I think the DU chains still be valid. If propagate into the insn that
>>>> uses the same register twice, the two uses will be replaced when the
>>>> first use is seen (propagate_rtx_1 will propagate all the occurrances
>>>> of the same reg in the use insn).  When the second use is seen, the
>>>> df_use and use insn in its insn_info are still available.
>>>> forward_propagate_into will early return after check reg_mentioned_p
>>>> (DF_REF_REG (use), parent) and find out no reg is used  any more.
>>>
>>> With reg_mentioned_p you cannot verify that the DF_REF_LOC of USE is
>>> still valid.
>>
>> I think DF_REF_LOC of USE may be invalid if dangling rtx will be
>> recycled by garbage collection very soon (I don't know when GC will
>> happen). Although DF_REF_LOC of USE maybe invalid, the early return in
>> forward_propagate_into ensure it will not cause any correctness
>> problem.
>>
>>>
>>> In any case, returning to the RD problem for DU/UD chains is probably
>>> a good idea, now that RD is not such a hog anymore. In effect fwprop.c
>>> would return to what it looked like before the patch of r149010.
>>
>> I remove MD problem and use DU/UD instead.
>>
>>>
>>> As a way forward on all of this, I'd suggest the following steps, each
>>> with a separate patch:
>>
>> Thanks for the suggestion!
>>
>>> 1. replace the MD problem with RD again, and build full DU/UD chains.
>>
>> I include patch.1 attached.
>>
>>> 2. post all the recog changes separately, with minimum impact on the
>>> parts of the compiler you don't really change. (For apply_change_group
>>> you could even choose to overload it, or use a NUM argument with a
>>> default value -- not sure if default argument values are OK for GCC
>>> tho'.)
>>
>> patch.2 attached.
>>
>>> 3. implement propagation into multiple USEs, but without the splitting
>>> and peepholing.
>>
>> patch.3 attached.
>>
>>> 4. see about fixing the back end to either split earlier or expand to
>>> the desired patterns directly.
>>
>> I havn't included this part. If you agree with the proposal to add the
>> transformation (a << (b&63) ==> a << b) in
>> simplify_binary_operation_1, I will send out another patch about it.
>>
>> Thanks,
>> Wei.

Follow-Ups:
- Re: extend fwprop optimization
  - From: Wei Mi

References:
- Re: extend fwprop optimization
  - From: Wei Mi
- Re: extend fwprop optimization
  - From: Steven Bosscher
- Re: extend fwprop optimization
  - From: Wei Mi
- Re: extend fwprop optimization
  - From: Steven Bosscher
- Re: extend fwprop optimization
  - From: Wei Mi
- Re: extend fwprop optimization
  - From: Wei Mi

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]