This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation


On Mon, Apr 14, 2014 at 6:13 PM, Terry Guo <flameroc@gmail.com> wrote:
> On Thu, Apr 3, 2014 at 10:11 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>> I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI
>>> (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero
>>> during the fwprop1 pass, considering the fact that the high 32-bit part of
>>> (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications
>>> for high 32-bit part of the result of AND operation. The attached patch is
>>> trying to improve simplify_rtx to handle such case. Other target like x86
>>> seems hasn't such issue because it generates different RTX to handle 64bit
>>> multiplication on a 32bit machine.
>>
>> See http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00073.html for another try,
>> which led to the simplification in combine.c:combine_simplify_rtx line 5448.
>>
>> Your variant is both more general, because it isn't restricted to the lowpart,
>> and less general, because it is artificially restricted to AND.
>>
>> Some remarks:
>>  - this needs to be restricted to non-paradoxical subregs,
>>  - you need to test HWI_COMPUTABLE_MODE_P (innermode),
>>  - you need to test !side_effects_p (op).
>>
>> I think we need to find a common ground between Jakub's patch and yours and
>> put a single transformation in simplify_subreg.
>>
>> --
>> Eric Botcazou
>
> Thanks for your review. Even without Jakub's patch, the combine pass
> can simplify such subreg to zero. But the live range of destination
> register causes combine pass to undo all attempts. Here is detailed
> explanation, please note that the AND operation (and:DI (reg/v:DI 115
> [ a ]) (const_int 4294967295 [0xffffffff])) from fwprop1 pass is
> turned into ZERO_EXTEND operation (zero_extend:DI (reg:SI 0 r0 [ a ]))
> in combine pass. The variable a is function arguments and occupies
> register r0 and r1. Right before try_combine function we have below
> three instructions:
>
> (insn 8 4 10 2 (set (reg:DI 118 [ D.4091 ])
>         (zero_extend:DI (reg:SI 0 r0 [ a ]))) 000.c:4 170 {zero_extendsidi2}
>      (expr_list:REG_DEAD (reg:SI 0 r0)
>         (nil)))
>
> (insn 10 8 13 2 (set (reg:SI 121)
>         (mult:SI (reg/v:SI 116 [ b ])
>             (subreg:SI (reg:DI 118 [ D.4091 ]) 4))) 000.c:4 41 {*arm_mulsi3_v6}
>      (nil))
>
> (insn 13 10 14 2 (set (reg:DI 117 [ D.4091 ])
>         (mult:DI (zero_extend:DI (subreg:SI (reg:DI 118 [ D.4091 ]) 0))
>             (zero_extend:DI (reg/v:SI 116 [ b ])))) 000.c:4 60 {*umulsidi3_v6}
>      (expr_list:REG_DEAD (reg:DI 118 [ D.4091 ])
>         (expr_list:REG_DEAD (reg/v:SI 116 [ b ])
>             (nil))))
>
> Without any changes, current gcc can simplify the insn 10 to
>
> (insn 10 8 13 2 (set (reg:SI 121)
>         (const_int 0 [0])) 000.c:4 41 {*arm_mulsi3_v6}
>      (nil))
>
> This is what we wanted. Unfortunately the live range of register
> (reg:DI 118) is extended to insn 13. Thus unable to remove insn 8. The
> combine pass has to generate PARALLEL rtx to handle such case:
>
> (parallel [
>         (set (reg:SI 121)
>             (const_int 0 [0]))
>         (set (reg:DI 118 [ D.4091 ])
>             (zero_extend:DI (reg:SI 0 r0 [ a ])))
>     ])
>
> There is no instruction patter to match this parallel rtx. This causes
> combine pass to undo all attempts and roll back simplification to
> subreg operand. Without such live range extension, everything works
> fine. That's why such optimization can only happen in fwprop1 pass.
>
> I made a typo in my previous changelog. My code are indeed for
> simplify_subreg function. I updated my patch per your suggestions to
> handle more general cases. Please review again. Thanks.
>
> BR,
> Terry

Hi Eric,

How do you think my above investigation? To put it simple, the combine
can optimize it to zero, but lately has to undo it because some
restrictions. Now I generalize my optimization to focus on things like
(subreg:mode (OP) index), then use function nonzero_bits to get bits
status of the result of OP. Once the bits expected by subreg are all
zero, we can optimize this whole subreg to zero. Does this make sense?
Please advise. Thanks.

BR,
Terry


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]