[PATCH] Check if loading const from mem is faster

Jiufu Guo guojiufu@linux.ibm.com
Thu Mar 3 10:08:27 GMT 2022


Jeff Law <jeffreyalaw@gmail.com> writes:

> On 3/1/2022 12:47 AM, Richard Biener via Gcc-patches wrote:
>> On Tue, 1 Mar 2022, Jiufu Guo wrote:
>>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>>>> On Thu, Feb 24, 2022 at 09:50:28AM +0100, Richard Biener wrote:
>>>>> On Thu, 24 Feb 2022, Jiufu Guo wrote:
>>>>>> And another thing as Segher pointed out, CSE is doing too
>>>>>> much work.  It may be ok to separate the constant handling
>>>>>> logic from CSE.
>>>>> Not sure - CSE just is value numbering, I don't see that it does
>>>>> more than that.  Yes, it might have developed "heuristics" over
>>>>> the years what to CSE and to what and where to substitute and
>>>>> where not.  But in the end it does just value numbering.
>>>> It also does various micro-optimisations, like all the CC things it
>>>> does.
>>>> It is not very good at doing the CSE job, but it cannot easily be
>>>> replaced by a better implementation because it does many other small
>>>> optimisations (that are not done elsewhere).
>>> Thanks a lot for these comments! I'm also wondering if we would
>>> rewrite this cse.cc or refactor it in some aspects.
>> I think time is better spent elsewhere ... I don't think CSE is as
>> bad as Segher depicts it - it might do "CC things" and other bits
>> but in the end that's going to be instruction/expression combination
>> things that "fit" likely because a value lattice (or just nonzero bits
>> in the cselib variant) existed.
>> So what might be interesting would be to work towards cleansing
>> CSE of those, producing testcases and making sure a better fit
>> pass (combine? fwprop? compare-elim?) performs the desired
>> optimization.
>> But I'm not really sure what Segher is talking about - I suppose
>> it must be magic done inside cselib (which only does analysis),
>> not in cse.cc itself.
> I also don't see cse.cc has being *that* bad.     It's largely the
> same bits as we had before SSA and I wouldn't be surprised if there
> are things in there that aren't needed anymore.
> For example, while we have excised HAVE_cc0, I think there are still
> remnants of those concepts lying around (see this_insn_cc0 and
> friends).
> I'd always hoped we'd get to a point where we could eliminate the
> follow-jumps and cse-around-loops bits, but we never managed to
> accomplish that.  When I last looked (a long long time ago), there
> were still things that got exposed when we lowered from gimple to RTL
> that were picked up by those options.   Changing to work  with
> dominators to find EBBs would be a nice cleanup, but the code as it
> stands right now works.
> I wouldn't be surprised if the costing stuff could use some serious
> cleanup.  But I don't see it as inherently broken, just very dated.
> But in the end, it's really just value-numbering as you say.  It could
> be rewritten to be more modern, but I doubt it's going to make much of
> a real difference in the end.  If there are things that fit logically
> elsehwere, sure move them where they more logically fit, but I doubt
> there's a lot of this stuff.
> jeff

Thanks for your comments and suggestions!

I had a quick test about cse1/cse2 on spec2017. Compare with "-O3",
we can see both positive and negative impacts for "-O3
-fdisable-rtl-cse1 -fdisable-rtl-cse2".  we can see performance gain on
500.perlbench_r(+1.83%),  538.imagick_r(0.7%) 520.omnetpp_r(+0.6%);
and performance recession on  548.exchange2_r(-1.97%) 557.xz_r(-0.7%)
on Power9.  The performance change on 500.perlbench_r would relate
to the behavior of 'constant loading'.

The performance impactions may be directly or indirectly caused by
sub-behaviors of current cse.cc.  And the data would change on
on different targets. Anyway, this data may also indicate that
we could clean up or enhance some functionalities in cse.cc.


More information about the Gcc-patches mailing list