This is the mail archive of the
mailing list for the GCC project.
Re: RTL CSE picking simpler but more expensive expressions (re: PR 65932)
- From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- To: Jeff Law <law at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: Jim Wilson <jim dot wilson at linaro dot org>
- Date: Thu, 14 Jan 2016 10:15:08 +0000
- Subject: Re: RTL CSE picking simpler but more expensive expressions (re: PR 65932)
- Authentication-results: sourceware.org; auth=none
- References: <569635F2 dot 7030406 at foss dot arm dot com> <5696A55D dot 3000409 at redhat dot com>
On 13/01/16 19:28, Jeff Law wrote:
On 01/13/2016 04:33 AM, Kyrill Tkachov wrote:
That callsite was added during gcc-2 development, in an era where we didn't even have public lists where such a change might have been discussed. I didn't try to dig into the old, private gcc2 archives as it's unlikely there's going to
be any rationale there.
I've been able to get it to do the right thing by changing the line
where it initially folds the source of the SET. That is line 4639 in
cse.c: /* Simplify and foldable subexpressions in SRC. Then get the
fully- simplified result, which may not necessarily be valid. */
src_folded = fold_rtx (src, insn);
In this instance SRC is: (plus:SI (mult:SI (sign_extend:SI (subreg:HI
(reg:SI 136) 0)) (sign_extend:SI (subreg:HI (reg:SI 138) 0))) (reg:SI
and the resulting src_folded is: (plus:SI (mult:SI (reg:SI 136)
(reg:SI 138)) (reg:SI 141))
However, fold_rtx also modifies src itself, so after the call to
fold_rtx both src and src_folded contain the plus+mult form without
the extends. So further down in cse_insn where it does the cost
analysis of src, src_folded and other expressions it never considers
the original form of src. Changing that call to fold_rtx to not
modify its argument like so: src_folded = fold_rtx (src, 0); // An
argument of 0 means "make a copy of src before modifying"
fixes the testcase and allows CSE to properly select the cheaper
multiply-extend-add form and doesn't seem to regress codegen in any
way on SPEC2006 for arm. Archeology says this line has been that way
since forever, so does anyone know of the rationale for passing insn
to fold_rtx in that callsite?
ISTM the code clearly expects that SRC and SRC_FOLDED could be different. I think you could make a case that INSN should just be replaced with NULL based on that alone. Verifying across a reasonable body of code that there aren't any
undesirable effects would be wise.
There are cases where calls to fold_rtx with a non-NULL insn don't end up modifying src, ending up in src to not be
always equal to src_folded (I added an assert to that effect and saw it trigger).
It seems that fold_rtx is not *guaranteed* to modify src if insn is non-NULL, but it just does in many cases.
Alternately you could compute stuff for "SRC" prior to the call to fold_rtx. It's less likely to have unexpected side effects.
My inclination would be to go with changing INSN to NULL though. It seems to match the overall intent here better.
I agree and, as I said, I saw no impact on codegen for all of SPEC2006 on arm. It just improved the testcase
I described above. I'll evaluate this patch on aarch64 and x86_64 and hopefully it has a low impact there as well.