This is the mail archive of the
mailing list for the GCC project.
Re: [RS6000] Fix PR61098, Poor code setting count register
- From: David Edelsohn <dje dot gcc at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Alan Modra <amodra at gmail dot com>
- Date: Tue, 13 May 2014 23:46:20 -0400
- Subject: Re: [RS6000] Fix PR61098, Poor code setting count register
- Authentication-results: sourceware.org; auth=none
- References: <20140508014846 dot GA5162 at bubble dot grove dot modra dot org> <CAGWvnyme6a+QLSxXwzUOxWAHWTokxZCySua_+25hUzaEVYvgPA at mail dot gmail dot com> <20140509024054 dot GE5162 at bubble dot grove dot modra dot org> <CAGWvny=3+XuqiWtxhB4HSXDWHRZMpNkhioTVeEpEs9C4vMWRGg at mail dot gmail dot com> <20140514030448 dot GJ5162 at bubble dot grove dot modra dot org>
Danny may have re-organized the code, but I thought that it originally
came from Tom Rixx, if not earlier.
I seem to remember problems in the past with late creation of TOC
entries for constants causing problems, so it was easier to fall back
to materializing all integer constants inline. I don't remember the
PRs, but I think there were issues with creating a TOC if the late
constant were the only TOC reference, or maybe the issue was buying a
stack frame to materialize the TOC/GOT for a late constant. And
maximum 5 instruction sequence is not really bad relative to a load
from the TOC (even with medium model / data in TOC). There are a lot
of trade-offs with respect to I$ expansion versus the load hitting in
the L1 D$.
Alpha emit_set_const does limit the number of instructions, but that
is a search for a more efficient sequence than the naive sequence. The
Alpha splitters use alpha_split_const_mov(), which tries
alpha_emit_set_const() for an efficient sequence and then falls back
to alpha_emit_set_long_const() for a naive sequence. Alpha uses PLUS
instead of IOR because of the way its ISA works.
alpha_emit_set_long_const() always will materialize the constant and
does not check for a maximum number of instructions. This is why it's
comment says "fall back to straight forward decomposition".
However, alpha_emit_set_long_const() and alpha_split_const_mov() can
fail, presumably because emit_move_insn() fails, not because of
reaching a maximum number of instructions.
alpha_legitimate_constant_p() rejecs expensive constants early. Once
the splitter is invoked, it always tries to materialize the constant,
but the splitter apparently can fail for other reasons.
I don't mind exploring the benefits of tighening up
rs6000_legitimate_const(), but I'm not sure it's an obvious win,
especially with the potential failure corner cases.
However, I want to have a better understanding about the part of the
patch that removes the FAIL path from the splitters.
On Tue, May 13, 2014 at 11:04 PM, Alan Modra <firstname.lastname@example.org> wrote:
> On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
>> On Thu, May 8, 2014 at 10:40 PM, Alan Modra <email@example.com> wrote:
>> >> Please do not remove all of the comments from the two functions. The
>> >> comments should provide some documentation about the different
>> >> purposes of the two functions other than setting DEST to a CONST.
>> > I believe my updated comment covers the complete purpose of the
>> > function nowadays. The comments I removed are out-dated, and should
>> > have been removed a long time ago.. rs6000_emit_set_const does not
>> > even look at N, it always returns a non-zero result, and the return is
>> > only tested for non-zero. I removed MODE too, because that is always
>> > the same as GET_MODE (dest).
>> It is helpful if the comment expresses more than restating the
>> information one can glean from the function name. It's useful to note
>> that rs6000_emit_set_long_const is a standard decomposition with a
>> bounded number of instructions.
>> >> I think that the way you rearranged the invocations of copy_rtx() in
>> >> rs6000_emit_set_long_const() is okay, but it would be good for someone
>> >> else to double check.
>> > Yeah, that function is a bit messy. I took the approach of always use
>> > a bare "dest" once in the last instruction emitted, with every other
>> > use getting hit with copy_rtx. The previous approach was similar,
>> > but used the bare "dest" on the first instruction emitted. Obviously
>> > you don't need copy_rtx anywhere with the new code when
>> > can_create_pseudo_p is true, but I felt it wasn't worth optimising
>> > that for the added source complication.
>> Can you help clarify the removal of the code that tests if the
>> splitter failed? The splitters in the Alpha port follow mostly the
>> same rhythm, with a little bit of further cleanup and consolidation
>> relative to the rs6000 port. alpha_split_const_mov() falls back on
>> alpha_emit_set_long_const(), but checks that the target is valid and
>> allows the splitter to fail. Either the Alpha port is doing
>> unnecessary work or this cleanup patch is too aggressive. Either way,
>> a comment seems necessary.
> OK, I've had a good look at the history of this code.
> rs6000_emit_set_const and rs6000_emit_set_long_const were introduced
> with revision 44516, a largish patch by Dan Berlin. As you hint
> above, it seems the functions were copied from alpha. So the
> parameters were unnecessary and the comments just plain wrong for the
> rs6000 version of code right from the initial commit. Worse, only
> half of necessary infrastructure was copied from alpha..
> So let me lay out what I believe should be happening with
> (set (reg) (constant))
> At expand time, if the above set can't be implemented in a single
> instruction, then it should be decomposed to the equivalent set high
> part, ori low part, and possibly shift instructions so long as the
> resulting sequence is small. I think we basically do this correctly
> in rs6000_emit_move. See the num_insns_constant call there.
> Constants that can't be evaluated inline by two (or three)
> instructions will be replaced with a load from the TOC.
> The same thing ought to happen in the splitters that use
> rs6000_emit_set_const. rs6000_emit_set_const should refuse to expand
> to too many instructions (just like alpha). We don't do this, but if
> we did, this would leave some (set (reg) (constant)) instructions in
> the RTL. Alternatively the splitters could generate loads from the
> TOC, but see pr57836, which shows the loads from the TOC we crafted
> at expand time being reduced back to (set (reg) (constant)).
> Finally, at reload time, any remaining (set (reg) (constant))
> (ie. those that result in a long inline sequence) should be forced to
> the TOC. This is the missing part of the infrastructure that wasn't
> copied from alpha. Our legitimate_constant_p needs to reject some
> constants.. As it is, reload simply expands to a four or five
> inline instruction sequence.
> David, I'd like some help with the legitimate_constant_p
> implementation. I have something that seems to work (not yet
> regression tested) but there are a number of things that I'm not clear
> on (eg. the revision 20229 change) so likely will get it wrong.
> Alan Modra
> Australia Development Lab, IBM