This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RS6000] Fix PR61098, Poor code setting count register


Alan,

Danny may have re-organized the code, but I thought that it originally
came from Tom Rixx, if not earlier.

I seem to remember problems in the past with late creation of TOC
entries for constants causing problems, so it was easier to fall back
to materializing all integer constants inline. I don't remember the
PRs, but I think there were issues with creating a TOC if the late
constant were the only TOC reference, or maybe the issue was buying a
stack frame to materialize the TOC/GOT for a late constant.  And
maximum 5 instruction sequence is not really bad relative to a load
from the TOC (even with medium model / data in TOC). There are a lot
of trade-offs with respect to I$ expansion versus the load hitting in
the L1 D$.

Alpha emit_set_const does limit the number of instructions, but that
is a search for a more efficient sequence than the naive sequence. The
Alpha splitters use alpha_split_const_mov(), which tries
alpha_emit_set_const() for an efficient sequence and then falls back
to alpha_emit_set_long_const() for a naive sequence.  Alpha uses PLUS
instead of IOR because of the way its ISA works.
alpha_emit_set_long_const() always will materialize the constant and
does not check for a maximum number of instructions. This is why it's
comment says "fall back to straight forward decomposition".

However, alpha_emit_set_long_const() and alpha_split_const_mov() can
fail, presumably because emit_move_insn() fails, not because of
reaching a maximum number of instructions.

alpha_legitimate_constant_p() rejecs expensive constants early. Once
the splitter is invoked, it always tries to materialize the constant,
but the splitter apparently can fail for other reasons.

I don't mind exploring the benefits of tighening up
rs6000_legitimate_const(), but I'm not sure it's an obvious win,
especially with the potential failure corner cases.

However, I want to have a better understanding about the part of the
patch that removes the FAIL path from the splitters.

Thanks, David



On Tue, May 13, 2014 at 11:04 PM, Alan Modra <amodra@gmail.com> wrote:
> On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
>> On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote:
>>
>> >> Please do not remove all of the comments from the two functions. The
>> >> comments should provide some documentation about the different
>> >> purposes of the two functions other than setting DEST to a CONST.
>> >
>> > I believe my updated comment covers the complete purpose of the
>> > function nowadays.  The comments I removed are out-dated, and should
>> > have been removed a long time ago..  rs6000_emit_set_const does not
>> > even look at N, it always returns a non-zero result, and the return is
>> > only tested for non-zero.  I removed MODE too, because that is always
>> > the same as GET_MODE (dest).
>>
>> It is helpful if the comment expresses more than restating the
>> information one can glean from the function name. It's useful to note
>> that rs6000_emit_set_long_const is a standard decomposition with a
>> bounded number of instructions.
>>
>> >> I think that the way you rearranged the invocations of copy_rtx() in
>> >> rs6000_emit_set_long_const() is okay, but it would be good for someone
>> >> else to double check.
>> >
>> > Yeah, that function is a bit messy.  I took the approach of always use
>> > a bare "dest" once in the last instruction emitted, with every other
>> > use getting hit with copy_rtx.  The previous approach was similar,
>> > but used the bare "dest" on the first instruction emitted.  Obviously
>> > you don't need copy_rtx anywhere with the new code when
>> > can_create_pseudo_p is true, but I felt it wasn't worth optimising
>> > that for the added source complication.
>>
>> Can you help clarify the removal of the code that tests if the
>> splitter failed?  The splitters in the Alpha port follow mostly the
>> same rhythm, with a little bit of further cleanup and consolidation
>> relative to the rs6000 port. alpha_split_const_mov() falls back on
>> alpha_emit_set_long_const(), but checks that the target is valid and
>> allows the splitter to fail. Either the Alpha port is doing
>> unnecessary work or this cleanup patch is too aggressive. Either way,
>> a comment seems necessary.
>
> OK, I've had a good look at the history of this code.
>
> rs6000_emit_set_const and rs6000_emit_set_long_const were introduced
> with revision 44516, a largish patch by Dan Berlin.  As you hint
> above, it seems the functions were copied from alpha.  So the
> parameters were unnecessary and the comments just plain wrong for the
> rs6000 version of code right from the initial commit.  Worse, only
> half of necessary infrastructure was copied from alpha..
>
> So let me lay out what I believe should be happening with
> (set (reg) (constant))
>
> At expand time, if the above set can't be implemented in a single
> instruction, then it should be decomposed to the equivalent set high
> part, ori low part, and possibly shift instructions so long as the
> resulting sequence is small.  I think we basically do this correctly
> in rs6000_emit_move.  See the num_insns_constant call there.
> Constants that can't be evaluated inline by two (or three)
> instructions will be replaced with a load from the TOC.
>
> The same thing ought to happen in the splitters that use
> rs6000_emit_set_const.  rs6000_emit_set_const should refuse to expand
> to too many instructions (just like alpha).  We don't do this, but if
> we did, this would leave some (set (reg) (constant)) instructions in
> the RTL.  Alternatively the splitters could generate loads from the
> TOC, but see pr57836, which shows the loads from the TOC we crafted
> at expand time being reduced back to (set (reg) (constant)).
>
> Finally, at reload time, any remaining (set (reg) (constant))
> (ie. those that result in a long inline sequence) should be forced to
> the TOC.  This is the missing part of the infrastructure that wasn't
> copied from alpha.  Our legitimate_constant_p needs to reject some
> constants..  As it is, reload simply expands to a four or five
> inline instruction sequence.
>
> David, I'd like some help with the legitimate_constant_p
> implementation.  I have something that seems to work (not yet
> regression tested) but there are a number of things that I'm not clear
> on (eg. the revision 20229 change) so likely will get it wrong.
>
> --
> Alan Modra
> Australia Development Lab, IBM


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]