This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)
- From: Joern RENNECKE <joern dot rennecke at st dot com>
- To: Bernd Schmidt <bernds_cb1 at t-online dot de>
- Cc: Steven Bosscher <stevenb at suse dot de>, Richard Henderson <rth at redhat dot com>, gcc-patches at gcc dot gnu dot org, jh at suse dot cz
- Date: Wed, 30 Nov 2005 21:18:44 +0000
- Subject: Re: Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)
- References: <41E59432.7080504@st.com> <42CD71E0.3070804@st.com> <42D3B666.6050701@st.com> <200507121547.21305.stevenb@suse.de> <42D3E705.9030507@st.com> <42D7C264.6030507@st.com> <438C567A.7030104@t-online.de> <438CCE0F.6020209@st.com> <438DC224.8020300@t-online.de>
Bernd Schmidt wrote:
Why are we using true_regnum at all here, post reload? All regs
should be hardregs at that point, and subregs of them should have been
eliminated. Same goes for reg_overlap_mentioned_for_reload_p, and...
Because the cross-jumping code used reg_renumbered_equal_p, which also
takes reg_renumber into account and looks into subregs.
I remember that we had REG_NOTES after reload that still referred to
pseudos; I don't know if that can still happen now.
I suppose for 4.2 I can replace the renumbering code with gcc_assert
(!reload_completed || regno < FIRST_PSEUDO_REGISTER) .
However, we have to be able to handle pseudos, and subregs of pseudos
and hardregs, for this code to be any use for if-convert.
We might also do a bit of cross-jumping before reload in the future,
e.g. when we find entire blocks that are equivalent.
-: 1269: case SET:
...
It would seem that when called for the toplevel pattern, rvalue is
always -1, so we shouldn't get a SET or CLOBBER with a different
rvalue. Have the code gcc_assert that and lose the dead code.
We can still be called for a PRE_MODIFY / POST_MODIFY.
At the moment, I think using recog
will be much slower. Have you run a debugging session recently
figuring out why a pattern was not matched? The code
looks quite horrible.
Has anything changed recently in that department?
Not that I know of. I said 'recently' because of the human tendency to
blur bad memories.
We could also fail to merge memory attributes where the match_operand
is of for the memory address inside.
Do any ports generate insns with different MEM attributes outside
operands? That sounds rather broken.
The mem attributes can be generated by machine-independent code. Often
optimizer code just generates rtl and tries
if it matches; thus, a mem with attributes can easily be matched against
a MEM that is a fixed part of the insn pattern.
These attributes might be irrelevant for the machine description, but
meaningful for the alias analysis; hence the need to merge them.
struct-equiv.c ?
Seems plausible. Maybe functions like insns_match_p can move too,
then a name like "match-insns.c" might be better. The exact name is
uncritical though.
Yes, that's one good thing about subversion. However, the response
times and the average command line lengths are still a lot longer than
for cvs :-(
While x_local and y_local generally appear toghether, local_rvalue
does not. Having an array of a struct of two rtx and one bool also
causes inefficient use of space and awkward indexing. Is there any
reason to make MAX_LOCAL a runtime param? The algorithm is not
designed to scale well with a really large MAX_LOCAL.
ISTR there seems to be a general rule/guideline to make this kind of
thing runtime tunable with a --param. If we can do it easily we
probably should, otherwise let's not bother.
As far as I can tell, 16 registers is plenty for matching structurally
equivalent code, i.e. having more would hardly increase code quality,
and the overhead to search through 16 items is in the ballpark or less
then what we'd spend with any elaborate data structure. --param is best
when there is a hard trade-off between compilation time and run time, or
different target processors or code bases have different optimum points
for a tunable value. Since neither appears to be the case here, I think
it would be unnecessary complexity to make this configurable.
Maybe struct_equiv_dst_mem -> set_dest_inputs_equiv_p?
That would be misleading. STRICT_LOW_PART / ZERO_EXTEND / SIGN_EXTEND
is handled by the caller.
set_dest_addr_equiv_p ?
Still trying to wrap my head around this - the REG handling in
function seems difficult; maybe more comments describing the whys of
what we are doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays
multiple times, along with a bit that records the change (live <->
dead)? We'd save the scan backwards, and we could always restore to a
previous checkpoint, which in turn means we no longer need the rerun
machinery? MAX_LOCAL probably needs to grow, but if we're not going
to loop over these arrays anymore, that's no problem.
I don't think the rerun machinery is expensive. Let's face it, most of
the time, this optimization won't succeed, and we'll be able to tell
that straight ahead because we have matched too little instructions (or
too little instructions for the number of input we have). Thus,
actually having to re-run the scan will be much rarer than running the
first pass.
Also, we can't add willy-nilly new mappings; if there is an existing,
live one for one or both of the registers, we can only acept the new one
if it is identical to or a consistent extension of the old mapping.
One possible way to avoid a search is when we find that the registers
are not in x_local_live / y_local_live, set a flag for that, and if this
flag is set and the registers are both either pseudo registers, or hard
registers with hard_regno_nregs == 1, we know they have not been
encountered before.
I have considered handling multiple lifetimes with varying pairings, or
hard registers with hard_regno_nregs mismatch, but I suspect that's a
lot of extra trouble for little or no gain.
But we're (going to) use it in one new place, ifcvt. Don't know if
this is going to be a problem.
All the current ifcvt passes happen before reg-stack conversion. If we
want another one after reg-stack, we can
set the CLEANUP_POST_REGSTACK bit appropriately.
> + /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++
cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the
10th September 2001
to cfgcleanup.c, later broken out into its own function, and
reformatted a number of times.
Ah, I see, it occurs somewhere else too. Common parts should move
into a separate function - unless the corresponding bits in
insns_match_p aren't actually needed anymore (gcov shows no coverage,
but that may not mean anything)?
There are two remainig calls in outgoing_edges_match. One for
tablejumps, and one for the case that we match none of three specific
cases - which are:
- one outgoing edge for bb1
- two outgoing edges for bb1, and bb1 ends in a conditional jump without
other sets. (EDGE_COUNT (bb1->succs) == 2 && any_condjump_p (BB_END
(bb1)) && onlyjump_p (BB_END (bb1)))
- tablejump
Thus, it is not surprisong that you don't see any coverage for an
average program. I don't think there is a good way to extract the
common bits of insns_match_p and struct_equiv -
there is too much control flow interwoven with them. insns_match_p
could be replaced with or using rtx_equiv_p (aka struct_equiv) and/or
friends, but then it would need a struct equiv_info.
I suppose ideally we'd have try_crossjump_to_edge pass a pointer to its
struct equiv_info into outgoing_edges_match, and if there were any local
registers and/or inputs, take a copy of the struct_equiv for re-runs,
and tell struct_equiv_block_eq not to do a full initialization. This
would allow outgoing_edges_match to perform better for targets where
there is a choise what register to use for the condition in a branch
instruction. I could do some sh64-elf-sim testing, but I'm not sure
that that would give sufficient coverage.