This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)
- From: Bernd Schmidt <bernds_cb1 at t-online dot de>
- To: Joern RENNECKE <joern dot rennecke at st dot com>
- Cc: Steven Bosscher <stevenb at suse dot de>, Richard Henderson <rth at redhat dot com>, gcc-patches at gcc dot gnu dot org, jh at suse dot cz
- Date: Wed, 30 Nov 2005 16:15:48 +0100
- Subject: Re: Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)
- References: <41E59432.7080504@st.com> <42CD71E0.3070804@st.com> <42D3B666.6050701@st.com> <200507121547.21305.stevenb@suse.de> <42D3E705.9030507@st.com> <42D7C264.6030507@st.com> <438C567A.7030104@t-online.de> <438CCE0F.6020209@st.com>
Joern RENNECKE wrote:
Bernd Schmidt wrote:
Can we postpone the ifcvt changes? They look quite straightforward,
but it seems like they are independent of the rest. Once we've done
the heavy lifting to get the first part in, we can revisit this.
That's a bit backwards with respect to the motivation of this patch, as
the starting point was the if-conversion enhancement, but I am willing
to split it up like this if you are prepared to review both parts.
I intend to; but regardless of that, splitting things up into smaller,
logically independent parts will always help get things into the tree
more quickly.
Couple of new comments:
Why are we using true_regnum at all here, post reload? All regs should
be hardregs at that point, and subregs of them should have been
eliminated. Same goes for reg_overlap_mentioned_for_reload_p, and...
-: 1230: /* Make sure we enter this pair in its
renumbered form. */
259008: 1231: if (x_regno != REGNO (x))
#####: 1232: x = gen_rtx_REG (GET_MODE (x), x_regno);
259008: 1233: if (y_regno != REGNO (y))
#####: 1234: y = gen_rtx_REG (GET_MODE (y), y_regno);
... this bit appears to be dead code too. Speaking of which,
-: 1269: case SET:
-: 1270: {
-: 1271: /* Process destination. */
4647110: 1272: if (rvalue < 0)
-: 1273: {
-: 1274: /* Ignore the destinations role as a
destination. Still,
-: 1275: we have to consider input registers
embedded in the addresses
-: 1276: of a MEM. Some special forms also make
the entire destination
-: 1277: a source. */
4647110: 1278: enum rtx_code dst_code = GET_CODE (SET_DEST
(x));
-: 1279:
4647110: 1280: if ((dst_code != STRICT_LOW_PART
-: 1281: && dst_code != ZERO_EXTEND
-: 1282: && dst_code != SIGN_EXTEND)
-: 1283: ? !struct_equiv_dst_mem (SET_DEST (x),
SET_DEST (y), info)
-: 1284: : !struct_equiv (&SET_DEST (x),
SET_DEST (y), 1, info))
16497: 1285: return false;
-: 1286: }
#####: 1287: else if (!struct_equiv (&SET_DEST (x), SET_DEST
(y), 0, info))
#####: 1288: return false;
-: 1289: /* Process source. */
4630613: 1290: return struct_equiv (&SET_SRC (x), SET_SRC (y),
1, info);
-: 1291: }
-: 1292: case CLOBBER:
1904: 1293: return rvalue < 0 || struct_equiv (&XEXP (x, 0),
XEXP (y, 0), 0, info);
It would seem that when called for the toplevel pattern, rvalue is
always -1, so we shouldn't get a SET or CLOBBER with a different rvalue.
Have the code gcc_assert that and lose the dead code.
One general comment: You're matching entire pieces of RTL. Wouldn't
it be possible/simpler to match insn_codes, extract the operands, and
then just look at those? I have in mind something that looks a bit
like parts of regrename.
It wouldn't be simpler in code, since we have to match arbitrary rtl for
operands and notes.
True, but I was thinking we can tell quickly which operands are inputs
and which ones are outputs and handle them accordingly.
I won't insist, I just thought it would be kind of nice to have
similar-looking algorithms.
Even if we spent lots of time
to make genrecog emit a an optimized parser like bison or byacc do -
which I don't see that happening anytime soon - I
would still expect it to be slower than matching two rtl structures to
compare them.
recog_memoized doesn't take long if we have an insn_code already (we
seem to, at this point in the compilation). insn_extract should also be
rather quick.
At the moment, I think using recog
will be much slower. Have you run a debugging session recently
figuring out why a pattern was not matched? The code
looks quite horrible.
Has anything changed recently in that department?
We could also fail to merge memory attributes where the match_operand is
of for the memory address inside.
Do any ports generate insns with different MEM attributes outside
operands? That sounds rather broken.
And finally, we could loose some optimizations when separate insn
patterns for registers and/or constants are used, and the ease of
implementing recognizing cases where constants will be used in
different-looking rtl than a variant using a register because of rtl
canonicalizations applied.
I was wondering about the use of the find_reg_equal_equiv_note - lots
of code seems to use that function, but do we really want to mess with
REG_EQUIV notes?
We have to avoid leaving REG_EQUIV or REG_EQUAL notes around that are
factually wrong - they can lead to incorrect code in later optimization
steps.
It is also not a good idea to willy-nilly remove notes when we are not
going to do any optimization at all, or if the notes are still correct -
we pay with missed rematerialization in reload for missing notes.
Like Stephen B., I'd like to see this moved to a separate file; then
you can also lose the struct_equiv_ prefix from most function names
and make them more meaningful.
struct-equiv.c ?
Seems plausible. Maybe functions like insns_match_p can move too, then
a name like "match-insns.c" might be better. The exact name is
uncritical though.
I suppose we can keep the header stuff in basic-block.h, it would be
confusing an error-prone if the STRUCT_EQUIV_START etc. numbers are in a
different file, but the same number sequence as the CLEANUP_* numbers.
Sure.
I don't think we should have multiple separate structures; that would
unnecessarily increase the argument passing.
Ok. The new struct you appended at the end looks fine.
> + bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
> + rtx x_local[STRUCT_EQUIV_MAX_LOCAL],
y_local[STRUCT_EQUIV_MAX_LOCAL];
Might want to turn this into an array of structs. Then we could also
make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.
While x_local and y_local generally appear toghether, local_rvalue does
not. Having an array of a struct of two rtx and one bool also causes
inefficient use of space and awkward indexing. Is there any reason to
make MAX_LOCAL a runtime param? The algorithm is not designed to scale
well with a really large MAX_LOCAL.
ISTR there seems to be a general rule/guideline to make this kind of
thing runtime tunable with a --param. If we can do it easily we
probably should, otherwise let's not bother.
Just an assignment would be enough? Wouldn't like this to overflow,
and I don't think we need different grades of impossible.
Actually, we do.
Understood, thanks.
This number, multiplied with the larger of STRUCT_EQUIV_MAX_LOCAL and
FIRST_PSEUDO_REGISTER, must fit in the input_cost field of struct
equiv_info.
Might be helpful to add this.
> +/* Do only the struct_equiv SET_DEST processing for SETs and
CLOBBERs. */
> +static bool
> +struct_equiv_set (rtx x, rtx y, struct equiv_info *info)
Not too happy with some of these function names. "set_dest_equiv_p"?
Yes, that makes sense if it is in a separate file. Although it makes it
less obvious that it's a variation on the same theme as struct_equiv.
Maybe it is better to rename that into rtx_equiv_p then.
What other function names?
Maybe struct_equiv_dst_mem -> set_dest_inputs_equiv_p?
Should mention return value in comment.
"Return true fur success." ?
Plausible :)
The rest of the description could also be enlarged to describe the
necessary details of the algorithm to understand why we need this
function.
Obviously we could hand-inline this function,
We could, but that's beside the point. The point is that a simple
comment along the lines of "This is the first step in processing each
insn; we handle set destinations first because we're scanning backwards,
and need to terminate the lifetimes of anything that's set in this insn"
would help a new reader understand the algorithm more quickly.
so I suppose you mean why
we need to process set/clobber destinations separately from input
values. This is really something struct_equiv_block_eq does, so should
I put it into the pre-function comment there?
If you add a top-level overview at the start of the file, that's
probably a good place for this kind of description.
After the old entry is deleted, a new one is added, and the version
number increased. Thus, for this register enlargement, there will be a
version increment, but no net change of info->cur.local_count. Hence,
the while loop will fail to restore info.cur.version to p->version, and
info->need_rerun will be set. I.e. the register liveness information is
known
to be flawed, and another pass will be required if the partial match so
far otherwise looks promising.
Still trying to wrap my head around this - the REG handling in function
seems difficult; maybe more comments describing the whys of what we are
doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays
multiple times, along with a bit that records the change (live <->
dead)? We'd save the scan backwards, and we could always restore to a
previous checkpoint, which in turn means we no longer need the rerun
machinery? MAX_LOCAL probably needs to grow, but if we're not going to
loop over these arrays anymore, that's no problem.
> +{
> +#ifdef STACK_REGS
> + /* If cross_jump_death_matters is not 0, the insn's mode
> + indicates whether or not the insn contains any stack-like
regs. */
> +
> + if (INSN_P (i1)
> + && (info->mode & CLEANUP_POST_REGSTACK) &&
stack_regs_mentioned (i1))
Is this bit always set after regstack, or do we need an extra test
such as reload_completed?
AFAICS, there is no other pass that uses cleanup_cfg after regstack.
FWIW, this code is
stems basically from the old cross-jumping code.
But we're (going to) use it in one new place, ifcvt. Don't know if this
is going to be a problem.
If it's gonna have its own file, we might as well put a blurb at the top.
Yes.
/* Try to match two basic blocks - or their ends - for structural
equivalence.
We scan the blocks from their ends backwards. When we see mismatching
registers, we try to figure out if these could be registers local to
these (partial)
blocks, or input values.
The main entry point to this file is struct_equiv_block_eq. This
function uses a
struct equiv_info to accept some of its inputs, to keep track of its
internal state, to pass down
to its helper functions, and to communicate some of the results back
to the caller.
struct_equiv_block_eq has to be called first with the
STRUCT_EQUIV_START bit set in the
mode parameter. This will calculate the number of matched insns and
the number and types of
inputs. If the need_rerun field is set, the results are only
tentative, and the caller has to call
again with STRUCT_EQUIV_RERUN till need_rerun is false in order to get
a reliable match.
To install the changes necessary for the match, the function has to be
called again with
STRUCT_EQUIV_FINAL.
*/
Sounds good. There should also be a description of the processing done
on each insn (i.e., SET_DESTs first, then another pass for inputs).
> + if (info->x_start != x_stop) for (;;)
Formatting.
IIRC there were 80-char-per-line problems with adding two indentation
levels there.
Can't see anything that would be problematic.
> + /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++
cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th
September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted
a number of times.
Ah, I see, it occurs somewhere else too. Common parts should move into
a separate function - unless the corresponding bits in insns_match_p
aren't actually needed anymore (gcov shows no coverage, but that may not
mean anything)?
Oh dear, this comment is even older than Jan's change - it was present
in the old jump.c cross jumper.
------------------------------------------------------------------------
+ /* The following are ORed in on top of the CLEANUp* flags in calls to
"CLEANUP_*"
[New structure definitions snipped]
Much better, thank you.
Bernd