This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)


Joern RENNECKE wrote:
Bernd Schmidt wrote:
Can we postpone the ifcvt changes? They look quite straightforward, but it seems like they are independent of the rest. Once we've done the heavy lifting to get the first part in, we can revisit this.

That's a bit backwards with respect to the motivation of this patch, as the starting point was the if-conversion enhancement, but I am willing to split it up like this if you are prepared to review both parts.

I intend to; but regardless of that, splitting things up into smaller, logically independent parts will always help get things into the tree more quickly.


Couple of new comments:
Why are we using true_regnum at all here, post reload? All regs should be hardregs at that point, and subregs of them should have been eliminated. Same goes for reg_overlap_mentioned_for_reload_p, and...


-: 1230: /* Make sure we enter this pair in its renumbered form. */
259008: 1231: if (x_regno != REGNO (x))
#####: 1232: x = gen_rtx_REG (GET_MODE (x), x_regno);
259008: 1233: if (y_regno != REGNO (y))
#####: 1234: y = gen_rtx_REG (GET_MODE (y), y_regno);


... this bit appears to be dead code too. Speaking of which,

-: 1269: case SET:
-: 1270: {
-: 1271: /* Process destination. */
4647110: 1272: if (rvalue < 0)
-: 1273: {
-: 1274: /* Ignore the destinations role as a destination. Still,
-: 1275: we have to consider input registers embedded in the addresses
-: 1276: of a MEM. Some special forms also make the entire destination
-: 1277: a source. */
4647110: 1278: enum rtx_code dst_code = GET_CODE (SET_DEST (x));
-: 1279:
4647110: 1280: if ((dst_code != STRICT_LOW_PART
-: 1281: && dst_code != ZERO_EXTEND
-: 1282: && dst_code != SIGN_EXTEND)
-: 1283: ? !struct_equiv_dst_mem (SET_DEST (x), SET_DEST (y), info)
-: 1284: : !struct_equiv (&SET_DEST (x), SET_DEST (y), 1, info))
16497: 1285: return false;
-: 1286: }
#####: 1287: else if (!struct_equiv (&SET_DEST (x), SET_DEST (y), 0, info))
#####: 1288: return false;
-: 1289: /* Process source. */
4630613: 1290: return struct_equiv (&SET_SRC (x), SET_SRC (y), 1, info);
-: 1291: }
-: 1292: case CLOBBER:
1904: 1293: return rvalue < 0 || struct_equiv (&XEXP (x, 0), XEXP (y, 0), 0, info);


It would seem that when called for the toplevel pattern, rvalue is always -1, so we shouldn't get a SET or CLOBBER with a different rvalue. Have the code gcc_assert that and lose the dead code.

One general comment: You're matching entire pieces of RTL. Wouldn't it be possible/simpler to match insn_codes, extract the operands, and then just look at those? I have in mind something that looks a bit like parts of regrename.


It wouldn't be simpler in code, since we have to match arbitrary rtl for operands and notes.

True, but I was thinking we can tell quickly which operands are inputs and which ones are outputs and handle them accordingly.


I won't insist, I just thought it would be kind of nice to have similar-looking algorithms.

Even if we spent lots of time
to make genrecog emit a an optimized parser like bison or byacc do - which I don't see that happening anytime soon - I
would still expect it to be slower than matching two rtl structures to compare them.

recog_memoized doesn't take long if we have an insn_code already (we seem to, at this point in the compilation). insn_extract should also be rather quick.


At the moment, I think using recog
will be much slower. Have you run a debugging session recently figuring out why a pattern was not matched? The code
looks quite horrible.

Has anything changed recently in that department?


We could also fail to merge memory attributes where the match_operand is of for the memory address inside.

Do any ports generate insns with different MEM attributes outside operands? That sounds rather broken.


And finally, we could loose some optimizations when separate insn patterns for registers and/or constants are used, and the ease of
implementing recognizing cases where constants will be used in different-looking rtl than a variant using a register because of rtl
canonicalizations applied.



I was wondering about the use of the find_reg_equal_equiv_note - lots of code seems to use that function, but do we really want to mess with REG_EQUIV notes?


We have to avoid leaving REG_EQUIV or REG_EQUAL notes around that are factually wrong - they can lead to incorrect code in later optimization steps.
It is also not a good idea to willy-nilly remove notes when we are not going to do any optimization at all, or if the notes are still correct - we pay with missed rematerialization in reload for missing notes.



Like Stephen B., I'd like to see this moved to a separate file; then you can also lose the struct_equiv_ prefix from most function names and make them more meaningful.


struct-equiv.c ?

Seems plausible. Maybe functions like insns_match_p can move too, then a name like "match-insns.c" might be better. The exact name is uncritical though.


I suppose we can keep the header stuff in basic-block.h, it would be confusing an error-prone if the STRUCT_EQUIV_START etc. numbers are in a different file, but the same number sequence as the CLEANUP_* numbers.

Sure.


I don't think we should have multiple separate structures; that would unnecessarily increase the argument passing.

Ok. The new struct you appended at the end looks fine.


> + bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
> + rtx x_local[STRUCT_EQUIV_MAX_LOCAL], y_local[STRUCT_EQUIV_MAX_LOCAL];


Might want to turn this into an array of structs. Then we could also make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.


While x_local and y_local generally appear toghether, local_rvalue does not. Having an array of a struct of two rtx and one bool also causes inefficient use of space and awkward indexing. Is there any reason to make MAX_LOCAL a runtime param? The algorithm is not designed to scale well with a really large MAX_LOCAL.

ISTR there seems to be a general rule/guideline to make this kind of thing runtime tunable with a --param. If we can do it easily we probably should, otherwise let's not bother.



Just an assignment would be enough? Wouldn't like this to overflow, and I don't think we need different grades of impossible.


Actually, we do.

Understood, thanks.


This number, multiplied with the larger of STRUCT_EQUIV_MAX_LOCAL and
FIRST_PSEUDO_REGISTER, must fit in the input_cost field of struct equiv_info.

Might be helpful to add this.


> +/* Do only the struct_equiv SET_DEST processing for SETs and CLOBBERs. */
> +static bool
> +struct_equiv_set (rtx x, rtx y, struct equiv_info *info)


Not too happy with some of these function names. "set_dest_equiv_p"?


Yes, that makes sense if it is in a separate file. Although it makes it less obvious that it's a variation on the same theme as struct_equiv.
Maybe it is better to rename that into rtx_equiv_p then.
What other function names?

Maybe struct_equiv_dst_mem -> set_dest_inputs_equiv_p?


Should mention return value in comment.


"Return true fur success." ?

Plausible :)


The rest of the description could also be enlarged to describe the necessary details of the algorithm to understand why we need this function.


Obviously we could hand-inline this function,

We could, but that's beside the point. The point is that a simple comment along the lines of "This is the first step in processing each insn; we handle set destinations first because we're scanning backwards, and need to terminate the lifetimes of anything that's set in this insn" would help a new reader understand the algorithm more quickly.


so I suppose you mean why we need to process set/clobber destinations separately from input values. This is really something struct_equiv_block_eq does, so should I put it into the pre-function comment there?

If you add a top-level overview at the start of the file, that's probably a good place for this kind of description.


After the old entry is deleted, a new one is added, and the version number increased. Thus, for this register enlargement, there will be a version increment, but no net change of info->cur.local_count. Hence, the while loop will fail to restore info.cur.version to p->version, and info->need_rerun will be set. I.e. the register liveness information is known
to be flawed, and another pass will be required if the partial match so far otherwise looks promising.

Still trying to wrap my head around this - the REG handling in function seems difficult; maybe more comments describing the whys of what we are doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays multiple times, along with a bit that records the change (live <-> dead)? We'd save the scan backwards, and we could always restore to a previous checkpoint, which in turn means we no longer need the rerun machinery? MAX_LOCAL probably needs to grow, but if we're not going to loop over these arrays anymore, that's no problem.


> +{
> +#ifdef STACK_REGS
> + /* If cross_jump_death_matters is not 0, the insn's mode
> + indicates whether or not the insn contains any stack-like regs. */
> +
> + if (INSN_P (i1)
> + && (info->mode & CLEANUP_POST_REGSTACK) && stack_regs_mentioned (i1))


Is this bit always set after regstack, or do we need an extra test such as reload_completed?


AFAICS, there is no other pass that uses cleanup_cfg after regstack. FWIW, this code is
stems basically from the old cross-jumping code.

But we're (going to) use it in one new place, ifcvt. Don't know if this is going to be a problem.


If it's gonna have its own file, we might as well put a blurb at the top.

Yes.


/* Try to match two basic blocks - or their ends - for structural equivalence.
We scan the blocks from their ends backwards. When we see mismatching
registers, we try to figure out if these could be registers local to these (partial)
blocks, or input values.
The main entry point to this file is struct_equiv_block_eq. This function uses a
struct equiv_info to accept some of its inputs, to keep track of its internal state, to pass down
to its helper functions, and to communicate some of the results back to the caller.
struct_equiv_block_eq has to be called first with the STRUCT_EQUIV_START bit set in the
mode parameter. This will calculate the number of matched insns and the number and types of
inputs. If the need_rerun field is set, the results are only tentative, and the caller has to call
again with STRUCT_EQUIV_RERUN till need_rerun is false in order to get a reliable match.
To install the changes necessary for the match, the function has to be called again with
STRUCT_EQUIV_FINAL.
*/

Sounds good. There should also be a description of the processing done on each insn (i.e., SET_DESTs first, then another pass for inputs).


> + if (info->x_start != x_stop) for (;;)

Formatting.

IIRC there were 80-char-per-line problems with adding two indentation levels there.

Can't see anything that would be problematic.


> + /* The following code helps take care of G++ cleanups. */

Comment not helpful. How does it help? What's special about G++ cleanups?


I'm afraid that I have no idea. That stuff comes from the old cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted a number of times.

Ah, I see, it occurs somewhere else too. Common parts should move into a separate function - unless the corresponding bits in insns_match_p aren't actually needed anymore (gcov shows no coverage, but that may not mean anything)?


Oh dear, this comment is even older than Jan's change - it was present in the old jump.c cross jumper.

------------------------------------------------------------------------

+ /* The following are ORed in on top of the CLEANUp* flags in calls to

"CLEANUP_*"


[New structure definitions snipped]

Much better, thank you.


Bernd



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]