This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Repost: RFA [4.1]: improvement to if-conversion and cross-jumping (PR20070)


Bernd Schmidt wrote:

Why are we using true_regnum at all here, post reload? All regs should be hardregs at that point, and subregs of them should have been eliminated. Same goes for reg_overlap_mentioned_for_reload_p, and...

Because the cross-jumping code used reg_renumbered_equal_p, which also takes reg_renumber into account and looks into subregs.
I remember that we had REG_NOTES after reload that still referred to pseudos; I don't know if that can still happen now.
I suppose for 4.2 I can replace the renumbering code with gcc_assert (!reload_completed || regno < FIRST_PSEUDO_REGISTER) .
However, we have to be able to handle pseudos, and subregs of pseudos and hardregs, for this code to be any use for if-convert.


We might also do a bit of cross-jumping before reload in the future, e.g. when we find entire blocks that are equivalent.

-: 1269: case SET:

...


It would seem that when called for the toplevel pattern, rvalue is always -1, so we shouldn't get a SET or CLOBBER with a different rvalue. Have the code gcc_assert that and lose the dead code.

We can still be called for a PRE_MODIFY / POST_MODIFY.




At the moment, I think using recog
will be much slower. Have you run a debugging session recently figuring out why a pattern was not matched? The code
looks quite horrible.


Has anything changed recently in that department?

Not that I know of. I said 'recently' because of the human tendency to blur bad memories.



We could also fail to merge memory attributes where the match_operand is of for the memory address inside.


Do any ports generate insns with different MEM attributes outside operands? That sounds rather broken.

The mem attributes can be generated by machine-independent code. Often optimizer code just generates rtl and tries
if it matches; thus, a mem with attributes can easily be matched against a MEM that is a fixed part of the insn pattern.
These attributes might be irrelevant for the machine description, but meaningful for the alias analysis; hence the need to merge them.


struct-equiv.c ?


Seems plausible. Maybe functions like insns_match_p can move too, then a name like "match-insns.c" might be better. The exact name is uncritical though.

Yes, that's one good thing about subversion. However, the response times and the average command line lengths are still a lot longer than for cvs :-(



While x_local and y_local generally appear toghether, local_rvalue does not. Having an array of a struct of two rtx and one bool also causes inefficient use of space and awkward indexing. Is there any reason to make MAX_LOCAL a runtime param? The algorithm is not designed to scale well with a really large MAX_LOCAL.


ISTR there seems to be a general rule/guideline to make this kind of thing runtime tunable with a --param. If we can do it easily we probably should, otherwise let's not bother.

As far as I can tell, 16 registers is plenty for matching structurally equivalent code, i.e. having more would hardly increase code quality, and the overhead to search through 16 items is in the ballpark or less then what we'd spend with any elaborate data structure. --param is best when there is a hard trade-off between compilation time and run time, or different target processors or code bases have different optimum points for a tunable value. Since neither appears to be the case here, I think it would be unnecessary complexity to make this configurable.


Maybe struct_equiv_dst_mem -> set_dest_inputs_equiv_p?

That would be misleading. STRICT_LOW_PART / ZERO_EXTEND / SIGN_EXTEND is handled by the caller.
set_dest_addr_equiv_p ?


Still trying to wrap my head around this - the REG handling in function seems difficult; maybe more comments describing the whys of what we are doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays multiple times, along with a bit that records the change (live <-> dead)? We'd save the scan backwards, and we could always restore to a previous checkpoint, which in turn means we no longer need the rerun machinery? MAX_LOCAL probably needs to grow, but if we're not going to loop over these arrays anymore, that's no problem.

I don't think the rerun machinery is expensive. Let's face it, most of the time, this optimization won't succeed, and we'll be able to tell that straight ahead because we have matched too little instructions (or too little instructions for the number of input we have). Thus, actually having to re-run the scan will be much rarer than running the first pass.
Also, we can't add willy-nilly new mappings; if there is an existing, live one for one or both of the registers, we can only acept the new one if it is identical to or a consistent extension of the old mapping.
One possible way to avoid a search is when we find that the registers are not in x_local_live / y_local_live, set a flag for that, and if this flag is set and the registers are both either pseudo registers, or hard registers with hard_regno_nregs == 1, we know they have not been encountered before.
I have considered handling multiple lifetimes with varying pairings, or hard registers with hard_regno_nregs mismatch, but I suspect that's a lot of extra trouble for little or no gain.


But we're (going to) use it in one new place, ifcvt. Don't know if this is going to be a problem.

All the current ifcvt passes happen before reg-stack conversion. If we want another one after reg-stack, we can
set the CLEANUP_POST_REGSTACK bit appropriately.




> + /* The following code helps take care of G++ cleanups. */

Comment not helpful. How does it help? What's special about G++ cleanups?



I'm afraid that I have no idea. That stuff comes from the old cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted a number of times.


Ah, I see, it occurs somewhere else too. Common parts should move into a separate function - unless the corresponding bits in insns_match_p aren't actually needed anymore (gcov shows no coverage, but that may not mean anything)?

There are two remainig calls in outgoing_edges_match. One for tablejumps, and one for the case that we match none of three specific cases - which are:
- one outgoing edge for bb1
- two outgoing edges for bb1, and bb1 ends in a conditional jump without other sets. (EDGE_COUNT (bb1->succs) == 2 && any_condjump_p (BB_END (bb1)) && onlyjump_p (BB_END (bb1)))
- tablejump


Thus, it is not surprisong that you don't see any coverage for an average program. I don't think there is a good way to extract the common bits of insns_match_p and struct_equiv -
there is too much control flow interwoven with them. insns_match_p could be replaced with or using rtx_equiv_p (aka struct_equiv) and/or friends, but then it would need a struct equiv_info.
I suppose ideally we'd have try_crossjump_to_edge pass a pointer to its struct equiv_info into outgoing_edges_match, and if there were any local registers and/or inputs, take a copy of the struct_equiv for re-runs, and tell struct_equiv_block_eq not to do a full initialization. This would allow outgoing_edges_match to perform better for targets where there is a choise what register to use for the condition in a branch instruction. I could do some sh64-elf-sim testing, but I'm not sure that that would give sufficient coverage.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]