This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Vladimir Makarov <vmakarov@redhat.com> writes:

I add the explicit check of base + disp case. I also added a lot of asserts checking that we don't assign twice.

Ah, OK.

if that's accurate. I dropped the term "reload pseudo" because of the general comment in my earlier reply about the use of "reload pseudo" when the code seems to include inheritance and split pseudos too.

There is no inheritance and splitting yet. It is done after the constraint pass. So at this stage >= new_regno_start means a reload pseudo.
Thanks.

That's a change in the meaning of NEW_CLASS, but seems easier for callers to handle. I think all it requires is changing:

+      common_class = ira_reg_class_subset[rclass][cl];
+      if (new_class != NULL)
+        *new_class = common_class;

to:

      common_class = ira_reg_class_subset[rclass][cl];
      if (new_class != NULL && rclass != common_class)
        *new_class = common_class;

This change results in infinite LRA looping on the first libgcc file compilation. Unfortunately I have no time to investigate it. I'd like to say that most of the code here is very sensitive to changes. I see it a lot: you change something that looks obvious and a target breaks. I am going to investigate it when I have more time.
Fair enough.

I still stand by:

+        default:
+          {
+            const char *fmt = GET_RTX_FORMAT (code);
+            int i;
+
+            if (GET_RTX_LENGTH (code) != 1
+                || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+              {
+                for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+                  if (fmt[i] == 'e')
+                    extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+                                              context_p, code, SCRATCH,
+                                              modify_p, ad);
+                break;
+              }
+            /* fall through for case UNARY_OP (UNSPEC ...) */
+          }
+
+        case UNSPEC:
+          if (ad->disp_loc == NULL)
+            ad->disp_loc = loc;
+          else if (ad->base_reg_loc == NULL)
+            {
+              ad->base_reg_loc = loc;
+              ad->base_outer_code = outer_code;
+              ad->index_code = index_code;
+              ad->base_modify_p = modify_p;
+            }
+          else
+            {
+              lra_assert (ad->index_reg_loc == NULL);
+              ad->index_reg_loc = loc;
+            }
+          break;

Which targets use a bare UNSPEC as a displacement? I thought a displacement had to be a link-time constant, in which case it should satisfy CONSTANT_P. For UNSPECs, that means wrapping it in a CONST.

I saw it somewhere. I guess IA64.

I'm just a bit worried that the UNSPEC handling is sensitive to the order that subrtxes are processed (unlike PLUS, which goes to some trouble to work out what's what). It could be especially confusing because the default case processes operands in reverse order while PLUS processes them in forward order.
Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough? Probably deserves a comment.

I don't remember. To figure it out, I would have to switch it off and try all targets supported by LRA.

AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just recording where reloads of a particular class need to go (obviously so in the case of disp_loc, which isn't reloaded at all). The fields have semantic value too. I.e. we use them to work out the value of at least part of the address.
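The order sensitivity is easy to see in miniature. Below is a toy, self-contained C sketch (all names invented for illustration; this is not GCC code) of the first-empty-slot policy the UNSPEC case above uses: slots are filled disp, then base, then index, so the same subrtx can end up classified differently depending purely on visit order.

```c
#include <assert.h>
#include <stddef.h>

struct addr_slots { const void *disp, *base, *index; };

/* Mimics the UNSPEC case above: each incoming location lands in the
   first empty slot, so classification depends only on visit order.  */
static void
record_loc (struct addr_slots *ad, const void *loc)
{
  if (ad->disp == NULL)
    ad->disp = loc;
  else if (ad->base == NULL)
    ad->base = loc;
  else
    ad->index = loc;
}
```

Visiting the same two locations forward and then in reverse assigns them opposite roles, which is the worry expressed above about reverse-order processing in the default case.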
In that case it seems dangerous to look through general rtxes in the way that the default case above does. Maybe just making sure that DISP_LOC is involved in a sum with the base would be enough, but another idea was:
----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word) an address:

1. (and X (const_int X)), to align
2. a subreg
3. a unary operator (such as truncation or extension)

So maybe we could:

a. remove outer mutations (using a helper function)
b. handle LO_SUM, PRE_*, POST_*: as now
c. otherwise treat the address as the sum of one, two or three pieces.
   c1. Peel mutations of all pieces.
   c2. Classify the pieces into base, index and displacement.
       This would be similar to the jousting code above, but
       hopefully easier because all three rtxes are to hand.
       E.g. we could do the base vs. index thing in a similar way
       to commutative_operand_precedence.
   c3. Record which pieces were mutated (e.g. using something like
       the index_loc vs. index_reg_loc distinction in the current
       code)
That should be general enough for current targets, but if it isn't, we could generalise it further when we know what generalisation is needed.
That's still going to be a fair amount of code, but hopefully not more, and we might have more confidence at each stage what each value is. And it avoids the risk of treating "mutated" addresses as "unmutated" ones. ----------------------------------------------------------------
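The a-c scheme above could be prototyped outside GCC first. Here is a minimal, self-contained C sketch (every type and name is invented for illustration; this is not the rtx API): it peels the three listed mutations off a toy address term, remembers that something was peeled, and then classifies the bare piece by a simple precedence rule, constants as displacement, pointer-like registers as base, other registers as index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy term kinds; invented for illustration, not GCC rtl codes.  */
enum kind { T_REG, T_CONST, T_AND_ALIGN, T_SUBREG, T_UNARY };
enum role { R_BASE, R_INDEX, R_DISP };

struct term
{
  enum kind kind;
  int pointer_p;         /* for T_REG: likely a base pointer?  */
  struct term *inner;    /* for the three "mutation" wrappers  */
};

/* Step c1: strip alignment ANDs, subregs and unary operators,
   remembering whether anything was peeled (step c3).  */
static struct term *
peel_mutations (struct term *t, int *mutated_p)
{
  while (t->kind == T_AND_ALIGN || t->kind == T_SUBREG
         || t->kind == T_UNARY)
    {
      *mutated_p = 1;
      t = t->inner;
    }
  return t;
}

/* Step c2: classify a bare piece, similar in spirit to
   commutative_operand_precedence: constants are displacements,
   pointer-like registers are bases, other registers are indexes.  */
static enum role
classify (const struct term *t)
{
  if (t->kind == T_CONST)
    return R_DISP;
  return t->pointer_p ? R_BASE : R_INDEX;
}
```

Because all pieces are classified with the full set in hand, the result does not depend on traversal order, unlike the slot-filling in the current UNSPEC handling.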
Just an idea though. Probably not for 4.8, although I might try it if I find time.

I am not sure that you listed all the cases. It would be great if you listed them all; then we could make this function clearer. I tried to do this first but kept finding new cases. After that I gave up and tried to use a more general implementation.
This function was rewritten and modified many times. I am afraid to do this again while the clock is ticking.

It would be great if you re-implemented the function according to your ideas and we could try it on the 8 targets to which LRA has already been ported. An LRA sub-branch would be a perfect place to do it.
It would be nice to sort out the disp_loc thing for 4.8, though.

My point is that base_plus_disp_to_reg assumes that *base_loc and *disp_loc are involved in a sum. It adds them together, replaces the base_loc with the new pseudo, and removes the disp_loc. But extract_address_regs seems to be deliberately written in a way that doesn't require base_loc and disp_loc to be involved in a sum, and uses a more indirect approach to working out disp_loc instead. It just feels like it leaves open the potential for a silent wrong-code bug.
Right. I don't think it is more dangerous than clobbers, but I like what you proposed. After a few tries, I managed with clobbers. So now there are no bound pseudos (or "special reload pseudos", as they are called now). The code is clear and there is no additional notion for LRA. Thank you.

+/* Reload pseudos created for matched input and output reloads whose
+   mode are different.  Such pseudos has a modified rules for finding
+   their living ranges, e.g. assigning to subreg of such pseudo means
+   changing all pseudo value.  */
+bitmap_head lra_bound_pseudos;

Maybe:

/* Reload pseudos created for matched input and output reloads whose
   modes are different.  Such pseudos have different live ranges from
   other pseudos; e.g. any assignment to a subreg of these pseudos
   changes the whole pseudo's value.  */

Fixed.

Although that said, couldn't emit_move_insn_1 (called by gen_move_insn) split a multiword pseudo move into two word moves? Using the traditional clobber technique sounds better than having special liveness rules.
It is not only about multi-word pseudos. It is about representing this situation with constructions that are semantically incorrect in other parts of the compiler. Reload has no such problem as it does not use RTL. So I don't think it splits, as I use emit_move_insn and that calls emit_move_insn_1 too.

But my point is that emit_move_insn_1 _does_ split moves that have no .md pattern of their own. E.g. some targets do not define double-word move patterns because such moves are always equivalent to two individual word moves. And if emit_move_insn_1 splits:
(set (reg:DI X) (reg:DI Y))
into:
(set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
(set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))
then it would be wrong to say that the subreg in the second instruction is a complete definition of X.

I really needed a special liveness treatment (although I don't remember the details) and therefore I added it. I had no detailed design for LRA. The code was modified by numerous test failures on different targets. There is a lot of code analogous to reload code and probably its necessity should be rigorously questioned. I thought about and modified part of this code, but unfortunately not all of it.
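The partial-definition hazard above can be sketched with a toy two-word liveness model (purely illustrative; the names are invented, this is not GCC's dataflow code): a whole-register move defines both words at once, while each split subreg store defines only one word, so neither store alone kills the register's previous value.

```c
#include <assert.h>

/* Bitmask of the two words of a DImode pseudo that one insn defines
   (bit 0 = low word, bit 1 = high word).  */
struct def { unsigned mask; };

/* (set (reg:DI X) (reg:DI Y)) defines both words at once ...  */
static struct def
whole_def (void)
{
  struct def d = { 3u };
  return d;
}

/* ... but (set (subreg:SI (reg:DI X) byte) ...) defines one word.  */
static struct def
subreg_def (int word)
{
  struct def d = { 1u << word };
  return d;
}

/* A definition kills X (ends its previous live range) only if it
   covers every word; a partial def leaves the other word live.  */
static int
kills_whole_reg (struct def d)
{
  return d.mask == 3u;
}
```

This is why treating the second subreg store of a split move as a complete definition of X would be wrong, and why the clobber technique (an explicit clobber of the whole register before the word stores) restores a correct full-definition point.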
Also, bound pseudos are rare. Their bitmap is very small and testing them (2 lines of code overall) in ira-lives.c is fast.

FWIW, it wasn't really speed as much as correctness I was worried about. In a way, rarity makes having special rules seem even more dangerous.
Sorry, I spent two days composing this email and that was a placeholder for a question of yours that I should answer or work on. :) It looks like I did not answer the question.

Yeah, I think that makes it a lot clearer, thanks.

I don't understand this comment, sorry:

+      /* We create pseudo for out rtx because we always should keep
+         registers with the same original regno have synchronized
+         value (it is not true for out register but it will be
+         corrected by the next insn).  */
Pseudos have values -- see the comments for lra_reg_info. Different pseudos with the same value do not conflict even if they live in the same place. When we create a pseudo, we assign it the value of the original pseudo (if any) from which we created the new one. If we created the new pseudo from the input pseudo, the new pseudo would not conflict with the input pseudo, which is wrong when the input pseudo lives after the insn, because the new pseudo's value is changed by the insn's output. Therefore we create the new pseudo from the output.
I hope it is more understandable. I changed the comment.
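The rule described above can be restated as a tiny model (illustrative only; these are not the lra_reg_info structures): two pseudos with overlapping live ranges conflict unless they carry the same value, so a reload pseudo created inside the input's live range must take a fresh value from the output rather than inherit the input's value.

```c
#include <assert.h>

struct pseudo
{
  int value;            /* pseudos with equal values hold the same data  */
  int start, end;       /* a single live range, [start, end]  */
};

static int
ranges_overlap_p (const struct pseudo *a, const struct pseudo *b)
{
  return a->start <= b->end && b->start <= a->end;
}

/* Different pseudos with the same value do not conflict even if
   they live in the same place.  */
static int
conflict_p (const struct pseudo *a, const struct pseudo *b)
{
  return ranges_overlap_p (a, b) && a->value != b->value;
}
```

With an input pseudo that lives past the insn, giving the reload pseudo the input's value would make the two pseudos falsely non-conflicting even though the insn's output overwrites the reload pseudo; taking the output's value records the conflict.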
OK, thanks.

+      /* In and out operand can be got from transformations before
+         processing constraints.  So the pseudos might have inaccurate
+         class and we should make their classes more accurate.  */
+      narrow_reload_pseudo_class (in_rtx, goal_class);
+      narrow_reload_pseudo_class (out_rtx, goal_class);

I don't understand this, sorry. Does "transformations" mean inheritance and reload splitting? So the registers we're changing here are inheritance and split pseudos rather than reload pseudos created for this instruction? If so, it sounds on face value like it conflicts with the comment quoted above about not allowing reload instructions to narrow the class of pseudos. Might be worth saying why that's OK here but not there.

Again, inheritance and splitting are done after the constraint pass.
The transformations here are mostly reloads of subregs, which are done before the reloads for the given insn. In these transformations we create new pseudos whose reg class we don't know yet. When we don't know a pseudo's reg class yet, we assign ALL_REGS to the pseudo.
???

Also, all uses but one of lra_get_hard_regno_and_offset follow the pattern:

      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
      /* The real hard regno of the operand after the allocation.  */
      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
so couldn't lra_get_hard_regno_and_offset just return the final hard register, including elimination? Then it could apply the elimination on the original rtx.
FWIW, the exception I mentioned was operands_match_p:
          lra_get_hard_regno_and_offset (x, &i, &offset);
          if (i < 0)
            goto slow;
          i += offset;
but I'm not sure why this is the only caller that would want to ignore elimination.
Fixed.

Not sure what you meant here. :-) Was that a placeholder, or something else? What I was getting at was that it would be nice to replace all occurrences of:

      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
      /* The real hard regno of the operand after the allocation.  */
      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
with something like:
x_hard_regno = lra_get_hard_regno (x);
and this operands_match_p code seemed to be the only place that didn't apply get_final_hard_regno to the result of lra_get_hard_regno_and_offset. I wasn't really sure why operands_match_p was different.
No, sorry. I'll think about it this morning. A bit tired for today.

Not sure: did you have any thoughts on this?

+  int i, j, x_hard_regno, offset;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (*loc == NULL_RTX)
+    return false;
+  x = *loc;
+  code = GET_CODE (x);
+  mode = GET_MODE (x);
+  if (code == SUBREG)
+    {
+      loc = &SUBREG_REG (x);
+      x = SUBREG_REG (x);
+      code = GET_CODE (x);
+      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+        mode = GET_MODE (x);
+    }
+
+  if (REG_P (x))
+    {
+      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+      /* The real hard regno of the operand after the allocation.  */
+      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+      return (x_hard_regno >= 0
+              && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));

With the subreg mode handling above, this looks little-endian specific.

The MEM case:

+  if (MEM_P (x))
+    {
+      struct address ad;
+      enum machine_mode mode = GET_MODE (x);
+      rtx *addr_loc = &XEXP (x, 0);
+
+      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+      if (ad.base_reg_loc != NULL)
+        {
+          if (uses_hard_regs_p (ad.base_reg_loc, set))
+            return true;
+        }
+      if (ad.index_reg_loc != NULL)
+        {
+          if (uses_hard_regs_p (ad.index_reg_loc, set))
+            return true;
+        }
+    }

is independent of the subreg handling, so perhaps the paradoxical subreg case should be handled separately, using simplify_subreg_regno.
Why not simply use operand m's early_clobber field?

Ok. Fixed.

+              match_p = false;
+              if (operands_match_p (*curr_id->operand_loc[nop],
+                                    *curr_id->operand_loc[m], m_hregno))
+                {
+                  int i;
+
+                  for (i = 0; i < early_clobbered_regs_num; i++)
+                    if (early_clobbered_nops[i] == m)
+                      break;
+                  /* We should reject matching of an early
+                     clobber operand if the matching operand is
+                     not dying in the insn.  */
+                  if (i >= early_clobbered_regs_num
+                      || operand_reg[nop] == NULL_RTX
+                      || (find_regno_note (curr_insn, REG_DEAD,
+                                           REGNO (operand_reg[nop]))
+                          != NULL_RTX))
+                    match_p = true;

...although I don't really understand this condition. If the two operands are the same value X, then X must die here whatever the notes say. So I assume this is coping with a case where the operands are different but still match. If so, could you give an example?

I remember I saw such an insn but I don't remember the details. I think I added this for some purpose. It might be IRA wrongly assigning the same hard register in some situation.

But in that case I don't understand the condition. If we have:

(set (reg X) (... (reg X) ...))

(which is the kind of thing operands_match_p is testing for) then there is no requirement for a REG_DEAD note for X. But it's still OK for two Xs to form an earlyclobber pair.

Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't apply to the matched input operand; the earlyclobber only applies to other input operands. So I'd have expected it was those operands that might need reloading rather than this one. E.g. if X occurs three times, twice in a matched earlyclobber pair and once as an independent operand, it's the latter operand that would need reloading.

Yes, I know.

(set (reg X1) (... (reg X2) ...)) (REG_UNUSED: X1), and X2 lives below.

It makes sense. I fixed it using your variant.

OK, that's probably true. :-)

+                  /* If we didn't already win, we can reload
+                     constants via force_const_mem, and other
+                     MEMs by reloading the address like for
+                     'o'.  */
+                  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+                    badop = false;

It seems a bit inconsistent to treat a spilled pseudo whose address might well need reloading as a win, while not treating existing MEMs whose addresses need reloading as a win.

Well, the probability of reloading the address of a spilled pseudo is very small on most targets, but reloading for MEM in this case is real. So I see it as logical.
+        if (! no_regs_p)
+          reload_nregs
+            += ira_reg_class_max_nregs[this_alternative][mode];

I wasn't sure why we counted this even in the "const_to_mem && constmemok" and "MEM_P (op) && offmemok" cases from:

              /* We prefer to reload pseudos over reloading other
                 things, since such reloads may be able to be
                 eliminated later.  So bump REJECT in other cases.
                 Don't do this in the case where we are forcing a
                 constant into memory and it will then win since
                 we don't want to have a different alternative
                 match then.  */
              if (! (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER)
                  && ! (const_to_mem && constmemok)
                  /* We can reload the address instead of memory (so
                     do not punish it).  It is preferable to do to
                     avoid cycling in some cases.  */
                  && ! (MEM_P (op) && offmemok))
                reject += 2;

I think constmemok is obvious. It is not a reload, it is just putting a constant in the constant pool. We should not punish it as no additional insns are generated.

There is a comment for the offmemok case. I think it describes it. Apparently it was a fix for LRA cycling. I don't remember the details. To restore them, I would need to remove the code and try it on many targets. I guess it would take 3-4 days. But I removed this as it does not affect x86/x86-64.

Sorry, my comment wasn't as clear as it should have been. I think:
              /* We prefer to reload pseudos over reloading other
                 things, since such reloads may be able to be
                 eliminated later.  So bump REJECT in other cases.
                 Don't do this in the case where we are forcing a
                 constant into memory and it will then win since
                 we don't want to have a different alternative
                 match then.  */
              if (! (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER)
                  && ! (const_to_mem && constmemok)
                  /* We can reload the address instead of memory (so
                     do not punish it).  It is preferable to do to
                     avoid cycling in some cases.  */
                  && ! (MEM_P (op) && offmemok))
                reject += 2;
is fine. The point I was trying to make is that this code (correctly IMO) ignores register reload costs if the operand won't be reloaded into a register. The operand won't be reloaded into a register if:
(const_to_mem && constmemok) || (MEM_P (op) && offmemok)
However:
        if (! no_regs_p)
          reload_nregs
            += ira_reg_class_max_nregs[this_alternative][mode];
seems to count the register reload cost regardless of whether the operand would be reloaded into a register. It felt like it ought to be something like:
      if (!((const_to_mem && constmemok)
            || (MEM_P (op) && offmemok)))
        {
          /* We prefer to reload pseudos over reloading other
             things, since such reloads may be able to be
             eliminated later.  So bump REJECT in other cases.  */
          if (! (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER))
            reject += 2;

          if (! no_regs_p)
            reload_nregs
              += ira_reg_class_max_nregs[this_alternative][mode];
        }
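As a sanity check of that restructuring, the proposed control flow can be modelled as a free-standing function (the flag names mirror the snippet, but this is a sketch, not the actual lra-constraints.c code): both the reject bump and the register-count cost are skipped whenever the operand will stay in memory.

```c
#include <assert.h>

struct alt_cost { int reject; int reload_nregs; };

/* is_pseudo stands for REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER.
   class_nregs stands for ira_reg_class_max_nregs[this_alternative][mode].
   Returns the costs this operand adds under the proposed structure.  */
static struct alt_cost
operand_cost (int is_pseudo, int const_to_mem, int constmemok,
              int mem_op, int offmemok, int no_regs_p, int class_nregs)
{
  struct alt_cost c = { 0, 0 };

  /* If the operand won't be reloaded into a register, neither
     penalty applies.  */
  if ((const_to_mem && constmemok) || (mem_op && offmemok))
    return c;

  /* Prefer reloading pseudos, since those reloads may be
     eliminated later.  */
  if (!is_pseudo)
    c.reject = 2;

  if (!no_regs_p)
    c.reload_nregs = class_nregs;
  return c;
}
```

The point of the restructuring is visible in the first case below: a constant being forced to memory now contributes neither a reject bump nor a reload_nregs count, whereas the original code still counted reload_nregs for it.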