This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
Thanks for all the updates.
Vladimir Makarov <vmakarov@redhat.com> writes:
>>> + /* index * scale + disp => new base + index * scale */
>>> + enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
>>> +
>>> + lra_assert (INDEX_REG_CLASS != NO_REGS);
>>> + new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
>>> + lra_assert (GET_CODE (*addr_loc) == PLUS);
>>> + lra_emit_move (new_reg, *ad.disp_loc);
>>> + if (CONSTANT_P (XEXP (*addr_loc, 1)))
>>> + XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
>>> + XEXP (*addr_loc, 0) = new_reg;
>> The canonical form is (plus (mult ...) (reg)) rather than
>> (plus (reg) (mult ...)), but it looks like we create the latter.
>> I realise you try both forms here:
> It might happen because equiv substitution in LRA.
>>> + /* Some targets like ARM, accept address operands in
>>> + specific order -- try exchange them if necessary. */
>>> + if (! valid_address_p (mode, *addr_loc, as))
>>> + {
>>> + exchange_plus_ops (*addr_loc);
>>> + if (! valid_address_p (mode, *addr_loc, as))
>>> + exchange_plus_ops (*addr_loc);
>>> + }
>> but I think we should try the canonical form first. And I'd prefer it
>> if we didn't try the other form at all, especially in 4.8. It isn't
>> really the backend's job to reject non-canonical rtl. This might well
>> be another case where some targets need a (hopefully small) tweak in
>> order to play by the rules.
>>
>> Also, I suppose this section of code feeds back to my question on
>> Wednesday about the distinction that LRA seems to make between the
>> compile-time constant in:
>>
>> (plus (reg X1) (const_int Y1))
>>
>> and the link-time constant in:
>>
>> (plus (reg X2) (symbol_ref Y2))
>>
>> It looked like extract_address_regs classified X1 as a base register and
>> X2 as an index register. The difference between the two constants has
>> no run-time significance though, and I think we should handle both X1
>> and X2 as base registers (as I think reload does).
>>
>> I think the path above would then be specific to scaled indices.
>> In the original address the "complex" index must come first and the
>> displacement second. In the modified address, the index would stay
>> first and the new base register would be second. More below.
> As I wrote above the problem is also in that equiv substitution can
> create non-canonical forms.
Right. Just in case there's a misunderstanding: I'm not complaining
about these routines internally using forms that are noncanonical
(which could happen because of equiv substitution, like you say).
I just think that what we eventually try to validate should be canonical.
In a way it's similar to how the simplify-rtx.c routines work.
If there are targets that only accept noncanonical rtl (which is after
all just a specific type of invalid rtl), they need to be fixed.
>>> + /* base + scale * index + disp => new base + scale * index */
>>> + new_reg = base_plus_disp_to_reg (mode, as, &ad);
>>> + *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
>>> + if (! valid_address_p (mode, *addr_loc, as))
>>> + {
>>> + /* Some targets like ARM, accept address operands in
>>> + specific order -- try exchange them if necessary. */
>>> + exchange_plus_ops (*addr_loc);
>>> + if (! valid_address_p (mode, *addr_loc, as))
>>> + exchange_plus_ops (*addr_loc);
>>> + }
>> Same comment as above about canonical rtl. Here we can have two
>> registers -- in which case the base should come first -- or a more
>> complex index -- in which case the index should come first.
>>
>> We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
>> with the operands in either order, and let it take care of the details.
>> Using simplify_gen_binary would help with the earlier index+disp case too.
> Equiv substitution can create non-canonical forms. There are 2 approaches:
> o have a code for dealing with non-canonical forms (equiv substitution,
> target stupidity)
> o always support canonical forms and require them from targets.
>
> I decided to use the 1st variant but I am reconsidering it. I'll try to
> fix before inclusion. But I am not sure I have time for this. All
> these changes makes LRA unstable. In fact, I've just found that changes
> I already made so far resulted in 2 SPEC2000 tests broken although GCC
> testsuite and bootstrap looks good.
OK. I'm happy to try fixing the noncanonical thing.
>>> + /* If this is post-increment, first copy the location to the reload reg. */
>>> + if (post && real_in != result)
>>> + emit_insn (gen_move_insn (result, real_in));
>> Nit, but real_in != result can never be true AIUI, and I was confused how
>> the code could be correct in that case. Maybe just remove it, or make
>> it an assert?
> No, it might be true:
>
> real_in = in == value ? incloc : in;
> ...
> if (cond)
> result = incloc;
> else
> result = ...
>
> if (post && real_in != result)
>
> So it is true if in==value && cond
Sorry, what I meant was that cond is "! post && REG_P (incloc)":
if (! post && REG_P (incloc))
result = incloc;
else
result = lra_create_new_reg (GET_MODE (value), value, new_rclass,
"INC/DEC result");
so it can never be true in the "post" case quoted above.
>>> + dest_reg = SET_DEST (set);
>>> + /* The equivalence pseudo could be set up as SUBREG in a
>>> + case when it is a call restore insn in a mode
>>> + different from the pseudo mode. */
>>> + if (GET_CODE (dest_reg) == SUBREG)
>>> + dest_reg = SUBREG_REG (dest_reg);
>>> + if ((REG_P (dest_reg)
>>> + && (x = get_equiv_substitution (dest_reg)) != dest_reg
>>> + /* Remove insns which set up a pseudo whose value
>>> + can not be changed. Such insns might be not in
>>> + init_insns because we don't update equiv data
>>> + during insn transformations.
>>> +
>>> + As an example, let suppose that a pseudo got
>>> + hard register and on the 1st pass was not
>>> + changed to equivalent constant. We generate an
>>> + additional insn setting up the pseudo because of
>>> + secondary memory movement. Then the pseudo is
>>> + spilled and we use the equiv constant. In this
>>> + case we should remove the additional insn and
>>> + this insn is not init_insns list. */
>>> + && (! MEM_P (x) || MEM_READONLY_P (x)
>>> + || in_list_p (curr_insn,
>>> + ira_reg_equiv
>>> + [REGNO (dest_reg)].init_insns)))
>> This is probably a stupid question, sorry, but when do we ever want
>> to keep an assignment to a substituted pseudo? I.e. why isn't this just:
>>
>> if ((REG_P (dest_reg)
>> && (x = get_equiv_substitution (dest_reg)) != dest_reg)
> Equivalence can be memory location. It means you can assign many
> different values to the location. Removing them would generate a wrong
> code. For example, if you use your simple variant of code, you will
> have > 100 test failures of GCC testsuite.
OK :-)
>>> +/* Return true if we need a split for hard register REGNO or pseudo
>>> + REGNO which was assigned to a hard register.
>>> + POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
>>> + used for reloads since the EBB end. It is an approximation of the
>>> + used hard registers in the split range. The exact value would
>>> + require expensive calculations. If we were aggressive with
>>> + splitting because of the approximation, the split pseudo will save
>>> + the same hard register assignment and will be removed in the undo
>>> + pass. We still need the approximation because too aggressive
>>> + splitting would result in too inaccurate cost calculation in the
>>> + assignment pass because of too many generated moves which will be
>>> + probably removed in the undo pass. */
>>> +static inline bool
>>> +need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
>>> +{
>>> + int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
>>> +
>>> + lra_assert (hard_regno >= 0);
>>> + return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
>>> + && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
>>> + && (usage_insns[regno].reloads_num
>>> + + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
>>> + && ((regno < FIRST_PSEUDO_REGISTER
>>> + && ! bitmap_bit_p (&ebb_global_regs, regno))
>>> + || (regno >= FIRST_PSEUDO_REGISTER
>>> + && lra_reg_info[regno].nrefs > 3
>>> + && bitmap_bit_p (&ebb_global_regs, regno))))
>>> + || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
>>> +}
>> Could you add more commentary about the thinking behind this particular
>> choice of heuristic? E.g. I wasn't sure what the reloads_num check did,
>> or why we only split hard registers that are local to the EBB and only
>> split pseudos that aren't.
>>
>> The 2 and 3 numbers seemed a bit magic too. I suppose the 2 has
>> something to do with "one save and one restore", but I wasn't sure
>> why we applied it only for pseudos. (AIUI that arm of the check
>> deals with "genuine" split pseudos rather than call saves & restores.)
>>
>> Still, it says a lot for the high quality of LRA that, out of all the
>> 1000s of lines of code I've read so far, this is the only part that
>> didn't seem to have an intuitive justification.
> Yes, right. I checked many parameters and finally picked out the above
> ones. What I found is that aggressive and even moderate splitting
> usually result in worse code. Most splitting is undone and if we do
> splitting aggressively we generate a lot of insns which will be removed
> but their costs are taken into account during assignment pass.
> Inheritance has quite bigger chance to successful. One reason why
> splitting is undone is that spilling + inheritance of short living
> pseudos is a substitution of splitting in some way. Therefore I do
> splitting for long living pseudos. That means non local pseudos.
>
> I added the following comments.
>
> return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
> && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
> /* We need at least 2 reloads to make pseudo splitting
> profitable. We should provide hard regno splitting in
> any case to solve 1st insn scheduling problem when
> moving hard register definition up might result in
> impossibility to find hard register for reload pseudo of
> small register class. */
> && (usage_insns[regno].reloads_num
> + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
> && (regno < FIRST_PSEUDO_REGISTER
> /* For short living pseudos, spilling + inheritance can
> be considered a substitution for splitting.
> Therefore we do not splitting for local pseudos. It
> decreases also aggressiveness of splitting. The
> minimal number of references is chosen taking into
> account that for 2 references splitting has no sense
> as we can just spill the pseudo. */
> || (regno >= FIRST_PSEUDO_REGISTER
> && lra_reg_info[regno].nrefs > 3
> && bitmap_bit_p (&ebb_global_regs, regno))))
>
> I removed checking that hard_regno is local. It was a parameter to
> decrease aggressiveness of hard register splitting but I realized that
> it contradicts the reason for splitting to solve 1st insn scheduling
> problem because regional scheduling can move hard register definition
> beyond BB.
Thanks, this looks much better to me FWIW.
Richard