This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend


> FYI, all other targets that have LRA optionally selectable or deselectable
> use -mno-lra for this (even when -mlra is the default), it would be better
> for consistency not to invent new switch names for that.

Agreed.

>> -    return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>> +    return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
>>  
>>    return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
>>  }
> Not sure about this one.  We would need to update the comment that
> explains why "!strict_p" is there, but AFAIK reason (1) would still apply.
>
> Was this needed for correctness or because it gave better code?

"!strict_p" has been removed because of correctness issue. When LRA validates 
memory addresses pseudos are temporarily eliminated to hard registers (if possible)
but the target hook is always called as non-strict. This only affects MIPS16 instructions with
not directly accessible $sp. The strict variant, as I understand, was used in the reload
pass to indicate if a pseudo-register has been allocated a hard register. Unless LRA
should be setting the strict/non-strict depending on whether a temporal elimination
to hard reg was successful or there is something else that I missed? 

> Easier as:
>
>  if (TARGET_MIPS16
>      && TEST_HARD_REG_BIT (reg_class_contents[M16_REGS], hard_regno))
>    return 1;
>  return 0;

Indeed. 

>> +  M16F_REGS,			/* mips16 + frame */
>
> Constraints are supposed to be operating on real registers, after
> elimination, so it seems odd to include a fake register.  What went
> wrong with just M16_REGS?

Only the stack pointer has been added to M16_REGS. A number of patterns need to accept
it otherwise LRA inserts a lot of reloads and the code size goes up by about 10%. 
The change does have also a positive effect on reload but marginally.
"frame" meant to indicate inclusion of both the stack and hard frame pointers in the class
but perhaps I should name it differently to avoid confusion.

>> +  SPILL_REGS,			/* All but $sp and call preserved regs are in here */
>...
>> +  { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 },	/* SPILL_REGS */      	\
>
> These two don't seem to match.  I think literally it would be 0x0300fffc,
> but maybe you had to make SPILL_REGS a superset of M16_REGs?

I initially used 0x0300fffc but did some experiments and it turned out that 0x0003fffc (with $16, $17 regs)
gives slightly better code. I haven't updated the comment though. There is yet more to do 
and need to return to another thread with MIPS16 at some point as I found some limitations
of IRA/LRA to generate better code. $8-$15 are currently inaccessible as temporary storage
because these registers are marked as fixed (when optimizing for size) but leaving them
as fixed are better for the code size. I don't expect a big gain by using hard registers
for spilling but it more likely to improve the performance.

>> +/* Add costs to hard registers based on frequency. This helps to negate
>> +   some of the reduced cost associated with argument registers which 
>> +   unfairly promotes their use and increases register pressure */
>> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO)                       \
>> +  (TARGET_MIPS16 && optimize_size                                       \
>> +   ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
>
> So we would be trying to use, say, $4 for the first incoming argument
> even after it had been spilled?  Hmm.
>
> Since this change is aimed specifically at one heuristic, I wonder
> whether we should parameterise that heuristic somehow rather than
> try to use a general hook to undo it.  But I don't think there's
> anything particularly special about MIPS16 here, so maybe it's too
> eager for all targets.

In a number of cases argument registers appeared to be unfairly promoted
increasing the register pressure and increasing the number of reloads.
Bumping up the cost of using those registers encourages IRA to spill
into memory but this appears to help LRA to do a better allocation. Of course,
not always it is a win but generally the gain outweighs the loss. 

I've seen an codesize improvements for other optimization levels
but I'm unsure whether we should make this change generic.
This part of the patch is not crucial though and can be send separately. 

>>  (define_insn "*and<mode>3_mips16"
>> ...
>
> I think we want to keep the LWU case at the very least, since I assume
> otherwise we'll load the full 64-bit spill slot and use a pair of shifts
> to zero-extend it.  Reloading the stack address into a base register
> should always be better than that.
>
> I agree it's less clear for the byte and halfword cases.  All other
> things -- including size -- being equal, reloading a stack address into
> a base register and using an extending load should be better than
> reloading the full spill slot and extending it, since the reloaded stack
> address is more likely to be reused in other instructions.
>
> The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
> I think using LBU and LHU there was probably a win.  I can imagine
> it making sense to disable LBU and LHU for MIPS16e though.
>
> It might be better to do any changes to this pattern as a follow-on
> patch, since I think LRA should cope with it as-is.

Keeping LWU case should be fine. Alternatives were removed because
LRA was ICEing for the same reason of $sp not accessible for the byte 
and halfword cases. I tried to find a testcase to trigger LWU case 
but couldn't. I presume it must be a rare case? The problem with other
alternatives was that spilled pseudos were changed to memory without
checking if the use of $sp is valid. On the other hand, keeping LWU 
may only delay triggering the problem because the stack pointer 
should not be used. It could be a missing case in LRA too.

> Minor nit, but please keep within 80 chars:
>
>  "!TARGET_MIPS16
>   && TARGET_EXPLICIT_RELOCS
>   && ABI_HAS_64BIT_SYMBOLS
>   && cse_not_expected"

Overlooked it.

>> diff --git gcc/rtlanal.c gcc/rtlanal.c
>> index 7a9efec..f699d17 100644
>> --- gcc/rtlanal.c
>> +++ gcc/rtlanal.c
>> @@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
>>      inner = strip_address_mutations (&XEXP (*inner, 0));
>>    if (REG_P (*inner)
>>        || MEM_P (*inner)
>> -      || GET_CODE (*inner) == SUBREG)
>> +      || GET_CODE (*inner) == SUBREG
>> +      || CONSTANT_P (*inner))
>>      return inner;
>>    return 0;
>>  }
> 
> This doesn't look right; the general form is base+index+displacement+segment,
> with the constant term being treated as the displacement.

I'm not particularly happy with this either. This was an attempt to fix an ICE for 
the following RTL (gcc.dg/torture/asm-subreg-1.c compiled with -mips32r2 -mips16 -O1):

(insn 9 8 0 2 (asm_operands/v ("") ("") 0 [
            (mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [
                                (const_int 0 [0])
                            ] UNSPEC_GP))
                    (symbol_ref:SI ("_const_32") [flags 0x6]  <var_decl 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])]
 [(asm_input:HI ("X") (null):0)]
 [] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil))

Any suggestions to handle this case?

Regards,
Robert


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]