This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: RFC: LRA for x86/x86-64 [4/9]

From: Vladimir Makarov <vmakarov at redhat dot com>
To: gcc-patches at gcc dot gnu dot org
Date: Tue, 02 Oct 2012 11:52:34 -0400
Subject: Re: RFC: LRA for x86/x86-64 [4/9]
References: <5064D9FC.7020704@redhat.com> <Pine.LNX.4.64.1209272304100.21459@digraph.polyomino.org.uk> <5064E425.6010800@redhat.com> <Pine.LNX.4.64.1209280007080.21459@digraph.polyomino.org.uk> <506500F4.6020707@redhat.com> <87lifqqb3l.fsf@talisman.home>

On 10/01/2012 02:51 PM, Richard Sandiford wrote:

Vladimir Makarov <vmakarov@redhat.com> writes:

+/* Return register bank of given hard regno for the current target.  */
+DEFHOOK
+(register_bank,
+ "A target hook which returns the register bank number to which the\
+  register @var{hard_regno} belongs to.  The smaller the number, the\
+  more preferable the hard register usage (when all other conditions are\
+  the same).  This hook can be used to prefer some hard register over\
+  others in LRA.  For example, some x86-64 register usage needs\
+  additional prefix which makes instructions longer.  The hook can\
+  return bigger bank number for such registers make them less favorable\
+  and as result making the generated code smaller.\
+  \
+  The default version of this target hook returns always zero.",
+ int, (int),
+ default_register_bank)

This is a horribly bikeshed-level comment, sorry, but I wonder if
something like "register_priority" would be better.  Register classes
are in some ways an extension of register banks, so it wasn't obvious
from the name why we needed both.

Ok. I agree that is not a good term. Register bank in hardware (especially in DSP) means a bit different thing.

Actually, on the Cauldron Ian asked me why it is different from register allocation order. I should say that the order usually takes caller-saves info into account. In x86-64, reg with REX flags can be caller-saved or not.

+/* Return true if maximal address displacement can be different.  */
+DEFHOOK
+(different_addr_displacement_p,
+ "A target hook which returns true if an address with the same structure\
+  can have different maximal legitimate displacement.  For example, the\
+  displacement can depend on memory mode or on operand combinations in\
+  the insn.\
+  \
+  The default version of this target hook returns always false.",
+ bool, (void),
+ default_different_addr_displacement_p)

If I read the patch correctly, this is only used in:

+	    if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
+	      lra_set_used_insn_alternative (insn, -1);

and so we keep the current alternative when neither spill_class_mode
nor different_addr_displacement_p is defined.  How many targets on the
LRA branch are like that?  I would have expected most targets with limited
address displacements would have to return true for the above hook,
because multiword loads and stores typically have to be split into word
loads and stores.  Same goes for strict-alignment targets, where wider
modes often have slightly lower maximal displacements.

E.g. for MIPS, SImode loads and stores have a displacement range of
[-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
So the maximal displacement depends on mode, even though the instruction set
is pretty regular.

Targets with full address-size displacements can use the default false return,
but it looks like the x86 port defines spill_class_mode instead, so AIUI
the value isn't really tested on Core i7.  What's the impact of that compared
to the other x86 targets that don't set X86_TUNE_GENERAL_REGS_SSE_SPILL?
Is LRA just quicker for them, or will it make different decisions
(compared to Core i7) even for non-SSE insns?

It is mostly done for the LRA speed. We could remove if-stmt and everything will be all right, only lra-constraints pass will go over all alternatives again.

Currently, there are two targets for which if-cond is true. One is x86-64 (more exactly when corei7 tune is used) and another target PARISC for which different_add_displacement_p is true.

It might be more in the future. Future ARMs and Powers might be profitable for spilling general registers into vector/floating point registers. Might be new other targets will need different_addr_displacement_p.

We could rid off the hook but it is important for speeding LRA up. I felt that I should make LRA speed competitive with reload even it is hard because LRA works on RTL and not on internal structures as reload. I see now I was right worrying about the speed. I guess if we could sacrifice 2% of compilation time, LRA code would be smaller and more clear. If we could sacrifice 10% percent compilation time, the code would be even smaller and clear because we could use DF-infrastructure.

+/* Determine class of registers which could be used for spilled
+   pseudos instead of memory.  */
+DEFHOOK
+(spill_class,
+ "This hook defines a class of registers which could be used for spilled pseudos\
+  of given class instead of memory",
+ reg_class_t, (reg_class_t),
+ NULL)

Should probably say that NO_REGS means "none".

+/* Determine mode for spilling pseudos into registers instead of memory.  */
+DEFHOOK
+(spill_class_mode,
+ "This hook defines mode in which a pseudo of given mode and of the first\
+  register class can be spilled into the second register class",
+ enum machine_mode, (reg_class_t, reg_class_t, enum machine_mode),
+ NULL)

It looks like the only use is in:

+	  || (targetm.spill_class_mode (rclass, spill_class,
+					PSEUDO_REGNO_MODE (regno))
+	      != PSEUDO_REGNO_MODE (regno))

So would it make sense to have a single hook like:

/* Determine mode for spilling pseudos into registers instead of memory.  */
DEFHOOK
(spill_class,
  "This hook defines a class of registers which could be used for spilling\
   pseudos of the given mode and class, or @code{NO_REGS} if only memory\
   should be used.  Not defining this hook is equivalent to returning\
   @code{NO_REGS} for all inputs."
  reg_class_t, (reg_class_t, enum machine_mode),
  NULL)

?  It means that setup_reg_spill_flag needs a class-x-mode walk, but
(bad excuse ahoy) we have plenty of those already.  If we really wanted
to avoid the extra loop, we could make VOIDmode mean "any mode",
although that does make the interface a bit more clunky.

No, I think it has sense. Too many hooks already. Although after removing reloads, some hooks will be gone.

Thanks for the proposals.

References:
- Re: RFC: LRA for x86/x86-64 [4/9]
  - From: Richard Sandiford

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]