This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)


Tejas Belagod wrote:
> Tejas Belagod wrote:
> > This is because offsettable_address_addr_space_p () gets as far as calling
> > strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which
> > returns true. The only way I see offsettable_address_addr_space_p () returning
> > false would be mode_dependent_address_p () to return true for addr expression
> > (PLUS (reg) (16)) which partly makes sense to me because PLUS is a
> > mode-dependent address in that it cannot be allowed for NEON addressing modes,
> > but it seems very generic for mode_dependent_address_p () to return true for
> > PLUS in general instead of making a special case for vector modes. This
> > distinction cannot be made in the target's mode_dependent_address_p() as 'mode'
> > is not supplied to it.
> 
> I dug a little deeper into offsettable_address_addr_space_p () and realized that
> it only gets reg_equiv_mem () which is MEM:OI (reg sp) to work with which does
> not include the SUBREG_BYTE, therefore mode_dependent_address_p () does not have
> PLUS to check for address tree-mode dependency.

Sorry for the late reply, it took me a while to understand what's
really going on here.

I now agree that this is definitely a bug in reload; it's clear that
offsettable_memref_p does not and cannot catch this case.  In fact,
it does not even have enough information to answer the question in
any sensible way (except for just about always returning "no", which
wouldn't really be useful).

I also agree with the general approach in your patch.  The basic idea
is that it makes no sense to ask a generic question like "would it
be valid to add some (unknown) offset and change to some (unknown)
mode?", when instead we can ask quite specifically for the *known*
offset and mode we want to change to.  However, I'd prefer this to
go even further than what your patch does: I think we should not
be querying "offsettable_memref_p" *at all* here.

With your patch, you still call offsettable_memref_p on the address
that has already been offset -- asking for more offsets seems pointless.
Also, there is another call to offsettable_memref_p *within*
find_reloads_subreg_address --which gets used when FORCE_REPLACE is
false-- which suffers from the same problem as the call in
find_reloads_toplev, and likewise ought to be eliminated.


Taking a step back, let's look at the ways a (subreg (reg)) where
reg is a pseudo equivalent to a memory location can be handled:

- The "default" way would be to reload the inner reg in its proper
  mode from its equivalent memory location into a reload register,
  and use a subreg of that register as the operand.  This always
  works correcly, but sometime causes unnecessary reloads, if the
  insn could actually handle a memory operand directly.

- To avoid that unnecessary reload, we can instead attempt to replace
  the whole subreg with a modified (usually narrowed or widended)
  memory reference.  This can then be either used directly in the insn,
  or itself be reloaded.

In the second case (outer reload), there can be a number of issues:

- We may not be allowed at all to change the memory access if:

  * we have a paradoxical subreg that is implictly handled as a
    LOAD_EXTEND or ZERO_EXTEND due to LOAD_EXTEND_OP

  * we have a normal subreg that is implicitly acting on the full
    register due to WORD_REGISTER_OPERATIONS  (the check for this
    seems to be incomplete in current code!)

  * the equivalent memory address is mode-dependent

  * we have a paradoxical subreg, and we cannot prove the widened
    memory is properly aligned (so we may cause either a misaligned
    access, or access to unmapped memory)

- Even if we are in principle allowed to change the memory access,
  the modified address might not be valid (either because the
  original equivalent address is already invalid, or because it
  becomes invalid when adding an offset and/or changing its mode).
  We can still do the outer access in that case, but we will have
  to push address reloads (based on the modified address).

  Current code tries to be clever as to when to perform the
  substitution of the modified memory address: if it thinks no
  address reloads will be required in either case, it leaves the
  address as (subreg (reg)), allowing find_reloads to choose
  between (inner or outer) reloads or doing an (outer) access
  as memory operand directly.  In either case, the actual change
  to a mem happens in cleanup_subreg_operands at the end.

  On the other hand, if address reloads *are* required, it is
  find_reloads_toplev/find_reloads_subreg_address that will
  replace either the subreg or the reg with an explicit (outer
  or inner) memory access, and push the corresponding address
  reloads.  [ Note that find_reloads now no longer has the
  choice of switching between inner and outer access.  In the
  case of an outer access, there still is the choice between
  a direct memory access in the insn and a reload.  ]

- Even if we are allowed to change the memory access and generate
  correct address reloads, there sometimes is a question of whether
  outer access is actually preferable from a performance perspective
  to just doing the inner access.  In particular, if we have a
  paradoxical subreg, the outer access might cause more memory
  traffic than the inner access ...

  If the outer access can be done implicitly in the insn itself,
  that may still be preferable.  If we do a reload in either case,
  however, it may become questionable whether this is really better.

  Current code seems to make somewhat weird decisions on this
  particular point: if no address reloads are required, find_reloads
  will force a reload (i.e. not do a direct memory access) for nearly
  all paradoxical subregs:

   - where required due to LOAD_EXTEND_OP
   - always on big-endian targets
   - always on targets defining WORD_REGISTER_OPERATIONS
   - always unless the inner mode is aligned to BIGGEST_ALIGNMENT

  (Except for the first, I do not understand why any of these are
  actually necessary ...)

  push_reload will then nearly always reload the inner access.  So in
  effect, if no address reloads are required, for paradoxical subregs
  we will just about always do an inner access anyway.

  However, if address reloads *are* required, find_reloads_subreg_address
  goes to a significant amount of trouble to implement them correctly
  whenever possible by replacing the subreg with an outer MEM access.
  [ But if the target *defined* LOAD_EXTEND_OP, we will not do this for
  *any* paradoxical subreg, even those not actually affected.]

  This strikes me as odd: if we don't want to do outer accesses for
  paradoxical subreg when no address reloads are required (which is
  hopefully the vast majority of scenarios), why would we suddenly
  want them when address reloads are required?



Now, getting back to the original problem.  As discussed above, we don't
really want to rely on offsettable_memref_p.  Instead, we want to actually
generate the final memory reference, and simply check it for validity.

However, now the question becomes: as we have already generated the final
address, should we now just throw it away and have it get recomputed by
cleanup_subreg_operands?  The benefit would be to allow find_reloads the
choice between inner and outer access.  But is this actually needed?
For paradoxical subregs, we know that find_reloads would nearly always
choose to reload the inner access.  For normal subregs  --except where
prevented due to WORD_REGISTER_OPERATIONS-- find_reloads would always
choose to perform an outer access (directly or reloaded).  We can make
this distinction just as well already in find_reloads_subreg_address ...

This has the additional benefit that we can replace a bunch of (somewhat
dubious) hand-crafted code in find_reloads_subreg_address by a simple
call to simplify_subreg.

The following patch implements this idea; it passes a basic regression
test on arm-linux-gnueabi.  (Obviously this would need a lot more
testing on various platforms before getting into mainline ...)

Can you have a look whether this fixes the problem you're seeing?

Bernd, Vlad, I'd appreciate your comments on this approach as well.

Thanks,
Ulrich


ChangeLog:

	* reload.c (find_reloads_subreg_address): Remove FORCE_REPLACE
	parameter.  Always replace normal subreg with memory reference
	whenever possible.  Return NULL otherwise.
	(find_reloads_toplev): Always call find_reloads_subreg_address
	for subregs of registers equivalent to a memory location.
	Only recurse further if find_reloads_subreg_address fails.
	(find_reloads_address_1): Only call find_reloads_subreg_address
	for subregs of registers equivalent to a memory location.
	Properly handle failure of find_reloads_subreg_address.


Index: gcc/reload.c
===================================================================
*** gcc/reload.c	(revision 189809)
--- gcc/reload.c	(working copy)
*************** static int find_reloads_address_1 (enum 
*** 282,288 ****
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
  				       enum machine_mode, int,
  				       enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, int, enum reload_type,
  					int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
--- 282,288 ----
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
  				       enum machine_mode, int,
  				       enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, enum reload_type,
  					int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
*************** find_reloads_toplev (rtx x, int opnum, e
*** 4755,4785 ****
  	}
  
        /* If the subreg contains a reg that will be converted to a mem,
! 	 convert the subreg to a narrower memref now.
! 	 Otherwise, we would get (subreg (mem ...) ...),
! 	 which would force reload of the mem.
! 
! 	 We also need to do this if there is an equivalent MEM that is
! 	 not offsettable.  In that case, alter_subreg would produce an
! 	 invalid address on big-endian machines.
! 
! 	 For machines that extend byte loads, we must not reload using
! 	 a wider mode if we have a paradoxical SUBREG.  find_reloads will
! 	 force a reload in that case.  So we should not do anything here.  */
  
        if (regno >= FIRST_PSEUDO_REGISTER
! #ifdef LOAD_EXTEND_OP
! 	  && !paradoxical_subreg_p (x)
! #endif
! 	  && (reg_equiv_address (regno) != 0
! 	      || (reg_equiv_mem (regno) != 0
! 		  && (! strict_memory_address_addr_space_p
! 		      (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
! 		       MEM_ADDR_SPACE (reg_equiv_mem (regno)))
! 		      || ! offsettable_memref_p (reg_equiv_mem (regno))
! 		      || num_not_at_initial_offset))))
! 	x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
! 					   insn, address_reloaded);
      }
  
    for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
--- 4755,4773 ----
  	}
  
        /* If the subreg contains a reg that will be converted to a mem,
! 	 attempt to convert the whole subreg to a (narrower or wider)
! 	 memory reference instead.  If this succeeds, we're done --
! 	 otherwise fall through to check whether the inner reg still
! 	 needs address reloads anyway.  */
  
        if (regno >= FIRST_PSEUDO_REGISTER
! 	  && reg_equiv_memory_loc (regno) != 0)
! 	{
! 	  tem = find_reloads_subreg_address (x, opnum, type, ind_levels,
! 					     insn, address_reloaded);
! 	  if (tem)
! 	    return tem;
! 	}
      }
  
    for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
*************** find_reloads_address_1 (enum machine_mod
*** 6018,6029 ****
  	      if (ira_reg_class_max_nregs [rclass][GET_MODE (SUBREG_REG (x))]
  		  > reg_class_size[(int) rclass])
  		{
! 		  x = find_reloads_subreg_address (x, 0, opnum,
! 						   ADDR_TYPE (type),
! 						   ind_levels, insn, NULL);
! 		  push_reload (x, NULL_RTX, loc, (rtx*) 0, rclass,
! 			       GET_MODE (x), VOIDmode, 0, 0, opnum, type);
! 		  return 1;
  		}
  	    }
  	}
--- 6006,6036 ----
  	      if (ira_reg_class_max_nregs [rclass][GET_MODE (SUBREG_REG (x))]
  		  > reg_class_size[(int) rclass])
  		{
! 		  /* If the inner register will be replaced by a memory
! 		     reference, we can do this only if we can replace the
! 		     whole subreg by a (narrower) memory reference.  If
! 		     this is not possible, fall through and reload just
! 		     the inner register (including address reloads).  */
! 		  if (reg_equiv_memory_loc (REGNO (SUBREG_REG (x))) != 0)
! 		    {
! 		      rtx tem = find_reloads_subreg_address (x, opnum,
! 							     ADDR_TYPE (type),
! 							     ind_levels, insn,
! 							     NULL);
! 		      if (tem)
! 			{
! 			  push_reload (tem, NULL_RTX, loc, (rtx*) 0, rclass,
! 				       GET_MODE (tem), VOIDmode, 0, 0,
! 				       opnum, type);
! 			  return 1;
! 			}
! 		    }
! 		  else
! 		    {
! 		      push_reload (x, NULL_RTX, loc, (rtx*) 0, rclass,
! 				   GET_MODE (x), VOIDmode, 0, 0, opnum, type);
! 		      return 1;
! 		    }
  		}
  	    }
  	}
*************** find_reloads_address_part (rtx x, rtx *l
*** 6100,6116 ****
  }
  
  /* X, a subreg of a pseudo, is a part of an address that needs to be
!    reloaded.
! 
!    If the pseudo is equivalent to a memory location that cannot be directly
!    addressed, make the necessary address reloads.
  
!    If address reloads have been necessary, or if the address is changed
!    by register elimination, return the rtx of the memory location;
!    otherwise, return X.
! 
!    If FORCE_REPLACE is nonzero, unconditionally replace the subreg with the
!    memory location.
  
     OPNUM and TYPE identify the purpose of the reload.
  
--- 6107,6118 ----
  }
  
  /* X, a subreg of a pseudo, is a part of an address that needs to be
!    reloaded, and the pseusdo is equivalent to a memory location.
  
!    Attempt to replace the whole subreg by a (possibly narrower or wider)
!    memory reference.  If this is possible, return this new memory
!    reference, and push all required address reloads.  Otherwise,
!    return NULL.
  
     OPNUM and TYPE identify the purpose of the reload.
  
*************** find_reloads_address_part (rtx x, rtx *l
*** 6122,6252 ****
     stack slots.  */
  
  static rtx
! find_reloads_subreg_address (rtx x, int force_replace, int opnum,
! 			     enum reload_type type, int ind_levels, rtx insn,
! 			     int *address_reloaded)
  {
    int regno = REGNO (SUBREG_REG (x));
    int reloaded = 0;
  
!   if (reg_equiv_memory_loc (regno))
!     {
!       /* If the address is not directly addressable, or if the address is not
! 	 offsettable, then it must be replaced.  */
!       if (! force_replace
! 	  && (reg_equiv_address (regno)
! 	      || ! offsettable_memref_p (reg_equiv_mem (regno))))
! 	force_replace = 1;
! 
!       if (force_replace || num_not_at_initial_offset)
! 	{
! 	  rtx tem = make_memloc (SUBREG_REG (x), regno);
! 
! 	  /* If the address changes because of register elimination, then
! 	     it must be replaced.  */
! 	  if (force_replace
! 	      || ! rtx_equal_p (tem, reg_equiv_mem (regno)))
! 	    {
! 	      unsigned outer_size = GET_MODE_SIZE (GET_MODE (x));
! 	      unsigned inner_size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)));
! 	      int offset;
! 	      rtx orig = tem;
! 
! 	      /* For big-endian paradoxical subregs, SUBREG_BYTE does not
! 		 hold the correct (negative) byte offset.  */
! 	      if (BYTES_BIG_ENDIAN && outer_size > inner_size)
! 		offset = inner_size - outer_size;
! 	      else
! 		offset = SUBREG_BYTE (x);
  
! 	      XEXP (tem, 0) = plus_constant (GET_MODE (XEXP (tem, 0)),
! 					     XEXP (tem, 0), offset);
! 	      PUT_MODE (tem, GET_MODE (x));
! 	      if (MEM_OFFSET_KNOWN_P (tem))
! 		set_mem_offset (tem, MEM_OFFSET (tem) + offset);
! 	      if (MEM_SIZE_KNOWN_P (tem)
! 		  && MEM_SIZE (tem) != (HOST_WIDE_INT) outer_size)
! 		set_mem_size (tem, outer_size);
! 
! 	      /* If this was a paradoxical subreg that we replaced, the
! 		 resulting memory must be sufficiently aligned to allow
! 		 us to widen the mode of the memory.  */
! 	      if (outer_size > inner_size)
! 		{
! 		  rtx base;
  
! 		  base = XEXP (tem, 0);
! 		  if (GET_CODE (base) == PLUS)
! 		    {
! 		      if (CONST_INT_P (XEXP (base, 1))
! 			  && INTVAL (XEXP (base, 1)) % outer_size != 0)
! 			return x;
! 		      base = XEXP (base, 0);
! 		    }
! 		  if (!REG_P (base)
! 		      || (REGNO_POINTER_ALIGN (REGNO (base))
! 			  < outer_size * BITS_PER_UNIT))
! 		    return x;
! 		}
  
- 	      reloaded = find_reloads_address (GET_MODE (tem), &tem,
- 					       XEXP (tem, 0), &XEXP (tem, 0),
- 					       opnum, type, ind_levels, insn);
- 	      /* ??? Do we need to handle nonzero offsets somehow?  */
- 	      if (!offset && !rtx_equal_p (tem, orig))
- 		push_reg_equiv_alt_mem (regno, tem);
- 
- 	      /* For some processors an address may be valid in the
- 		 original mode but not in a smaller mode.  For
- 		 example, ARM accepts a scaled index register in
- 		 SImode but not in HImode.  Note that this is only
- 		 a problem if the address in reg_equiv_mem is already
- 		 invalid in the new mode; other cases would be fixed
- 		 by find_reloads_address as usual.
- 
- 		 ??? We attempt to handle such cases here by doing an
- 		 additional reload of the full address after the
- 		 usual processing by find_reloads_address.  Note that
- 		 this may not work in the general case, but it seems
- 		 to cover the cases where this situation currently
- 		 occurs.  A more general fix might be to reload the
- 		 *value* instead of the address, but this would not
- 		 be expected by the callers of this routine as-is.
- 
- 		 If find_reloads_address already completed replaced
- 		 the address, there is nothing further to do.  */
- 	      if (reloaded == 0
- 		  && reg_equiv_mem (regno) != 0
- 		  && !strict_memory_address_addr_space_p
- 			(GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
- 			 MEM_ADDR_SPACE (reg_equiv_mem (regno))))
- 		{
- 		  push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
- 			       base_reg_class (GET_MODE (tem),
- 					       MEM_ADDR_SPACE (tem),
- 					       MEM, SCRATCH),
- 			       GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0,
- 			       opnum, type);
- 		  reloaded = 1;
- 		}
- 	      /* If this is not a toplevel operand, find_reloads doesn't see
- 		 this substitution.  We have to emit a USE of the pseudo so
- 		 that delete_output_reload can see it.  */
- 	      if (replace_reloads && recog_data.operand[opnum] != x)
- 		/* We mark the USE with QImode so that we recognize it
- 		   as one that can be safely deleted at the end of
- 		   reload.  */
- 		PUT_MODE (emit_insn_before (gen_rtx_USE (VOIDmode,
- 							 SUBREG_REG (x)),
- 					    insn), QImode);
- 	      x = tem;
- 	    }
- 	}
-     }
    if (address_reloaded)
      *address_reloaded = reloaded;
  
!   return x;
  }
  
  /* Substitute into the current INSN the registers into which we have reloaded
--- 6124,6231 ----
     stack slots.  */
  
  static rtx
! find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
! 			     int ind_levels, rtx insn, int *address_reloaded)
  {
+   enum machine_mode outer_mode = GET_MODE (x);
+   enum machine_mode inner_mode = GET_MODE (SUBREG_REG (x));
+   unsigned outer_size = GET_MODE_SIZE (outer_mode);
+   unsigned inner_size = GET_MODE_SIZE (inner_mode);
    int regno = REGNO (SUBREG_REG (x));
    int reloaded = 0;
+   rtx tem, orig;
+   int offset;
  
!   gcc_assert (reg_equiv_memory_loc (regno) != 0);
  
!   /* We cannot replace the subreg with a modified memory reference if:
  
!      - we have a paradoxical subreg that implicitly acts as a zero or
!        sign extension operation due to LOAD_EXTEND_OP;
! 
!      - we have a subreg that is implicitly supposed to act on the full
!        register due to WORD_REGISTER_OPERATIONS (see also eliminate_regs);
! 
!      - the address of the equivalent memory location is mode-dependent;  or
! 
!      - we have a paradoxical subreg and the resulting memory is not
!        sufficiently aligned to allow access in the wider mode.
! 
!     In addition, we choose not to perform the replacement for *any*
!     paradoxical subreg, even if it were possible in principle.  This
!     is to avoid generating wider memory references than necessary.
! 
!     This corresponds to how previous versions of reload used to handle
!     paradoxical subregs where no address reload was required.  */
! 
!   if (paradoxical_subreg_p (x))
!     return NULL;
! 
! #ifdef WORD_REGISTER_OPERATIONS
!   if (outer_size < inner_size
!       && ((outer_size - 1) / UNITS_PER_WORD
!           == (inner_size - 1) / UNITS_PER_WORD))
!     return NULL;
! #endif
! 
!   /* Since we don't attempt to handle paradoxical subregs, we can just
!      call into simplify_subreg, which will handle all remaining checks
!      for us.  */
!   orig = make_memloc (SUBREG_REG (x), regno);
!   offset = SUBREG_BYTE (x);
!   tem = simplify_subreg (outer_mode, orig, inner_mode, offset);
!   if (!tem || !MEM_P (tem))
!     return NULL;
! 
!   /* Now push all required address reloads, if any.  */
!   reloaded = find_reloads_address (GET_MODE (tem), &tem,
! 				   XEXP (tem, 0), &XEXP (tem, 0),
! 				   opnum, type, ind_levels, insn);
!   /* ??? Do we need to handle nonzero offsets somehow?  */
!   if (!offset && !rtx_equal_p (tem, orig))
!     push_reg_equiv_alt_mem (regno, tem);
! 
!   /* For some processors an address may be valid in the original mode but
!      not in a smaller mode.  For example, ARM accepts a scaled index register
!      in SImode but not in HImode.  Note that this is only a problem if the
!      address in reg_equiv_mem is already invalid in the new mode; other
!      cases would be fixed by find_reloads_address as usual.
! 
!      ??? We attempt to handle such cases here by doing an additional reload
!      of the full address after the usual processing by find_reloads_address.
!      Note that this may not work in the general case, but it seems to cover
!      the cases where this situation currently occurs.  A more general fix
!      might be to reload the *value* instead of the address, but this would
!      not be expected by the callers of this routine as-is.
! 
!      If find_reloads_address already completed replaced the address, there
!      is nothing further to do.  */
!   if (reloaded == 0
!       && reg_equiv_mem (regno) != 0
!       && !strict_memory_address_addr_space_p
! 		(GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
! 		 MEM_ADDR_SPACE (reg_equiv_mem (regno))))
!     {
!       push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
! 		   base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
! 				   MEM, SCRATCH),
! 		   GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
!       reloaded = 1;
!     }
! 
!   /* If this is not a toplevel operand, find_reloads doesn't see this
!      substitution.  We have to emit a USE of the pseudo so that
!      delete_output_reload can see it.  */
!   if (replace_reloads && recog_data.operand[opnum] != x)
!     /* We mark the USE with QImode so that we recognize it as one that
!        can be safely deleted at the end of reload.  */
!     PUT_MODE (emit_insn_before (gen_rtx_USE (VOIDmode, SUBREG_REG (x)), insn),
! 	      QImode);
  
    if (address_reloaded)
      *address_reloaded = reloaded;
  
!   return tem;
  }
  
  /* Substitute into the current INSN the registers into which we have reloaded

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]