This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access


Hi James,

I have split off the aarch64 bit off from the generic parts and processed your feedback.

Attached is the reworked patch.

Ok for Tunk?

Thanks,
Tamar

Thanks,
Tamar

gcc/
2017-11-14  Tamar Christina  <tamar.christina@arm.com>

	* config/aarch64/aarch64.c (aarch64_expand_movmem):
	Add MEM to REG optimized case.

gcc/testsuite/
2017-11-14  Tamar Christina  <tamar.christina@arm.com>

	* gcc.target/aarch64/structs.c

> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenhalgh@arm.com]
> Sent: Wednesday, August 30, 2017 16:56
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <richard.sandiford@linaro.org>; GCC Patches <gcc-
> patches@gcc.gnu.org>; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>
> Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for
> structure/block/unaligned memory access
> 
> Hi Tamar,
> 
> I think the AArch64 parts of this patch can be substantially simplified.
> 
> On Mon, Jul 03, 2017 at 07:17:51AM +0100, Tamar Christina wrote:
> > diff --git a/gcc/config/aarch64/aarch64.c
> > b/gcc/config/aarch64/aarch64.c index
> >
> ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd0
> 37
> > 54bbba9264a6 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -13466,7 +13466,6 @@
> aarch64_copy_one_block_and_progress_pointers
> > (rtx *src, rtx *dst,
> >
> >  /* Expand movmem, as if from a __builtin_memcpy.  Return true if
> >     we succeed, otherwise return false.  */
> > -
> >  bool
> >  aarch64_expand_movmem (rtx *operands)  { @@ -13498,16 +13497,55
> @@
> > aarch64_expand_movmem (rtx *operands)
> >    base = copy_to_mode_reg (Pmode, XEXP (src, 0));
> >    src = adjust_automodify_address (src, VOIDmode, base, 0);
> >
> > +  /* Optimize routines for MEM to REG copies.
> > +     This particular function only handles copying to two
> > +     destination types: 1) a regular register and 2) the stack.
> > +     When writing to a regular register we may end up writting too much in
> cases
> > +     where the register already contains a live value or when no data
> padding is
> > +     happening so we disallow it.  */  if (n < 8 && !REG_P (XEXP
> > + (operands[0], 0)))
> 
> I don't understand how this comment lines up with the condition on this
> code path. On the one hand you say you permit regular registers, but on the
> other hand you say you disallow regular registers because you may overwrite.
> 
> 
> > +   {
> > +     machine_mode mov_mode, dest_mode
> > +       = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT);
> > +     rtx result = gen_reg_rtx (dest_mode);
> > +     emit_insn (gen_move_insn (result, GEN_INT (0)));
> > +
> > +     unsigned int shift_cnt = 0;
> 
> Any reason not to just initialise the shift_cnt in place here?
> 
> > +     for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode))
> 
> 
>      for (unsigned int shift_cnt = 0;
>           shift_cnt <=  n;
>           shift_cnt += GET_MODE_SIZE (mov_mode))
> 
> Having the loop counter first in the comparison is personal preference.
> 
> mov_mode looks uninitialised, but I guess by the time you check the loop
> condition you have actually initialized it.
> 
> > +       {
> > +	 int nearest = 0;
> 
> This isn't required, we could do the loop below with one
> 
> > +	 /* Find the mode to use.  */
> > +	 for (unsigned max = 1; max <= (n - shift_cnt); max *= 2)
> > +	      nearest = max;
> 
> Wrong formatting.
> 
> So, when this loop terminates for the first time (shift_cnt == 0) nearest is the
> first power of two greater than n.
> 
> > +
> > +	  mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT,
> > +MODE_INT);
> 
> That means this is always a mode of at least the right size, which in turn
> means that we can't execute round the loop again (GET_MODE_SIZE
> (mov_mode) will always be greater than n). So, we can be sure this loop only
> executes once, and we can fold mov_mode and dest_mode to be equal.
> 
> > +	  rtx reg = gen_reg_rtx (mov_mode);
> > +
> > +	  src = adjust_address (src, mov_mode, 0);
> > +	  emit_insn (gen_move_insn (reg, src));
> > +	  src = aarch64_progress_pointer (src);
> > +
> > +	  reg = gen_rtx_ZERO_EXTEND (dest_mode, reg);
> > +	  reg = gen_rtx_ASHIFT (dest_mode, reg,
> > +				GEN_INT (shift_cnt * BITS_PER_UNIT));
> > +	  result = gen_rtx_IOR (dest_mode, reg, result);
> > +       }
> > +
> > +    dst = adjust_address (dst, dest_mode, 0);
> > +    emit_insn (gen_move_insn (dst, result));
> > +    return true;
> > +  }
> > +
> >    /* Simple cases.  Copy 0-3 bytes, as (if applicable) a 2-byte, then a
> >       1-byte chunk.  */
> >    if (n < 4)
> >      {
> >        if (n >= 2)
> > -	{
> > -	  aarch64_copy_one_block_and_progress_pointers (&src, &dst,
> HImode);
> > -	  n -= 2;
> > -	}
> >
> > +      {
> > +	aarch64_copy_one_block_and_progress_pointers (&src, &dst,
> HImode);
> > +	n -= 2;
> > +      }
> 
> These formatting changes leave a blank newline between if (n >= 2) and the
> remainder of this block. Why?
> 
> >        if (n == 1)
> >  	aarch64_copy_one_block_and_progress_pointers (&src, &dst,
> QImode);
> >
> 
> Thanks,
> James

Attachment: memcpy-aarch64.patch
Description: memcpy-aarch64.patch


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]