This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

From: Tamar Christina <Tamar dot Christina at arm dot com>
To: Alan Modra <amodra at gmail dot com>, Richard Biener <rguenther at suse dot de>
Cc: Peter Bergner <bergner at vnet dot ibm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>, "law at redhat dot com" <law at redhat dot com>, "ian at airs dot com" <ian at airs dot com>
Date: Wed, 4 Apr 2018 09:06:25 +0000
Subject: RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
Nodisclaimer: True
References: <20171115155237.GA30502@arm.com> <alpine.LSU.2.20.1711160858100.12252@zhemvz.fhfr.qr> <20171116114254.GA11570@arm.com> <alpine.LSU.2.20.1711161254130.12252@zhemvz.fhfr.qr> <20171116152701.GA15229@arm.com> <20180329143525.GA3812@bubble.grove.modra.org> <64772721-85a9-96f7-62ef-53c7f31329ed@vnet.ibm.com> <alpine.LSU.2.20.1804031255190.18265@zhemvz.fhfr.qr> <20180403122505.GJ3812@bubble.grove.modra.org> <alpine.LSU.2.20.1804031429420.18265@zhemvz.fhfr.qr> <20180403131610.GK3812@bubble.grove.modra.org>
Spamdiagnosticmetadata: NSPM
Spamdiagnosticoutput: 1:99


> -----Original Message-----
> From: Alan Modra <amodra@gmail.com>
> Sent: Tuesday, April 3, 2018 14:16
> To: Richard Biener <rguenther@suse.de>
> Cc: Peter Bergner <bergner@vnet.ibm.com>; Tamar Christina
> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> law@redhat.com; ian@airs.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Tue, Apr 03, 2018 at 02:30:23PM +0200, Richard Biener wrote:
> > On Tue, 3 Apr 2018, Alan Modra wrote:
> >
> > > On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > > > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > > >
> > > > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > > > On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> > > > > >> --- a/gcc/expr.c
> > > > > >> +++ b/gcc/expr.c
> > > > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode
> mode,
> > > > > >> tree src)
> > > > > >>
> > > > > >>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > > > >>    dst_words = XALLOCAVEC (rtx, n_regs);
> > > > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)),
> > > > > >> BITS_PER_WORD);
> > > > > >> +  bitsize = BITS_PER_WORD;
> > > > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN
> (TREE_TYPE (src))))
> > > > > >> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)),
> > > > > >> + BITS_PER_WORD);
> > > > > >>
> > > > > >>    /* Copy the structure BITSIZE bits at a time.  */
> > > > > >>    for (bitpos = 0, xbitpos = padding_correction;
> > > > > >
> > > > > > I believe this patch is wrong.  Please revert.  See the PR84762
> testcase.
> > > > > >
> > > > > > There are two problems.  Firstly, if padding_correction is
> > > > > > non-zero, then xbitpos % BITS_PER_WORD is non-zero and in
> > > > > >
> > > > > >       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > > > > 		       0, 0, word_mode,
> > > > > > 		       extract_bit_field (src_word, bitsize,
> > > > > > 					  bitpos % BITS_PER_WORD, 1,
> > > > > > 					  NULL_RTX, word_mode,
> word_mode,
> > > > > > 					  false, NULL),
> > > > > > 		       false);
> > > > > >
> > > > > > the stored bit-field exceeds the destination register size.
> > > > > > You could fix that by making bitsize the gcd of bitsize and
> padding_correction.
> > > > > >
> > > > > > However, that doesn't fix the second problem which is that the
> > > > > > extracted bit-field can exceed the source size.  That will
> > > > > > result in rubbish being read into a register.
> > > > >
> > > > > FYI, I received an automated response saying Tamar is away on
> > > > > vacation with no return date specified.  That means he won't be
> > > > > able to revert the patch.  What do we do now?
> > > >
> > > > The code before the change already looks fishy to me.
> > >
> > > Yes, it is fishy so far as the code in the loop relies on alignment
> > > determining a small enough bitsize.  Adding
> > >
> > >   if (bytes % UNITS_PER_WORD != 0)
> > >     bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) *
> > > BITS_PER_UNIT);
> > >
> > > after bitsize is calculated from alignment would make the code
> > > correct, I believe.  But I think that will fail the testcase Tamar
> > > added.
> > >
> > > >   x = expand_normal (src);
> > > >
> > > > so what do we expect this to expand to in general?  Fortunately it
> > > > seems there are exactly two callers so hopefully a gcc_assert
> > > > (MEM_P (x)) would work?
> > > >
> > > > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > > >
> > > > In case x is a MEM we should use MEM_ALIGN and if not then we
> > > > shouldn't do anything but just use word_mode here.
> > >
> > > No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
> > > quoted above.
> >
> > Ah, so it should possibly use TYPE_SIZE instead of TYPE_ALIGN
> 
> Not that either.  We already have a byte count..
> 
> > otherwise
> > it will handle the over-aligned case (align to 8 bytes, size 4 bytes)
> > incorrectly as well?
> 
> Yes, that case is mishandled too, even before Tamar's patch.  Peter's
> pr85123 testcase with vfloat1 aligned(8) is an example.
> 
> If we want correct code, then
> 
>   if (bytes % UNITS_PER_WORD != 0)
>     bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> 

If bitsize is still initially calculated based on alignment, unless you over align it's
still just going to do 8 bit copies as it did before, which isn't an improvement at all.

Now that I know how the loads are done, I have a patch should be both correct and generate better code in most cases.
It just calculates bitsize inside the loop and does the copying in the largest mode possible that's equal or less than the bits
That are to be copied. This avoids the issue with reading too much, honors padding and alignment and still generates better code
In most cases.

I'm running the regression tests and should have a final version soon.

> will fix the algorithm inside the copy_blkmode_to_reg loop.  Otherwise the
> loop itself needs changes.
> 
> --
> Alan Modra
> Australia Development Lab, IBM

Follow-Ups:
- Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
  - From: Peter Bergner

References:
- Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
  - From: Richard Biener
- Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
  - From: Alan Modra
- Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
  - From: Richard Biener
- Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
  - From: Alan Modra

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]