This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Fwd: Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian]
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: dje dot gcc at gmail dot com, jakub at redhat dot com, rguenther at suse dot de, rdsandiford at googlemail dot com
- Date: Fri, 01 Nov 2013 14:42:48 -0500
- Subject: Re: [Fwd: Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian]
- Authentication-results: sourceware.org; auth=none
- References: <1383271579 dot 6275 dot 255 dot camel at gnopaine>
After discussing this for Richard S at some length today, I want to
withdraw this for now and re-examine the issue. I don't feel I
understand this as well as I thought I did... ;)
Thanks,
Bill
On Thu, 2013-10-31 at 21:06 -0500, Bill Schmidt wrote:
> Hi maintainers,
>
> I agree with David that duplicating this code is a bad approach. What
> he and I would both prefer is to add a target hook to account for an
> anomaly in the PowerPC architecture.
>
> Background: For historical reasons, our vperm instruction (which is
> produced for gen_vec_perm) has some peculiar semantics in little endian
> mode. The permute control vector is interpreted to contain elements
> indexed in big-endian order no matter which endian mode the processor is
> set to. We can work around this in little endian mode by inverting the
> permute control vector and swapping the order of the other two input
> vectors, thus obtaining the same semantics as we would get for big
> endian.
>
> This behavior works when the two vectors are being treated as a single
> double-wide vector; the swapping is needed to make the long array appear
> as [8, 7, 6, 5, 4, 3, 2, 1] instead of [4, 3, 2, 1, 8, 7, 6, 5].
>
> In the specific case of expand_mult_highpart (), the two vectors are not
> a single double-wide vector, but instead contain the odd and even
> results of widening multiplies. In this case, we still need to invert
> the permute control vector, but we don't want to swap the operands,
> because that causes us to mix up the odd and even results. This is the
> only such case we've run into where the swap is not what we need to
> obtain the right semantics.
>
> What we would like to do is swap the operands an extra time in
> expand_mult_highpart (), so that our common code will then swap the
> operands back to their original position. But since this is in
> target-independent code, we would need a target hook to do this.
> Something like:
>
> if (targetm.swap_vperm_inputs ())
> {
> rtx tmp = m1;
> m1 = m2;
> m2 = tmp;
> }
>
> For PowerPC, the target hook would return !BYTES_BIG_ENDIAN. The
> default implementation for all other targets would return false.
>
> Would you find such an approach tolerable?
>
> Thanks,
> Bill
> email message attachment (Re: [PATCH, rs6000] Correct handling of
> multiply high-part for little endian), "Forwarded message - Re:
> [PATCH, rs6000] Correct handling of multiply high-part for little
> endian"
> > -------- Forwarded Message --------
> > From: David Edelsohn <dje.gcc@gmail.com>
> > To: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH, rs6000] Correct handling of multiply high-part
> > for little endian
> > Date: Wed, 30 Oct 2013 20:06:37 -0400
> >
> > On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt
> > <wschmidt@linux.vnet.ibm.com> wrote:
> > > Hi,
> > >
> > > When working around the peculiar little-endian semantics of the vperm
> > > instruction, our usual fix is to complement the permute control vector
> > > and swap the order of the two vector input operands, so that we get a
> > > double-wide vector in the proper order. We don't want to swap the
> > > operands when we are expanding a mult_highpart operation, however, as
> > > the two input operands are not to be interpreted as a double-wide
> > > vector. Instead they represent odd and even elements, and swapping the
> > > operands gets the odd and even elements reversed in the final result.
> > >
> > > The permute for this case is generated by target-neutral code in
> > > optabs.c: expand_mult_highpart (). We obviously can't change that code
> > > directly. However, we can redirect the logic from the "case 2" method
> > > to target-specific code by implementing expansions for the
> > > umul<mode>3_highpart and smul<mode>3_highpart operations. I've done
> > > this, with the expansions acting exactly as expand_mult_highpart does
> > > today, with the exception that it swaps the input operands to the call
> > > to expand_vec_perm when we are generating little-endian code. We will
> > > later swap them back to their original position in the code in rs6000.c:
> > > altivec_expand_vec_perm_const_le ().
> > >
> > > The change has no intended effect when generating big-endian code.
> > >
> > > Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new
> > > regressions. This fixes the gcc.dg/vect/pr51581-4.c test failure for
> > > little endian. Ok for trunk?
> > >
> > > Thanks,
> > > Bill
> > >
> > >
> > > 2013-10-30 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> > >
> > > * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New
> > > prototype.
> > > * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New.
> > > * config/rs6000/altivec.md (umul<mode>3_highpart): New.
> > > (smul_<mode>3_highpart): New.
> >
> > I really do not like duplicating this code. I think that you need to
> > explore with the community the possibility of including a hook in the
> > general code to handle the strangeness of PPC LE vector semantics.
> >
> > This is asking for problems if the generic code is updated / modified / fixed.
> >
> > - David
> >