This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: David Edelsohn <dje dot gcc at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 18 Oct 2013 08:51:49 -0500
- Subject: Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally
- Authentication-results: sourceware.org; auth=none
- References: <1382064197 dot 6275 dot 118 dot camel at gnopaine> <CAGWvnymxyk8LmKjHiFCckWZ7RV9wA23ROeQ5TW-1Jr26sQ+peQ at mail dot gmail dot com> <1382072985 dot 6275 dot 139 dot camel at gnopaine>
Just a quick note that overnight testing of the posted patch was clean.
Recap: There are three options on the table:
- the posted patch
- that patch with the (1 - start) change
- replace nunits - 1 with nunits as the loop upper bound
I'm happy to implement any of them, as you prefer. I lean towards the
last as the most easily understood.
Thanks,
Bill
On Fri, 2013-10-18 at 00:09 -0500, Bill Schmidt wrote:
>
> On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote:
> > On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt
> > <wschmidt@linux.vnet.ibm.com> wrote:
> > > Hi,
> > >
> > > In little endian mode, we managed to convert a load of the V4SI vector
> > > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the
> > > number 7. It turns out we only looked at the first N-1 elements of an
> > > N-element vector in little endian mode, and verified the zeroth element
> > > twice. Adjusting the loop boundaries fixes the problem.
> > >
> > > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu. Ok to
> > > commit to trunk if no regressions?
> >
> > This patch does not make sense. It biases the loop bounds based on
> > BYTES_BIG_ENDIAN, but the body of the loop biases the index variable
> > based on BYTES_BIG_ENDIAN in one of the two uses. The changes seem to
> > compute the same value for the first use of index "i" for both
> > BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way. It looks
> > like something should be able to be simplified.
> >
> > Thanks, David
> >
>
> It seems like it, but I haven't been able to figure out anything nicer.
> The thing that makes this weird is that "i" is referring to different
> elements of the array depending on Big or Little Endian numbering.
>
> Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with
> step = 2. Then for BE, the 15's are in odd-numbered elements (starting
> with zero from the left), but for LE they are in even-numbered elements
> (starting with zero from the right). For both cases, we found the
> candidate value in the rightmost element (nunits - 1 for BE, 0 for LE).
>
> The first two iterations for BE process the two leftmost elements. For
> i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val =
> 0. For i = 1, we find (i+1)&1 to be zero, so the desired value is val =
> 15, and so on.
>
> The first two iterations for LE process the second and third elements
> from the right. For i = 1, we find i&1 to be nonzero, so the desired
> value is 0. For i = 2, we find i&2 to be zero, so the desired value is
> 15.
>
> So even though the first iteration calculates the same value for both
> endian modes, "i" means something different in each case.
>
> I guess we could make a "simplification" as follows:
>
> /* Check if VAL is present in every STEP-th element, and the
> other elements are filled with its most significant bit. */
> start = BYTES_BIG_ENDIAN ? 0 : 1;
> end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits;
>
> for (i = start; i < end; ++i)
> {
> HOST_WIDE_INT desired_val;
> if (((i + (1 - start)) & (step - 1)) == 0)
> desired_val = val;
> else
> desired_val = msb_val;
>
> but that may be as hard to understand as what's there now...
>
> An alternative is to just change the loop bounds to
>
> for (i = 0; i < nunits; ++i)
>
> which will reprocess the candidate element for both endian modes, and
> leave everything else alone. That would probably actually be faster
> than all the convoluted nonsense to avoid the reprocessing. Want me to
> go in that direction instead?
>
> Thanks,
> Bill
>