This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally



On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote:
> On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Hi,
> >
> > In little endian mode, we managed to convert a load of the V4SI vector
> > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the
> > number 7.  It turns out we only looked at the first N-1 elements of an
> > N-element vector in little endian mode, and verified the zeroth element
> > twice.  Adjusting the loop boundaries fixes the problem.
> >
> > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu.  Ok to
> > commit to trunk if no regressions?
> 
> This patch does not make sense. It biases the loop bounds based on
> BYTES_BIG_ENDIAN, but the body of the loop biases the index variable
> based on BYTES_BIG_ENDIAN in one of the two uses.  The changes seem to
> compute the same value for the first use of index "i" for both
> BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way.  It looks
> like something should be able to be simplified.
> 
> Thanks, David
> 

It seems like it, but I haven't been able to figure out anything nicer.
The thing that makes this weird is that "i" is referring to different
elements of the array depending on Big or Little Endian numbering.

Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with
step = 2.  Then for BE, the 15's are in odd-numbered elements (starting
with zero from the left), but for LE they are in even-numbered elements
(starting with zero from the right).  For both cases, we found the
candidate value in the rightmost element (nunits - 1 for BE, 0 for LE).

The first two iterations for BE process the two leftmost elements.  For
i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val =
0.  For i = 1, we find (i+1)&1 to be zero, so the desired value is val =
15, and so on.

The first two iterations for LE process the second and third elements
from the right.  For i = 1, we find i&1 to be nonzero, so the desired
value is 0.  For i = 2, we find i&2 to be zero, so the desired value is
15.

So even though the first iteration calculates the same value for both
endian modes, "i" means something different in each case.

I guess we could make a "simplification" as follows:

  /* Check if VAL is present in every STEP-th element, and the                  
     other elements are filled with its most significant bit.  */
  start = BYTES_BIG_ENDIAN ? 0 : 1;
  end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits;

  for (i = start; i < end; ++i)
    {
      HOST_WIDE_INT desired_val;
      if (((i + (1 - start)) & (step - 1)) == 0)
        desired_val = val;
      else
        desired_val = msb_val;

but that may be as hard to understand as what's there now...

An alternative is to just change the loop bounds to

  for (i = 0; i < nunits; ++i)

which will reprocess the candidate element for both endian modes, and
leave everything else alone.  That would probably actually be faster
than all the convoluted nonsense to avoid the reprocessing.  Want me to
go in that direction instead?

Thanks,
Bill


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]