This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix ICE when generating a vector shift by scalar
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 2 Sep 2015 14:44:03 +0200
- Subject: Re: [PATCH] Fix ICE when generating a vector shift by scalar
- References: <1441052882 dot 4779 dot 3 dot camel at oc8801110288 dot ibm dot com> <CAFiYyc2LDEXmRURfet6G691AON4UPjt6KEJRZy4Szz=HKfHESg at mail dot gmail dot com> <1441122782 dot 4925 dot 6 dot camel at oc8801110288 dot ibm dot com>
On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > Hi,
>> >
>> > The following simple test fails when attempting to convert a vector
>> > shift-by-scalar into a vector shift-by-vector.
>> >
>> > typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >
>> > v16ui vslb(v16ui v, unsigned char i)
>> > {
>> > return v << i;
>> > }
>> >
>> > When this code is gimplified, the shift amount gets expanded to an
>> > unsigned int:
>> >
>> > vslb (v16ui v, unsigned char i)
>> > {
>> > v16ui D.2300;
>> > unsigned int D.2301;
>> >
>> > D.2301 = (unsigned int) i;
>> > D.2300 = v << D.2301;
>> > return D.2300;
>> > }
>> >
>> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
>> > using expand_vector_broadcast, which produces the following rtx to be
>> > used to initialize a V16QI vector:
>> >
>> > (parallel:V16QI [
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > ])
>> >
>> > The back end eventually chokes trying to generate a copy of the SImode
>> > expression into a QImode memory slot.
>> >
>> > This patch fixes this problem by ensuring that the shift amount is
>> > truncated to the inner mode of the vector when necessary. I've added a
>> > test case verifying correct PowerPC code generation in this case.
>> >
>> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> > regressions. Is this ok for trunk?
>> >
>> > Thanks,
>> > Bill
>> >
>> >
>> > [gcc]
>> >
>> > 2015-08-31 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>> >
>> > * optabs.c (expand_binop): Don't create a broadcast vector with a
>> > source element wider than the inner mode.
>> >
>> > [gcc/testsuite]
>> >
>> > 2015-08-31 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>> >
>> > * gcc.target/powerpc/vec-shift.c: New test.
>> >
>> >
>> > Index: gcc/optabs.c
>> > ===================================================================
>> > --- gcc/optabs.c (revision 227353)
>> > +++ gcc/optabs.c (working copy)
>> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>> >
>> > if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>> > {
>> > + /* The scalar may have been extended to be too wide. Truncate
>> > + it back to the proper size to fit in the broadcast vector. */
>> > + machine_mode inner_mode = GET_MODE_INNER (mode);
>> > + if (GET_MODE_BITSIZE (inner_mode)
>> > + < GET_MODE_BITSIZE (GET_MODE (op1)))
>>
>> Does that work for modeless constants? Btw, what do other targets do
>> here? Do they also choke or do they cope with the wide operand?
>
> Good question. This works by serendipity more than by design. Because
> a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
> won't be generated. It would be better for me to put in an explicit
> check for CONST_INT rather than relying on this, though. I'll fix that.
>
> I am not sure what other targets do here; I can check. However, do you
> think that's relevant? I'm concerned that
>
> (parallel:V16QI [
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> (subreg/s/v:SI (reg:DI 155) 0)
> ])
>
> is a nonsensical expression and shouldn't be produced by common code, in
> my view. It seems best to make this explicitly correct. Please let me
> know if that's off-base.
No, the above indeed looks fishy, though other backends' vec_init_optab
expanders might just handle it fine.
OTOH, if a conversion is required it would be nice to CSE it, i.e. force
the result into a register (I'm not sure whether the targets handle invalid
RTL sharing in vec_init_optab).
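
As a rough source-level picture of "truncate once, then broadcast the single
result", something like the following sketch may help. It is not the optabs.c
code and does not touch RTL; it uses the GCC vector extensions, and the helper
names splat_u8 and shift_by_vector are made up for illustration. The shift
amount is assumed to be smaller than the 8-bit element width.

```c
/* Illustration only: a source-level analogue of truncating the widened
   shift amount once and reusing that one value for every lane.
   Requires the GCC vector extensions (GCC or Clang).  */
typedef unsigned char v16ui __attribute__((vector_size(16)));

/* Hypothetical helper: narrow the amount to the element width a single
   time, then broadcast that one narrowed value to all 16 lanes.  */
static v16ui splat_u8 (unsigned int amount)
{
  unsigned char narrowed = (unsigned char) amount;  /* the "TRUNCATE" step */
  v16ui result;
  for (int k = 0; k < 16; k++)
    result[k] = narrowed;                           /* same value per lane */
  return result;
}

/* Hypothetical helper: shift-by-scalar expressed as shift-by-vector.
   Assumes amount < 8 so every per-lane shift is well defined.  */
static v16ui shift_by_vector (v16ui v, unsigned int amount)
{
  return v << splat_u8 (amount);
}
```

The point is only that the narrowing happens in one place, so all sixteen
elements refer to the same already-converted value.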
> Thanks,
> Bill
>
>>
>> > + op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
>> > + GET_MODE (op1));
>> > rtx vop1 = expand_vector_broadcast (mode, op1);
>> > if (vop1)
>> > {
>> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
>> > ===================================================================
>> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0)
>> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy)
>> > @@ -0,0 +1,20 @@
>> > +/* { dg-do compile { target { powerpc*-*-* } } } */
>> > +/* { dg-require-effective-target powerpc_altivec_ok } */
>> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
>> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
>> > +/* { dg-options "-mcpu=power7 -O2" } */
>> > +
>> > +/* This used to ICE. During gimplification, "i" is widened to an unsigned
>> > + int. We used to fail at expand time as we tried to cram an SImode item
>> > + into a QImode memory slot. This has been fixed to properly truncate the
>> > + shift amount when splatting it into a vector. */
>> > +
>> > +typedef unsigned char v16ui __attribute__((vector_size(16)));
>> > +
>> > +v16ui vslb(v16ui v, unsigned char i)
>> > +{
>> > + return v << i;
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler "vspltb" } } */
>> > +/* { dg-final { scan-assembler "vslb" } } */
>> >
>> >
>> >
>>
>
>
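
On the modeless-constant question earlier in the thread, here is a small
standalone toy model of the size comparison. It only mimics the idea that a
VOIDmode operand reports a bitsize of 0; the enum, toy_bitsize and both
needs_truncate_* functions are invented for this sketch and are not GCC
interfaces. The second variant shows one possible shape of an explicit check,
under the assumption that a modeless operand should never be truncated.

```c
/* Toy stand-ins for machine modes; NOT GCC's machine_mode.  */
enum toy_mode { TOY_VOID, TOY_QI, TOY_HI, TOY_SI, TOY_DI };

/* Width in bits of each toy mode; the modeless case reports 0.  */
static unsigned toy_bitsize (enum toy_mode m)
{
  switch (m)
    {
    case TOY_QI: return 8;
    case TOY_HI: return 16;
    case TOY_SI: return 32;
    case TOY_DI: return 64;
    default:     return 0;   /* TOY_VOID: modeless constant */
    }
}

/* Implicit variant: a modeless operand is skipped only because its
   reported size (0) can never exceed the element size.  */
static int needs_truncate_implicit (enum toy_mode inner, enum toy_mode op)
{
  return toy_bitsize (inner) < toy_bitsize (op);
}

/* Explicit variant: rule out the modeless case first, then compare.  */
static int needs_truncate_explicit (enum toy_mode inner, enum toy_mode op)
{
  if (op == TOY_VOID)
    return 0;
  return toy_bitsize (inner) < toy_bitsize (op);
}
```

Both variants give the same answers here; the explicit one just states the
intent instead of depending on the 0 bitsize.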