V3 [PATCH] Optimize vector constructor

H.J. Lu hjl.tools@gmail.com
Wed Mar 6 07:54:00 GMT 2019


On Tue, Mar 5, 2019 at 1:46 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 04, 2019 at 12:55:04PM +0100, Richard Biener wrote:
> > On Sun, Mar 3, 2019 at 10:13 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sun, Mar 03, 2019 at 06:40:09AM -0800, Andrew Pinski wrote:
> > > > On Sun, Mar 3, 2019 at 6:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > For vector init constructor:
> > > > >
> > > > > ---
> > > > > typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> > > > >
> > > > > __v4sf
> > > > > foo (__v4sf x, float f)
> > > > > {
> > > > >   __v4sf y = { f, x[1], x[2], x[3] };
> > > > >   return y;
> > > > > }
> > > > > ---
> > > > >
> > > > > we can optimize vector init constructor with vector copy or permute
> > > > > followed by a single scalar insert:
>
> > and you want to advance to the _1 = BIT_INSERT_EXPR here.  The easiest way
> > is to emit a new stmt for _2 = copy ...; and do the set_rhs with the
> > BIT_INSERT_EXPR.
>
> Thanks for the BIT_INSERT_EXPR suggestion.  I am testing this patch.
>
>
> H.J.
> ---
> We can optimize a vector constructor with a vector copy or permute
> followed by a single scalar insert, replacing:
>
>   __v4sf y;
>   __v4sf D.1930;
>   float _1;
>   float _2;
>   float _3;
>
>   <bb 2> :
>   _1 = BIT_FIELD_REF <x_9(D), 32, 96>;
>   _2 = BIT_FIELD_REF <x_9(D), 32, 64>;
>   _3 = BIT_FIELD_REF <x_9(D), 32, 32>;
>   y_6 = {f_5(D), _3, _2, _1};
>   return y_6;
>
> with
>
>   __v4sf y;
>   __v4sf D.1930;
>   float _1;
>   float _2;
>   float _3;
>   vector(4) float _8;
>
>   <bb 2> :
>   _1 = BIT_FIELD_REF <x_9(D), 32, 96>;
>   _2 = BIT_FIELD_REF <x_9(D), 32, 64>;
>   _3 = BIT_FIELD_REF <x_9(D), 32, 32>;
>   _8 = x_9(D);
>   y_6 = BIT_INSERT_EXPR <x_9(D), f_5(D), 0 (32 bits)>;
>   return y_6;
>
> gcc/
>
>         PR tree-optimization/88828
>         * tree-ssa-forwprop.c (simplify_vector_constructor): Optimize
>         vector init constructor with vector copy or permute followed
>         by a single scalar insert.
>
> gcc/testsuite/
>
>         PR tree-optimization/88828
>         * gcc.target/i386/pr88828-1a.c: New test.
>         * gcc.target/i386/pr88828-2b.c: Likewise.
>         * gcc.target/i386/pr88828-2.c: Likewise.
>         * gcc.target/i386/pr88828-3a.c: Likewise.
>         * gcc.target/i386/pr88828-3b.c: Likewise.
>         * gcc.target/i386/pr88828-3c.c: Likewise.
>         * gcc.target/i386/pr88828-3d.c: Likewise.
>         * gcc.target/i386/pr88828-4a.c: Likewise.
>         * gcc.target/i386/pr88828-4b.c: Likewise.
>         * gcc.target/i386/pr88828-5a.c: Likewise.
>         * gcc.target/i386/pr88828-5b.c: Likewise.
>         * gcc.target/i386/pr88828-6a.c: Likewise.
>         * gcc.target/i386/pr88828-6b.c: Likewise.

Here is the updated patch with run-time tests.
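
As a rough illustration of the equivalence the transformation relies on,
the sketch below compares the constructor form from the example above
with a hand-written "copy the vector, then overwrite one lane" form.  It
is not taken from the attached patch or its tests; the type name v4sf and
the function names are made up for illustration, and it assumes a
compiler that supports GCC vector extensions with lane subscripting.

```c
#include <assert.h>

typedef float v4sf __attribute__ ((__vector_size__ (16)));

/* Constructor form, as in the example quoted above.  */
v4sf
build_with_constructor (v4sf x, float f)
{
  v4sf y = { f, x[1], x[2], x[3] };
  return y;
}

/* Hypothetical hand-written equivalent: copy the whole vector,
   then overwrite lane 0 with the scalar.  */
v4sf
build_with_copy_and_insert (v4sf x, float f)
{
  v4sf y = x;
  y[0] = f;
  return y;
}

/* Return 1 if all four lanes compare equal, 0 otherwise.  */
int
lanes_equal (v4sf a, v4sf b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Check that both forms produce the same vector for one sample input.  */
void
check_equivalence (void)
{
  v4sf x = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf a = build_with_constructor (x, 9.0f);
  v4sf b = build_with_copy_and_insert (x, 9.0f);
  assert (lanes_equal (a, b));
}
```

Calling check_equivalence () from a test driver exercises both forms on
one input; it only checks the source-level results and says nothing
about which instructions the compiler ends up emitting.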

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Optimize-vector-constructor.patch
Type: text/x-patch
Size: 23948 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190306/0f220097/attachment.bin>

