[PATCH][i386] Add some obvious missing vectorizer patterns for AVX

Wed May 12 14:37:00 GMT 2010

On Wed, May 12, 2010 at 6:37 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, May 12, 2010 at 3:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, May 12, 2010 at 1:40 AM, Richard Guenther <rguenther@suse.de> wrote:
>>> On Tue, 11 May 2010, H.J. Lu wrote:
>>>
>>>> On Mon, May 10, 2010 at 6:02 AM, Richard Guenther <rguenther@suse.de> wrote:
>>>> >
>>>> > This adds patterns that do not require much thought.  I duplicated
>>>> > the existing (but odd to me) superfluous vec_concats for example
>>>> > in vec_unpacks_hi_v8sf (AVX would have vextract for a
>>>> > highpart vec_select - but there must be a reason to do it the
>>>> > odd way for SSE).
>>>> >
>>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>>> >
>>>> > Ok for trunk?
>>>> >
>>>> > Thanks,
>>>> > Richard.
>>>> >
>>>> > 2010-05-10  Richard Guenther  <rguenther@suse.de>
>>>> >
>>>> >        * config/i386/sse.md (reduc_splus_v8sf): Add.
>>>> >        (reduc_splus_v4df): Likewise.
>>>> >        (vec_unpacks_hi_v8sf): Likewise.
>>>> >        (vec_unpacks_lo_v8sf): Likewise.
>>>> >        (*avx_cvtps2pd256_2): Likewise.
>>>> >        (vec_unpacks_float_hi_v8si): Likewise.
>>>> >        (vec_unpacks_float_lo_v8si): Likewise.
>>>> >        (vec_interleave_highv4df): Likewise.
>>>> >        (vec_interleave_lowv4df): Likewise.
>>>> >
>>>>
>>>> >
>>>> > + (define_insn "vec_interleave_highv4df"
>>>> > +   [(set (match_operand:V4DF 0 "register_operand" "=x")
>>>> > +       (vec_select:V4DF
>>>> > +         (vec_concat:V8DF
>>>> > +           (match_operand:V4DF 1 "register_operand" "x")
>>>> > +           (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
>>>> > +         (parallel [(const_int 2) (const_int 6)
>>>> > +                    (const_int 3) (const_int 7)])))]
>>>> > +   "TARGET_AVX"
>>>> > +   "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
>>>> > +   [(set_attr "type" "sselog")
>>>> > +    (set_attr "prefix" "vex")
>>>> > +    (set_attr "mode" "V4DF")])
>>>> > +
>>>>
>>>> Those patterns are incorrect. For example, there is
>>>>
>>>> (define_insn "avx_unpckhpd256"
>>>>   [(set (match_operand:V4DF 0 "register_operand" "=x")
>>>>         (vec_select:V4DF
>>>>           (vec_concat:V8DF
>>>>             (match_operand:V4DF 1 "register_operand" "x")
>>>>             (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
>>>>           (parallel [(const_int 1) (const_int 5)
>>>>                      (const_int 3) (const_int 7)])))]
>>>>   "TARGET_AVX"
>>>>   "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
>>>>   [(set_attr "type" "sselog")
>>>>    (set_attr "prefix" "vex")
>>>>    (set_attr "mode" "V4DF")])
>>>>
>>>> We can't have the same instruction with different element selections.
>>>
>>> Hm, right.  So there are no suitable 256-bit instructions for
>>> vec_interleave with v4df or v8sf mode?
>>>
>>
>> That is correct. We have 2 choices:
>>
>> 1. Extend vectorizer to efficiently support 256bit AVX.
>> 2. Use define_expand. I have some patches for it. The code looks bad:
> 3. Do not use 256bit vectors in these cases
>
> I guess 1. and 3. are more useful; with the patterns available, the
> vectorizer will make unconditional use of them (I can't assess how
> bad the code generation actually is, though).
>

How hard is it to choose a vector size based on the available
patterns? In a loop, there may be both supported and unsupported
patterns for 256-bit vectors. Where do we draw the line?

I think we should start with option 2 (define_expand) and move to
option 1 (extending the vectorizer).
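
For reference, the element mismatch pointed out in the quoted exchange
can be checked with a small model of the RTL. This is only an
illustrative sketch in Python, not GCC code: vec_select below is a
hypothetical helper that mimics indexing into the vec_concat of the two
V4DF operands, and the element names a0..a3 and b0..b3 are made up for
the illustration.

```python
def vec_select(concat, indices):
    """Model of RTL vec_select: pick elements of `concat` by index."""
    return [concat[i] for i in indices]

# Two V4DF operands with symbolic element names (illustrative only).
op1 = ["a0", "a1", "a2", "a3"]
op2 = ["b0", "b1", "b2", "b3"]

# Model of (vec_concat:V8DF op1 op2): indices 0-3 come from op1,
# indices 4-7 come from op2.
concat = op1 + op2

# Selection used by the existing avx_unpckhpd256 pattern.
existing = vec_select(concat, [1, 5, 3, 7])

# Selection used by the proposed vec_interleave_highv4df pattern.
proposed = vec_select(concat, [2, 6, 3, 7])

print("avx_unpckhpd256:        ", existing)
print("vec_interleave_highv4df:", proposed)
```

The (1, 5, 3, 7) selection takes the high element of each 128-bit lane,
which is what the 256-bit vunpckhpd does, whereas (2, 6, 3, 7)
interleaves the upper halves of the two operands. The results differ in
the first two elements, so both patterns cannot emit the same
instruction.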

-- 
H.J.