[PATCH][i386] Add some obvious missing vectorizer patterns for AVX

Wed May 12 13:38:00 GMT 2010

On Wed, May 12, 2010 at 3:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 12, 2010 at 1:40 AM, Richard Guenther <rguenther@suse.de> wrote:
>> On Tue, 11 May 2010, H.J. Lu wrote:
>>
>>> On Mon, May 10, 2010 at 6:02 AM, Richard Guenther <rguenther@suse.de> wrote:
>>> >
>>> > This adds patterns that do not require much thought.  I duplicated
>>> > the existing (but, to me, odd) superfluous vec_concats, for example
>>> > in vec_unpacks_hi_v8sf (AVX would have vextract for a
>>> > high-part vec_select, but there must be a reason to do it the
>>> > odd way for SSE).
>>> >
>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>> >
>>> > Ok for trunk?
>>> >
>>> > Thanks,
>>> > Richard.
>>> >
>>> > 2010-05-10  Richard Guenther  <rguenther@suse.de>
>>> >
>>> >        * config/i386/sse.md (reduc_splus_v8sf): Add.
>>> >        (reduc_splus_v4df): Likewise.
>>> >        (vec_unpacks_hi_v8sf): Likewise.
>>> >        (vec_unpacks_lo_v8sf): Likewise.
>>> >        (*avx_cvtps2pd256_2): Likewise.
>>> >        (vec_unpacks_float_hi_v8si): Likewise.
>>> >        (vec_unpacks_float_lo_v8si): Likewise.
>>> >        (vec_interleave_highv4df): Likewise.
>>> >        (vec_interleave_lowv4df): Likewise.
>>> >
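For readers unfamiliar with these standard pattern names, the following Python sketch models what the new V8SF/V8SI patterns are expected to compute at the element level. It is a reference model written for this note, not GCC code: the function names only mirror the pattern names, and the helper `f32` is an assumption used to emulate single precision. Element 0 is the least significant element, as on x86, so the "high" half of a V8SF is elements 4..7 (the upper 128-bit half that AVX's vextractf128 can fetch directly).

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def vec_unpacks_lo_v8sf(v):
    """V8SF -> V4DF: widen elements 0..3 (float to double is exact)."""
    assert len(v) == 8
    return [f32(x) for x in v[0:4]]


def vec_unpacks_hi_v8sf(v):
    """V8SF -> V4DF: widen elements 4..7, i.e. the upper 128-bit half."""
    assert len(v) == 8
    return [f32(x) for x in v[4:8]]


def vec_unpacks_float_lo_v8si(v):
    """V8SI -> V4DF: convert the signed ints in elements 0..3 (exact)."""
    assert len(v) == 8
    return [float(x) for x in v[0:4]]


def vec_unpacks_float_hi_v8si(v):
    """V8SI -> V4DF: convert the signed ints in elements 4..7 (exact)."""
    assert len(v) == 8
    return [float(x) for x in v[4:8]]


def reduc_splus_v8sf(v):
    """Sum of all eight elements; the real pattern leaves it in element 0.

    This model adds sequentially in single precision, so for general
    inputs the last bits may differ from a vector implementation that
    adds in a different order.
    """
    assert len(v) == 8
    total = 0.0
    for x in v:
        total = f32(total + f32(x))
    return total
```

The integer-valued test inputs below are chosen so that summation order does not matter.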
>>>
>>> >
>>> > + (define_insn "vec_interleave_highv4df"
>>> > +   [(set (match_operand:V4DF 0 "register_operand" "=x")
>>> > +       (vec_select:V4DF
>>> > +         (vec_concat:V8DF
>>> > +           (match_operand:V4DF 1 "register_operand" "x")
>>> > +           (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
>>> > +         (parallel [(const_int 2) (const_int 6)
>>> > +                    (const_int 3) (const_int 7)])))]
>>> > +   "TARGET_AVX"
>>> > +   "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
>>> > +   [(set_attr "type" "sselog")
>>> > +    (set_attr "prefix" "vex")
>>> > +    (set_attr "mode" "V4DF")])
>>> > +
>>>
>>> Those patterns are incorrect. For example, there is already
>>>
>>> (define_insn "avx_unpckhpd256"
>>>   [(set (match_operand:V4DF 0 "register_operand" "=x")
>>>         (vec_select:V4DF
>>>           (vec_concat:V8DF
>>>             (match_operand:V4DF 1 "register_operand" "x")
>>>             (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
>>>           (parallel [(const_int 1) (const_int 5)
>>>                      (const_int 3) (const_int 7)])))]
>>>   "TARGET_AVX"
>>>   "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
>>>   [(set_attr "type" "sselog")
>>>    (set_attr "prefix" "vex")
>>>    (set_attr "mode" "V4DF")])
>>>
>>> We can't emit the same instruction from patterns that select different elements.
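To make the conflict concrete, here is a small Python model (a sketch written for this note, not GCC code) of the two vec_select selections quoted above. Operands are lists of labels, and `vec_select_concat` is a hypothetical helper standing for `(vec_select (vec_concat op1 op2) (parallel sel))`, where indices 0..3 come from operand 1 and 4..7 from operand 2.

```python
def vec_select_concat(op1, op2, sel):
    """Model of (vec_select (vec_concat op1 op2) (parallel sel))."""
    both = list(op1) + list(op2)
    return [both[i] for i in sel]


# Selection in Richard's vec_interleave_highv4df: the high halves of the
# two full 256-bit vectors, interleaved.
INTERLEAVE_HIGH_V4DF = (2, 6, 3, 7)

# Selection in the existing avx_unpckhpd256: what 256-bit vunpckhpd does.
VUNPCKHPD_256 = (1, 5, 3, 7)


def vunpckhpd_256(a, b):
    """Lane-wise model: in each 128-bit lane, take the high double of a, then of b."""
    out = []
    for lane in (0, 2):  # element offset of each 128-bit lane
        out += [a[lane + 1], b[lane + 1]]
    return out


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
print(vec_select_concat(a, b, INTERLEAVE_HIGH_V4DF))  # ['a2', 'b2', 'a3', 'b3']
print(vunpckhpd_256(a, b))                            # ['a1', 'b1', 'a3', 'b3']
```

The two results differ in their first two elements, so vunpckhpd matches only the in-lane selection; the generic vec_interleave_highv4df name requires the cross-lane one.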
>>
>> Hm, right.  So there's no suitable 256-bit instruction for
>> vec_interleave in either V4DF or V8SF mode?
>>
>
> That is correct. We have 2 choices:
>
> 1. Extend the vectorizer to support 256-bit AVX efficiently.
> 2. Use define_expand. I have some patches for it, but the resulting code looks bad:
3. Do not use 256-bit vectors in these cases.

I guess 1. and 3. are more useful: with the patterns available, the
vectorizer will use them unconditionally (though I can't assess how
bad the generated code actually is).

Richard.

>
> (define_expand "vec_interleave_highv4df"
>  [(set (match_dup 3)
>        (vec_select:V4DF
>          (vec_concat:V8DF
>            (match_operand:V4DF 1 "register_operand" "x")
>            (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
>          (parallel [(const_int 0) (const_int 4)
>                     (const_int 2) (const_int 6)])))
>   (set (match_dup 4)
>        (vec_select:V4DF
>          (vec_concat:V8DF
>            (match_dup 1)
>            (match_dup 2))
>          (parallel [(const_int 1) (const_int 5)
>                     (const_int 3) (const_int 7)])))
>   (set (match_operand:V4DF 0 "register_operand" "")
>        (vec_concat:V4DF
>          (vec_select:V2DF
>            (match_dup 3)
>            (parallel [(const_int 2) (const_int 3)]))
>          (vec_select:V2DF
>            (match_dup 4)
>            (parallel [(const_int 2) (const_int 3)]))))]
>  "TARGET_AVX"
> {
>  operands[3] = gen_reg_rtx (V4DFmode);
>  operands[4] = gen_reg_rtx (V4DFmode);
> })
>
>
>
> --
> H.J.
>
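H.J.'s define_expand above can be checked the same way. The Python sketch below (again a model written for this note, not GCC code; the helper names are hypothetical) mirrors its three sets: operands[3] is the in-lane low unpack, operands[4] the in-lane high unpack, and operand 0 concatenates the upper 128-bit halves of the two temporaries. It confirms that the result equals the generic interleave-high selection.

```python
def vec_select_concat(op1, op2, sel):
    """Model of (vec_select (vec_concat op1 op2) (parallel sel))."""
    both = list(op1) + list(op2)
    return [both[i] for i in sel]


def expand_interleave_high_v4df(op1, op2):
    # operands[3]: selection (0 4 2 6), i.e. vunpcklpd -> {a0, b0, a2, b2}
    t3 = vec_select_concat(op1, op2, (0, 4, 2, 6))
    # operands[4]: selection (1 5 3 7), i.e. vunpckhpd -> {a1, b1, a3, b3}
    t4 = vec_select_concat(op1, op2, (1, 5, 3, 7))
    # operands[0]: elements 2,3 of t3 followed by elements 2,3 of t4
    return t3[2:4] + t4[2:4]


def interleave_high_v4df(op1, op2):
    """Reference semantics of vec_interleave_highv4df."""
    return vec_select_concat(op1, op2, (2, 6, 3, 7))


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
print(expand_interleave_high_v4df(a, b))  # ['a2', 'b2', 'a3', 'b3']
```

The last set moves data across 128-bit lanes, which on AVX presumably needs a permute such as vperm2f128. The expansion therefore costs three instructions where the V2DF case needs a single unpckhpd, which is likely why H.J. describes the code as bad.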