This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [RFC PATCH] AVX2 32-byte integer {s,u}m{in,ax} and vcond{,u} patterns

From: Uros Bizjak <ubizjak at gmail dot com>
To: Jakub Jelinek <jakub at redhat dot com>
Cc: Richard Henderson <rth at redhat dot com>, gcc-patches at gcc dot gnu dot org, "H.J. Lu" <hjl dot tools at gmail dot com>
Date: Sat, 17 Sep 2011 11:13:35 +0200
Subject: Re: [RFC PATCH] AVX2 32-byte integer {s,u}m{in,ax} and vcond{,u} patterns
References: <20110916112453.GW2687@tyan-ft48-01.lab.bos.redhat.com> <20110916162052.GZ2687@tyan-ft48-01.lab.bos.redhat.com>

On Fri, Sep 16, 2011 at 6:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:

>> Surprisingly with -mavx2 the integer loops aren't vectorized with
>> 32-byte vectors, wonder why.  But looking at the integer umin/umax/smin/smax
>> 16-byte reductions they generate good code even without reduc_* patterns,
>> apparently using vector shifts.
>
> Seems on that testcase the integer loops weren't using 32-byte vectors
> because there were no expanders for 32-byte integer min/max.
> The following patch adds that (and also 32-byte integer condition
> vcond/u because it is related).  With this all the integer loops
> in that testcase are nicely vectorized with 32-byte vectors with -mavx2,
> unfortunately the reductions look terrible.
>
> The problem is that AVX2 doesn't have 32-byte whole vector shift right
> (well, in theory it has it if the shift count is exactly 128 - vextractf128).
> For shift counts > 128 we could in theory handle it as two instructions,
> vextractf128 plus a 16-byte whole vector shift with count - 128, but
> reductions actually don't need the two steps, we only care about the
> bottom bits after the shifts and the upper bits can contain anything.
>
> So, either we can fix this by adding reduc_{smin,smax,umin,umax}_v{32q,16h,8s,4d}i
> patterns (at that point I guess I should just macroize them together with
> the reduc_{smin,smax,umin,umax}_v{4sf,8sf,4df}) and handle the 4 32-byte
> integer modes also in ix86_expand_reduc, or come up with some new optab
> for an operation like whole vector shift right, but which would allow
> the upper bits to be undefined and would only allow shifts by
> vector size / 2, / 4, / 8 down to element size and corresponding tree code.
> What do you prefer?

I think that the former approach is better. We don't have full-vector
shift in this case, so faking it with some very constrained optab
would be IMO pointless.
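As a rough illustration of the reduction scheme discussed above (not GCC code), the Python sketch below models a 32-byte vector of 8 x 32-bit lanes as a list and performs a min reduction by repeated halving. The helper names `shr_lanes_undef` and `reduc_min` are hypothetical. The sketch assumes the "shift" leaves its vacated upper lanes undefined (modelled by filling them with an arbitrary `junk` value) to show that only the bottom lane is ever read back, and that the first step (4 lanes, i.e. 128 bits) is the one that extracting the high 128-bit half covers.

```python
from typing import List, Sequence


def shr_lanes_undef(vec: Sequence[int], n: int, junk: int) -> List[int]:
    """Model a whole-vector shift right by n lanes whose vacated
    upper lanes are undefined; here they are filled with `junk`."""
    return [vec[i + n] if i + n < len(vec) else junk for i in range(len(vec))]


def reduc_min(vec: Sequence[int], junk: int = 0) -> int:
    """Min reduction by repeated halving (8 -> 4 -> 2 -> 1 live lanes).

    Assumes len(vec) is a power of two.  For 8 x 32-bit lanes the first
    step shifts by 4 lanes (128 bits), which is what extracting the high
    128-bit half provides; later steps stay within the low 16 bytes.
    """
    acc = list(vec)
    n = len(acc) // 2
    while n >= 1:
        shifted = shr_lanes_undef(acc, n, junk)
        acc = [min(a, b) for a, b in zip(acc, shifted)]
        n //= 2
    return acc[0]  # only the bottom lane is meaningful


v = [7, -3, 12, 5, 9, 0, -1, 4]
# Same answer whatever lands in the vacated lanes.
print(reduc_min(v), reduc_min(v, junk=-2**31))  # -3 -3
```

After the step that shifts by n lanes, lanes 0..n-1 depend only on the original input, and each later step reads only lanes below the current n, so the junk never reaches lane 0. That is why the reduction needs no real 256-bit whole-vector shift.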

> OT: seems the AVX2 support put the avx2_<code><mode>3 and
> *avx2_<code><mode>3 patterns (the former after this patch <code><mode>3)
> in a wrong spot, in between vec_shr_<mode> expander and sse2_lshrv1ti3
> insn which implements what the expander expands.  Uros, would you like to
> move it elsewhere?  Where exactly?

I'd put these after sse4_1 umaxmin patterns, just before:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Parallel integral comparisons
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

>
> This patch has been tested on x86_64-linux and i686-linux on SandyBridge.
>
> 2011-09-16  Jakub Jelinek  <jakub@redhat.com>
>
>        * config/i386/i386.c (ix86_build_const_vector): Handle V8SImode
>        and V4DImode.
>        (ix86_build_signbit_mask): Likewise.
>        (ix86_expand_int_vcond): Likewise.  Handle V16HImode and
>        V32QImode.
>        (bdesc_args): Use CODE_FOR_{s,u}m{ax,in}v{32q,16h,8s}i3
>        instead of CODE_FOR_avx2_{s,u}m{ax,in}v{32q,16h,8s}i3.
>        * config/i386/sse.md (avx2_<code><mode>3 umaxmin expand): Rename
>        to...
>        (<code><mode>3) ... this.
>        (avx2_<code><mode>3 smaxmin expand): Rename to...
>        (<code><mode>3) ... this.
>        (smax<mode>3, smin<mode>3): Macroize using smaxmin code iterator.
>        (smaxv2di3, sminv2di3): Macroize using smaxmin code iterator and
>        VI8_AVX2 mode iterator.
>        (umaxv2di3, uminv2di3): Macroize using umaxmin code iterator and
>        VI8_AVX2 mode iterator.
>        (vcond<V_256:mode><VI_256:mode>, vcondu<V_256:mode><VI_256:mode>):
>        New expanders.
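The ChangeLog adds vcondu expanders and extends ix86_build_signbit_mask to V8SImode and V4DImode. The x86 pcmpgt instructions only compare signed integers. A standard way to get an unsigned comparison out of them is to bias both operands by the sign bit: XOR with it, or equivalently subtract it modulo 2^32. Our assumption, not stated in the message, is that this is why the sign-bit helper needs the 256-bit integer modes. The Python sketch below simulates 32-bit lanes. The names `cmp_gt_signed`, `cmp_gt_unsigned`, and `vcondu_gt` are hypothetical and model lane semantics only, not GCC's expander.

```python
from typing import List, Sequence

BITS = 32                   # lane width, as in V8SImode
MASK = (1 << BITS) - 1      # all-ones lane (a "true" compare result)
SIGN = 1 << (BITS - 1)      # sign bit of one lane


def to_signed(x: int) -> int:
    """Reinterpret a BITS-wide lane value as two's-complement signed."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x


def cmp_gt_signed(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Lane-wise signed a > b yielding all-ones/zero masks (pcmpgt-style)."""
    return [MASK if to_signed(x) > to_signed(y) else 0 for x, y in zip(a, b)]


def flip_sign(v: Sequence[int]) -> List[int]:
    """XOR each lane with the sign bit; maps unsigned order onto signed order."""
    return [(x ^ SIGN) & MASK for x in v]


def cmp_gt_unsigned(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Unsigned a > b built only from the signed compare."""
    return cmp_gt_signed(flip_sign(a), flip_sign(b))


def vcondu_gt(a: Sequence[int], b: Sequence[int],
              x: Sequence[int], y: Sequence[int]) -> List[int]:
    """Select x where a >u b, else y, by blending through the compare mask."""
    m = cmp_gt_unsigned(a, b)
    return [(mi & xi) | (~mi & yi & MASK) for mi, xi, yi in zip(m, x, y)]


a = [0xFFFFFFFF, 1, 0x80000000, 5]
b = [1, 0xFFFFFFFF, 0x7FFFFFFF, 5]
print(cmp_gt_signed(a, b) == [0, MASK, 0, 0])           # True: signed order differs
print(vcondu_gt(a, b, [10, 20, 30, 40], [1, 2, 3, 4]))  # [10, 2, 30, 4]
```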

This is OK for mainline SVN.

Thanks,
Uros.

References:
- [RFC PATCH] Improve V8SFmode and V4DFmode smin/smax reductions
  - From: Jakub Jelinek
- [RFC PATCH] AVX2 32-byte integer {s,u}m{in,ax} and vcond{,u} patterns
  - From: Jakub Jelinek
