This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
- From: Richard Henderson <rth at redhat dot com>
- To: Kirill Yukhin <kirill dot yukhin at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Uros Bizjak <ubizjak at gmail dot com>, Jakub Jelinek <jakub at redhat dot com>
- Date: Tue, 29 Oct 2013 09:41:15 -0700
- Subject: Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
- References: <20130814074404 dot GE52726 at msticlxl57 dot ims dot intel dot com> <20130822141006 dot GA3556 at msticlxl57 dot ims dot intel dot com> <20131017141513 dot GC18369 at msticlxl57 dot ims dot intel dot com> <5265B231 dot 1040609 at redhat dot com> <20131022144256 dot GB46508 at msticlxl57 dot ims dot intel dot com> <526696E2 dot 5070004 at redhat dot com> <20131028102428 dot GA8797 at msticlxl57 dot ims dot intel dot com> <526E80A8 dot 8060707 at redhat dot com> <20131028205852 dot GC45941 at msticlxl57 dot ims dot intel dot com> <526EDAF9 dot 1090302 at redhat dot com> <20131029100225 dot GA21134 at msticlxl57 dot ims dot intel dot com>
On 10/29/2013 03:02 AM, Kirill Yukhin wrote:
> Hello Richard,
> On 28 Oct 14:45, Richard Henderson wrote:
>> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
>>> Hello Richard,
>>> On 28 Oct 08:20, Richard Henderson wrote:
>>>> Why is a masked *scalar* operation useful?
>>> The reason the instructions exist is so that
>>> you can do fully fault correct predicated scalar algorithms.
>> Using VEC_MERGE isn't the proper representation for that.
>> If that's your real goal, then COND_EXEC is the only way to let
>> rtl know that faults are suppressed in the false condition.
> I believe the cond_exec approach is supposed to look like this:
> (define_subst "mask_scalar"
>   [(set (match_operand:SUBST_V 0)
>         (vec_merge:SUBST_V
>           (match_operand:SUBST_V 1)
>           (match_operand:SUBST_V 2)
>           (const_int 1)))]
>   [(cond_exec (eq:CC
>                 (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
>                 (const_int 1))
>               (set (match_dup 0)
>                    (vec_merge:SUBST_V
>                      (match_dup 1)
>                      (match_dup 2)
>                      (const_int 1))))])
> But this will only describe merge-masking, and in an incorrect way.
> We will need to add a clobber to signal that even for a false
> condition we will zero the higher part of the register.
> The preferable zero-masking will be indistinguishable from
> merge-masking, and we will need to choose which mask mode to
> enable. Bad turn.
No, a cond_exec approach to scalars would use scalar modes
not vector modes with vec_merge. In that case the higher
part of the register is ignored and undefined, and the fact
that zeroing happens is irrelevant.
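
The difference between the two views can be modelled in a few lines of C. This is only a sketch: the type `v2df` and both function names are hypothetical, and the lane behaviour assumes the usual scalar-in-vector convention, where the mask applies to the low element only and the upper element is copied from the first source.

```c
/* Hypothetical two-element model of a 128-bit vector of doubles. */
typedef struct { double lo, hi; } v2df;

/* Vector view (vec_merge over vec_merge): the inner merge takes the low
   element from the add and the upper element from the first source; the
   outer merge, under mask bit 0, picks between that result and either
   the old destination (merge-masking) or zero (zero-masking). */
static v2df add_sd_masked(v2df dst, v2df a, v2df b, unsigned k, int zeroing)
{
    v2df r;
    r.hi = a.hi;                       /* upper element: copied from a */
    if (k & 1)
        r.lo = a.lo + b.lo;            /* mask bit set: perform the add */
    else
        r.lo = zeroing ? 0.0 : dst.lo; /* mask bit clear: zero or keep */
    return r;
}

/* Scalar view: a single value, so there is no upper part to describe. */
static double add_scalar_masked(double r, double x, double y, unsigned k)
{
    return (k & 1) ? x + y : r;
}
```

In the scalar view nothing needs to be said about the rest of the register, which is why zeroing of the higher part does not have to appear in the pattern.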
> IMHO, we have 3 options to implement scalar masked insns:
>   1. `vec_merge' over vec_merge (current approach).
>      Pros:
>        1. Precise semantic description
False, as I've described above. The compiler will believe
that all exceptions, especially memory exceptions, will be
>        2. Unified approach with vector patterns
>        3. Freedom for simplifier to reduce EVEX to VEX for
>           certain const masks
>      Cons:
>        1. Too precise semantic description and as a
>           consequence complicated code in md-file
>   2. `cond_exec' approach
>      Pros:
>        1. Looks useful for compiler when trying to generate
>           predicated code
>      Cons:
>        1. Not precise. Extra clobbers (?) needed: to signal
>           that we're changing the register even for false
>           condition in cond_exec
>        2. Unable to describe zero masking nicely
>        3. Code still complicated as for option #1
>        4. Simplifier won't work (clobber is always clobber)
>   3. Make all masked scalar insns to be unspecs
>      Pros:
>        1. Straightforward, not overweighted. Enough for
>           intrinsics to work
>      Cons:
>        1. Since every unspec needs a code: substs won't be
>           applied directly: huge volume of similar code
>        2. Simplifier won't work
>        3. Generation of predicated code becomes hard
> Am I missing some options, or that's all we have?
> If so, what option would you prefer?
As far as builtins are concerned, all three approaches are
functional. But in order for the compiler to be able to
automatically create conditional code from normal scalar
code we'll have to use cond_exec.
Indeed, if we've done our job right, then the user-facing
inlines that we expose could be written
  inline double add_mask(double r, double x, double y, int m)
  {
    if (m & 1)
      r = x + y;
    return r;
  }
although honestly such inlines would be silly.
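
The fault-suppression point can be sketched in plain C. This is not GCC internals, and both function names are hypothetical. The branchy form touches memory only when the mask bit is set, which is what a predicated (cond_exec-style) instruction preserves. A select-style form, analogous to describing the operation as a merge of an already computed result, always performs the load.

```c
#include <stddef.h>

/* Branchy source: *p is read only when bit 0 of m is set, so p may be
   an invalid pointer (here, NULL) whenever that bit is clear. */
static double add_load_branch(double r, const double *p, double y, int m)
{
    if (m & 1)
        r = *p + y;
    return r;
}

/* Select form: the sum is computed unconditionally and then merged, so
   *p is always dereferenced and p must be valid for every value of m. */
static double add_load_select(double r, const double *p, double y, int m)
{
    double sum = *p + y;
    return (m & 1) ? sum : r;
}
```

Calling `add_load_branch(r, NULL, y, 0)` is well defined and returns `r`, while the same call to `add_load_select` would dereference a null pointer. A compiler may only turn the first form into a single masked instruction if its representation of that instruction says the load cannot fault when the condition is false.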