This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat


On 8/25/18 1:09 PM, Segher Boessenkool wrote:
> Hey,
>
> On Sat, Aug 25, 2018 at 12:15:16PM -0500, Bill Schmidt wrote:
>> On 8/20/18 4:44 PM, Will Schmidt wrote:
>>> Enable GIMPLE folding of the vec_splat() intrinsic. (v3).
>>>     
>>> This uses the tree_vec_extract() function out of tree-vect-generic.c
>>> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
>>> made non-static as part of this change.
>>>     
>>> Testcases are already in-tree.
>>>     
>>> V2 updates, per feedback previously received.
>>> Forced arg1 into range (modulo #elements) before attempting to extract
>>> the splat value.
>>> Removed the (now unnecessary) code that did bounds-checking before calling
>>> the tree_vec_extract helper.
>>> Used arg0_type rather than lhs_type for calculating the tree size.
>>>     
>>> V3 updates, inspired by additional offline discussion with Segher.
>>> Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element
>>> number less than the number of elements supported by the respective ARG1
>>> type, so we do not attempt to gimple-fold the intrinsic if we determine our
>>> value is out of range. Thus, the initial check ensures that ARG1 is both
>>> constant and has a valid index into ARG0.
>>> The subsequent modulo operation is no longer necessary, and has been removed.
>>> 2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
>>>     
>>>     	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>>>     	  early gimple folding of vec_splat().
> s/  //
>
>>>     	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>>>     	* gimple-fold.h:  Add an extern define for tree_vec_extract().
> s/  / /
>
>>> +	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
>>> +	 index into the arg0 vector.  */
> Needs two more spaces indent?
>
>>> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
>>> +	if (TREE_CODE (arg1) != INTEGER_CST
>>> +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
>> Space between - and 1.
>>
>> I'm still not convinced we should do this.  The semantics of vec_splat are
>> to select the element as arg1 modulo n_elts.  Odd as it seems, it's historically
>> valid for values between 0 and 31 to be used regardless of the value of n_elts.
> But, the assembler complains.  So it never worked, and it seems better to
> not allow it in the compiler either.
>
> A lot of existing testcases generate assembler code the assembler chokes
> on -- but the testcases are dg-do compile, so the assembler isn't run.
>
> This is also PR86987; I have patches, but it snowballs because of the
> aforementioned existing wrong testcases.
>
>> Since arg1 is a constant we can easily bring it into valid range and expand it
>> early, rather than losing optimization opportunities by keeping this as a
>> built-in call.
> We should just error on invalid input.
>
>> Do we have test cases for arg1 in {n_elts, ..., 31}?  What's GCC's current
>> behavior?  Maybe we already aren't honoring this technically legal case?
> We do, lots.  Sometimes we compile it; the assembler chokes on the output
> (but the assembler is not run in those testcases).  Sometimes we ICE.

Thanks for the reminders. :-)  No concerns, then.

Bill

>
>> Otherwise looks fine to me, though of course I can't approve.
> Looks fine to me, too.
>
>>> +	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
>>> +				    * BITS_PER_UNIT / n_elts;
> (Well one thing then...  Wrong indent?)
>
>
> Segher
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]