This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, rs6000] Avoid vectorizing versioned copy loops with vectorization factor 2
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: Segher Boessenkool <segher at kernel dot crashing dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>
- Date: Sat, 13 May 2017 16:44:40 -0500
- Subject: Re: [PATCH, rs6000] Avoid vectorizing versioned copy loops with vectorization factor 2
- Authentication-results: sourceware.org; auth=none
- References: <f4ee0d29-6cdd-6b4f-167a-3fec1b38358f@linux.vnet.ibm.com> <20170505153027.GS19687@gate.crashing.org>
Thanks! This was committed to trunk last week as r247671. As we discussed
offline, I've also backported to GCC 7 (r248010) and GCC 6 (r248011).
Bill
> On May 5, 2017, at 10:30 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> Hi Bill,
>
> On Wed, May 03, 2017 at 02:43:09PM -0500, Bill Schmidt wrote:
>> We recently became aware of some poor code generation as a result of
>> unprofitable (for POWER) loop vectorization. When a loop is simply copying
>> data with 64-bit loads and stores, vectorizing with 128-bit loads and stores
>> generally does not provide any benefit on modern POWER processors.
>> Furthermore, if there is a requirement to version the loop for aliasing,
>> alignment, etc., the cost of the versioning test is almost certainly a
>> performance loss for such loops. The user code example included such a copy
>> loop, executed only a few times on average, within an outer loop that was
>> executed many times on average, causing a tremendous slowdown.
>>
>> This patch very specifically targets these kinds of loops and no others,
>> and artificially inflates the vectorization cost to ensure vectorization
>> does not appear profitable. This is done within the target model cost
>> hooks to avoid affecting other targets. A new test case is included that
>> demonstrates the refusal to vectorize.
>>
>> We've done SPEC performance testing to verify that the patch does not
>> degrade such workloads. Results were all in the noise range. The
>> customer code performance loss was verified to have been reversed.
>>
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
>> Is this ok for trunk?
>
>> 2017-05-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>>
>> * config/rs6000/rs6000.c (rs6000_vect_nonmem): New static var.
>> (rs6000_init_cost): Initialize rs6000_vect_nonmem.
>> (rs6000_add_stmt_cost): Update rs6000_vect_nonmem.
>> (rs6000_finish_cost): Avoid vectorizing simple copy loops with
>> VF=2 that require versioning.
>>
>> [gcc/testsuite]
>>
>> 2017-05-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>>
>> * gcc.target/powerpc/versioned-copy-loop.c: New file.
>
>> --- gcc/config/rs6000/rs6000.c (revision 247560)
>> +++ gcc/config/rs6000/rs6000.c (working copy)
>> @@ -5873,6 +5873,8 @@ rs6000_density_test (rs6000_cost_data *data)
>>
>> /* Implement targetm.vectorize.init_cost. */
>>
>> +static bool rs6000_vect_nonmem;
>
> Please put a comment on this, saying what it is for.
>
>> + /* Check whether we're doing something other than just a copy loop.
>> + Not all such loops may be profitably vectorized; see
>> + rs6000_finish_cost. */
>> + if ((where == vect_body
>> + && (kind == vector_stmt || kind == vec_to_scalar || kind == vec_perm
>> + || kind == vec_promote_demote || kind == vec_construct
>> + || kind == scalar_to_vec))
>> + || (where != vect_body
>> + && (kind == vec_to_scalar || kind == vec_perm
>> + || kind == vec_promote_demote || kind == vec_construct
>> + || kind == scalar_to_vec)))
>> + rs6000_vect_nonmem = true;
>
> Perhaps
>
> + if ((kind == vec_to_scalar || kind == vec_perm
> + || kind == vec_promote_demote || kind == vec_construct
> + || kind == scalar_to_vec)
> + || (where == vect_body && kind == vector_stmt))
>> + rs6000_vect_nonmem = true;
>
> if you agree that is clearer.
>
> Okay for trunk with the comment added, and with the condition
> simplified or not. Thanks,
>
>
> Segher
>