PING^3 [PATCH v2] rs6000: Add load density heuristic
Kewen.Lin
linkw@linux.ibm.com
Thu Jul 15 01:59:06 GMT 2021
Hi,
Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html
BR,
Kewen
on 2021/6/28 3:01 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html
>
> BR,
> Kewen
>
> on 2021/6/9 10:26 AM, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> Gentle ping this:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html
>>
>> BR,
>> Kewen
>>
>> on 2021/5/26 10:59 AM, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> This is the updated version of the patch to address the bwaves_r
>>> degradation caused by vector construction fed by strided loads.
>>>
>>> Following Richi's comments [1], this takes a similar approach and
>>> overprices vector construction fed by VMAT_ELEMENTWISE or
>>> VMAT_STRIDED_SLP loads. Instead of adding the extra cost as soon
>>> as the vector construction is costed, it first records how many
>>> loads and vectorized statements the given loop contains. Later,
>>> in rs6000_density_test (called by finish_cost), it computes the
>>> ratio of loads to all vectorized stmts, checks it against the
>>> thresholds DENSITY_LOAD_NUM_THRESHOLD and
>>> DENSITY_LOAD_PCT_THRESHOLD, and applies the extra pricing only
>>> if both thresholds are exceeded.
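For readers following the thread without the patch at hand, the two-threshold check described above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the struct density_counts type and both helper functions are hypothetical, only the member names nstmts, nloads and extra_ctor_cost come from the ChangeLog, and the threshold values are placeholders because the message does not state them.

```c
#include <stdbool.h>

/* Placeholder values for illustration only; the message does not
   state the real thresholds.  */
#define DENSITY_LOAD_NUM_THRESHOLD 20
#define DENSITY_LOAD_PCT_THRESHOLD 45

/* Hypothetical stand-in for the counters recorded per loop while the
   add_stmt_cost entries are scanned.  */
struct density_counts
{
  unsigned nstmts;          /* all vectorized statements */
  unsigned nloads;          /* vectorized load statements */
  unsigned extra_ctor_cost; /* penalty accumulated for vector constructions */
};

/* Return true only if both the absolute load count and the load
   percentage exceed their thresholds.  */
static bool
load_density_exceeded (const struct density_counts *c)
{
  if (c->nstmts == 0)
    return false;
  unsigned load_pct = c->nloads * 100 / c->nstmts;
  return c->nloads > DENSITY_LOAD_NUM_THRESHOLD
	 && load_pct > DENSITY_LOAD_PCT_THRESHOLD;
}

/* Apply the recorded penalty to the body cost at finish time, and only
   when the loop is load dense.  */
static unsigned
finish_body_cost (unsigned body_cost, const struct density_counts *c)
{
  return load_density_exceeded (c) ? body_cost + c->extra_ctor_cost
				   : body_cost;
}
```

The point of the deferred check is that the penalty is accumulated during scanning but only added to the body cost once the whole loop has been seen.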
>>>
>>> Note that this new load density check relies on fields in the
>>> target cost data that are updated as each add_stmt_cost entry is
>>> scanned, so it is independent of the existing logic in
>>> rs6000_density_test, which needs to scan the non_vect stmts.
>>> Since it compares the load stmt count against all vectorized
>>> stmts, it is a kind of density, so I put it in
>>> rs6000_density_test. To keep it independent, I did not make it
>>> an else arm of the existing density threshold check hunk, nor
>>> place it before that hunk.
>>>
>>> While investigating the -1.04% degradation of 526.blender_r on
>>> Power8, I noticed that the extra penalized cost of 320 on a
>>> single vector construction of type V16QI is far too large and
>>> makes the final body cost unreliable, so this patch adds an
>>> upper bound on the extra penalized cost for each vector
>>> construction statement.
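The per-statement bound described above amounts to clamping each vector construction's penalty before it is accumulated. A minimal sketch, assuming the uncapped penalty is already known: the function names and the cap value used in the usage note are hypothetical, since the message gives neither the bound nor how the uncapped figure of 320 is derived.

```c
/* Clamp the extra penalty for one vector construction statement.
   MAX_PENALTY is a hypothetical bound; the message does not state
   the value the patch uses.  */
static unsigned
capped_ctor_penalty (unsigned raw_penalty, unsigned max_penalty)
{
  return raw_penalty < max_penalty ? raw_penalty : max_penalty;
}

/* Accumulate the capped penalties of N vector construction
   statements, so that no single statement dominates the total.  */
static unsigned
total_ctor_penalty (const unsigned *raw_penalties, unsigned n,
		    unsigned max_penalty)
{
  unsigned total = 0;
  for (unsigned i = 0; i < n; i++)
    total += capped_ctor_penalty (raw_penalties[i], max_penalty);
  return total;
}
```

With an illustrative bound of 32, a raw penalty of 320 would contribute only 32 to the total, while smaller penalties pass through unchanged.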
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>>
>>> Full SPEC2017 performance evaluation on Power8/Power9 with
>>> option combinations:
>>> * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap} {,-ffast-math}
>>> * {-O3, -Ofast} {,-funroll-loops}
>>>
>>> bwaves_r degradations on P8/P9 have been fixed, nothing else
>>> remarkable was observed.
>>>
>>> Is it ok for trunk?
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html
>>>
>>> BR,
>>> Kewen
>>> -----
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.c (struct rs6000_cost_data): New members
>>> nstmts, nloads and extra_ctor_cost.
>>> (rs6000_density_test): Add load density related heuristics and the
>>> checks, do extra costing on vector construction statements if needed.
>>> (rs6000_init_cost): Init new members.
>>> (rs6000_update_target_cost_per_stmt): New function.
>>> (rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
>>> rs6000_update_target_cost_per_stmt and call it.
>>>
>>