Bug 90952 - Costs of moves are used for costs of RTL expressions
Summary: Costs of moves are used for costs of RTL expressions
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 10.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 90878
Reported: 2019-06-20 21:18 UTC by H.J. Lu
Modified: 2020-01-27 13:54 UTC
CC List: 4 users

See Also:
Host:
Target: i386,x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-01-27 00:00:00


Description H.J. Lu 2019-06-20 21:18:35 UTC
This patch:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html

includes:

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index e943d13..8409a5f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1557,7 +1557,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4}, /* cost of loading integer registers
     in QImode, HImode and SImode.
     Relative to reg-reg move (2).  */
-  {6, 6, 6}, /* cost of storing integer registers */
+  {6, 6, 3}, /* cost of storing integer registers */
   2, /* cost of reg,reg fld/fst */
   {6, 6, 8}, /* cost of loading fp registers
     in SFmode, DFmode and XFmode */

It lowered the cost of an SImode store, making it cheaper than an SSE<->integer
register move. This caused a regression:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878
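To make the coupling concrete, here is a minimal C sketch under a heavily simplified model. `struct toy_costs`, `toy_vectorization_cost` and the `TOY_*` names are hypothetical and are not GCC's real definitions. The sketch keeps only the integer load/store entries from the hunk above, indexed QImode/HImode/SImode, and shows how a cost hook that reads the SImode store entry (this report states that ix86_builtin_vectorization_cost does so for scalar_store) sees its result drop from 6 to 3 when the table changes. Treating scalar loads the same way is an assumption made by analogy.

```c
#include <assert.h>

/* Mode indices for the three-entry integer tables (hypothetical names). */
enum toy_mode { TOY_QI = 0, TOY_HI = 1, TOY_SI = 2 };

/* Hypothetical, heavily simplified stand-in for struct processor_costs:
   only the integer load/store entries from the hunk above are kept.  */
struct toy_costs {
  int int_load[3];   /* QImode, HImode, SImode loads; reg-reg move = 2 */
  int int_store[3];  /* QImode, HImode, SImode stores */
};

/* Skylake entries before and after the patch.  */
static const struct toy_costs skylake_before = { {4, 4, 4}, {6, 6, 6} };
static const struct toy_costs skylake_after  = { {4, 4, 4}, {6, 6, 3} };

/* Hypothetical statement kinds, loosely modelled on the vectorizer's.  */
enum toy_stmt_kind { TOY_SCALAR_LOAD, TOY_SCALAR_STORE };

/* Simplified cost hook: the cost of a scalar statement is read straight
   from the SImode move-cost entry.  The store case mirrors what the
   report describes; the load case is an assumption by analogy.  */
static int
toy_vectorization_cost (const struct toy_costs *c, enum toy_stmt_kind kind)
{
  switch (kind)
    {
    case TOY_SCALAR_LOAD:
      return c->int_load[TOY_SI];
    case TOY_SCALAR_STORE:
      return c->int_store[TOY_SI];
    }
  assert (0 && "unknown statement kind");
  return -1;
}
```

In this model, `toy_vectorization_cost (&skylake_before, TOY_SCALAR_STORE)` yields 6 and the same call on `skylake_after` yields 3, so editing one move-cost entry silently changes the cost of every scalar store the cost model counts.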

Since the SImode store cost is also used to compute the cost of the
scalar_store RTL expression in ix86_builtin_vectorization_cost, the change
also altered the loop costs for

void
foo (long p2, long *diag, long d, long i)
{
  long k;
  k = p2 < 3 ? p2 + p2 : p2 + 3;
  while (i < k)
    diag[i++] = d;
}

As a result, with -O3 -march=skylake the loop is unrolled 4 times
instead of 3.
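Purely to illustrate what an unroll factor of 4 means at the source level, here is the reported loop next to a hand-unrolled version. This is not GCC's actual output, and `trip_bound`, `foo_ref` and `foo_unroll4` are hypothetical names. The main loop performs four stores per iteration and a remainder loop handles the leftover iterations; both versions store `d` into exactly the same elements.

```c
#include <assert.h>

/* Trip bound computed exactly as in the report's foo().  */
static long
trip_bound (long p2)
{
  return p2 < 3 ? p2 + p2 : p2 + 3;
}

/* Reference: the loop as written in the report.  */
static void
foo_ref (long p2, long *diag, long d, long i)
{
  long k = trip_bound (p2);
  while (i < k)
    diag[i++] = d;
}

/* Hypothetical hand-unrolled version with factor 4: a main loop doing
   four stores per iteration, then a remainder loop for the leftovers.  */
static void
foo_unroll4 (long p2, long *diag, long d, long i)
{
  long k = trip_bound (p2);
  for (; i + 4 <= k; i += 4)
    {
      diag[i] = d;
      diag[i + 1] = d;
      diag[i + 2] = d;
      diag[i + 3] = d;
    }
  while (i < k)   /* remainder: at most 3 iterations */
    diag[i++] = d;
  assert (i >= k);  /* both loops have run to completion */
}
```

The trade-off is the usual one: a larger factor means fewer loop-control branches per store but a longer remainder, which is why a cost-model change can tip the chosen factor from 3 to 4.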
Comment 1 Hongtao.liu 2019-08-26 05:12:28 UTC
In the latest GCC version (GCC10_20190820), there is no difference in the unroll factor when this patch is applied.

However, the profitability threshold still differs: it changes from 5 to 4.

That means that if the loop count is less than the profitability threshold, the vectorized loop is not used.
So runtime performance differs only when the loop count is 4; otherwise it is the same.
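The reasoning above can be checked with a small model. This is a hypothetical sketch that assumes the runtime check is simply `niters >= threshold` (the check GCC actually emits may be formulated differently); `uses_vector_loop` and `thresholds_disagree` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the runtime profitability check: the vectorized
   loop is used only when the iteration count reaches the threshold;
   below it, the scalar loop runs instead.  */
static bool
uses_vector_loop (long niters, long threshold)
{
  assert (threshold >= 0);
  return niters >= threshold;
}

/* True when two thresholds pick different loop versions for NITERS.  */
static bool
thresholds_disagree (long niters, long th_old, long th_new)
{
  return uses_vector_loop (niters, th_old) != uses_vector_loop (niters, th_new);
}
```

With the thresholds from this comment, `thresholds_disagree (4, 5, 4)` is true and it is false for every other count: below 4 both thresholds choose the scalar loop, and from 5 upward both choose the vectorized one.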
Comment 2 Martin Liška 2020-01-27 13:40:18 UTC
@H.J.Lu: Can we close the issue or is it still valid?
Comment 3 H.J. Lu 2020-01-27 13:54:30 UTC
Fixed by r274543.