90952 – Costs of moves are used for costs of RTL expressions

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	10.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Blocks:	90878

Reported:	2019-06-20 21:18 UTC by H.J. Lu
Modified:	2020-01-27 13:54 UTC
CC List:	4 users

Target:	i386,x86-64
Last reconfirmed:	2020-01-27 00:00:00

Description H.J. Lu 2019-06-20 21:18:35 UTC

This patch:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html

includes:

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index e943d13..8409a5f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1557,7 +1557,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4}, /* cost of loading integer registers
     in QImode, HImode and SImode.
     Relative to reg-reg move (2).  */
-  {6, 6, 6}, /* cost of storing integer registers */
+  {6, 6, 3}, /* cost of storing integer registers */
   2, /* cost of reg,reg fld/fst */
   {6, 6, 8}, /* cost of loading fp registers
     in SFmode, DFmode and XFmode */

It lowered the cost of an SImode store from 6 to 3, making it cheaper than an
SSE<->integer register move.  This caused a regression:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878

Since the cost of an SImode store is also used to compute the cost of the
scalar_store RTL expression in ix86_builtin_vectorization_cost, the patch
also changed the loop costs in

void
foo (long p2, long *diag, long d, long i)
{
  long k;
  k = p2 < 3 ? p2 + p2 : p2 + 3;
  while (i < k)
    diag[i++] = d;
}

As a result, the loop is unrolled 4 times with -O3 -march=skylake,
instead of 3.
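
To illustrate the coupling described above, here is a minimal sketch, not GCC's actual code. The struct `toy_costs` and the helper `toy_scalar_store_cost` are hypothetical stand-ins; the real ix86_builtin_vectorization_cost may scale or otherwise adjust the table value. The point is only that one table entry serves both as a move cost and as a vectorizer statement cost, so changing it for one purpose silently changes the other.

```c
/* Hypothetical, simplified stand-in for the integer store entry of
   struct processor_costs in x86-tune-costs.h.  */
struct toy_costs {
  int int_store[3]; /* store costs in QImode, HImode and SImode */
};

/* Values taken from the diff above: before and after the patch.  */
static const struct toy_costs toy_skylake_before = { {6, 6, 6} };
static const struct toy_costs toy_skylake_after  = { {6, 6, 3} };

/* Hypothetical helper: the vectorizer's scalar store cost is read from
   the SImode slot of the move-cost table (index 2).  Any scaling the
   real function applies is deliberately left out of this sketch.  */
static int
toy_scalar_store_cost (const struct toy_costs *costs)
{
  return costs->int_store[2];
}
```

With these tables, the modeled scalar store cost drops from 6 to 3 even though the patch was only meant to adjust a move cost.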

Comment 1 Hongtao.liu 2019-08-26 05:12:28 UTC

In the latest GCC version (GCC10_20190820), applying this patch makes no difference to the unroll factor.

There is still a difference in the profitability threshold, however: it changes from 5 to 4.

If the loop count is less than the profitability threshold, vectorization is not triggered.
So runtime performance differs only when the loop count is 4; otherwise it is the same.
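
The threshold reasoning above can be modeled with a small sketch. The helper `takes_vector_path` is hypothetical and only encodes the rule stated in this comment (a loop count below the threshold does not trigger vectorization); it is not the check GCC emits.

```c
#include <stdbool.h>

/* Hypothetical model of the rule stated above: the vectorized loop is
   used only when the loop count is not below the profitability
   threshold.  */
static bool
takes_vector_path (long loop_count, long threshold)
{
  return loop_count >= threshold;
}
```

Under this model, a loop count of 4 is vectorized with a threshold of 4 but not with a threshold of 5, while every other loop count behaves the same under both thresholds.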

Comment 2 Martin Liška 2020-01-27 13:40:18 UTC

@H.J.Lu: Can we close the issue or is it still valid?

Comment 3 H.J. Lu 2020-01-27 13:54:30 UTC

Fixed by r274543.