93564 – [10 Regression] 470.lbm regresses by 25% on znver2 with -Ofast -march=native LTO and PGO since r10-6384-g2a07345c4f8dabc2

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	10.0

Importance:	P3 normal
Target Milestone:	10.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	ra

Depends on:
Blocks:	spec

Reported:	2020-02-04 12:29 UTC by Martin Liška
Modified:	2024-03-25 16:10 UTC
CC List:	2 users

See Also:	spec 114468
Host:
Target:
Build:
Known to work:	9.2.0
Known to fail:	10.0
Last reconfirmed:	2020-02-04 00:00:00

Description Martin Liška 2020-02-04 12:29:33 UTC

Since the revision I see quite a big slowdown:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=288.240.0

and I see a smaller slowdown on different configurations:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=10.477.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=297.477.0

Comment 1 Vladimir Makarov 2020-02-05 15:09:19 UTC

Thank you for reporting this.

I've changed the RA heuristics.  It is very rare to change heuristics without at least one SPEC benchmark degrading.  Usually some benchmarks improve and some worsen.  We should look at the overall rate.

According to the overall CPU2017 rate, I see 6.514 before my patch and 6.503 after.  That is a 0.17% change, which is within measurement error.  I got analogous data on SPEC2000 when I tried my patch (actually, the rate with the patch was better).

For CPU2006 I see 50.337 before the patch and 50.107 after, which is roughly a 0.46% degradation.  This is pretty high.  I will investigate what I can do to improve the heuristics, but it will take some time.  If nothing works, we should probably revert my patch and reopen the 2 PRs which it closed.
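
For reference, the relative changes behind these overall rates can be re-derived from the four figures quoted in this comment. The following is only a minimal sketch: the helper name `relative_change_percent` is made up for illustration and is not part of any SPEC or GCC tooling.

```python
def relative_change_percent(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Overall rates quoted in the comment (higher is better, so a negative
# change means a degradation).
cpu2017 = relative_change_percent(6.514, 6.503)
cpu2006 = relative_change_percent(50.337, 50.107)

print(f"CPU2017: {cpu2017:.2f}%")  # prints "CPU2017: -0.17%"
print(f"CPU2006: {cpu2006:.2f}%")  # prints "CPU2006: -0.46%"
```

Whether a change of this size counts as noise depends on the run-to-run variance of the benchmark setup, which this thread does not state.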

Comment 2 GCC Commits 2020-02-23 21:21:52 UTC

The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:

https://gcc.gnu.org/g:3133bed5d0327e8a9cd0a601b7ecdb9de4fc825d

commit r10-6804-g3133bed5d0327e8a9cd0a601b7ecdb9de4fc825d
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Sun Feb 23 16:20:05 2020 -0500

    Changing cost propagation and ordering colorable bucket heuristics for PR93564.
    
    2020-02-23  Vladimir Makarov  <vmakarov@redhat.com>
    
    	PR rtl-optimization/93564
    	* ira-color.c (struct update_cost_queue_elem): New member start.
    	(queue_update_cost, get_next_update_cost): Add new arg start.
    	(allocnos_conflict_p): New function.
    	(update_costs_from_allocno): Add new arg conflict_cost_update_p.
    	Add checking conflicts with allocnos_conflict_p.
    	(update_costs_from_prefs, restore_costs_from_copies): Adjust
    	update_costs_from_allocno calls.
    	(update_conflict_hard_regno_costs): Add checking conflicts with
    	allocnos_conflict_p.  Adjust calls of queue_update_cost and
    	get_next_update_cost.
    	(assign_hard_reg): Adjust calls of queue_update_cost.  Add
    	debugging print.
    	(bucket_allocno_compare_func): Restore previous version.

Comment 3 Vladimir Makarov 2020-02-27 16:36:16 UTC

I checked the new results:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=288.240.0

It seems the patch solved the problem.

Comment 4 Jeffrey A. Law 2020-02-27 19:08:30 UTC

Per c#3.

Comment 5 GCC Commits 2020-02-28 16:29:13 UTC

The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:

https://gcc.gnu.org/g:f3ce088645e5305d932380c7520809181b2d2eb9

commit r10-6919-gf3ce088645e5305d932380c7520809181b2d2eb9
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Fri Feb 28 11:27:30 2020 -0500

    One more patch for PR93564: Prefer smaller hard regno when we do not honor reg alloc order.
    
    2020-02-28  Vladimir Makarov  <vmakarov@redhat.com>
    
    	PR rtl-optimization/93564
    	* ira-color.c (assign_hard_reg): Prefer smaller hard regno when we
    	do not honor reg alloc order.

Comment 6 Martin Liška 2020-03-03 08:24:06 UTC

> and I see a smaller slow down on different configurations:
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=10.477.0
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=297.477.0

Thank you, Vladimir, for looking into it. I still see both of these regressions (one is even worse after the revision). Moreover, there is another lbm regression on Zen:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=259.477.0&plot.1=29.477.0&

To be honest, we briefly analyzed lbm performance on Zen some time ago and identified a regression that depends on code layout (density of SSE instructions that probably can't fit in the u-op cache, or something similar).

Comment 7 Martin Liška 2020-03-11 12:06:18 UTC

commit r10-7093-g5dc1390b41db5c1765e25fd21dad1a930a015aac
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Mon Mar 9 14:05:09 2020 -0400

    Revert: One more patch for PR93564: Prefer smaller hard regno when we do not honor reg alloc order.
    
    2020-03-09  Vladimir Makarov  <vmakarov@redhat.com>
    
            Revert:
    
            2020-02-28  Vladimir Makarov  <vmakarov@redhat.com>
    
            PR rtl-optimization/93564
            * ira-color.c (assign_hard_reg): Prefer smaller hard regno when we
            do not honor reg alloc order.