Bug 90911 - [10 Regression] 456.hmmer regression with r272239
Summary: [10 Regression] 456.hmmer regression with r272239
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 10.0
Importance: P3 normal
Target Milestone: 10.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: spec
Reported: 2019-06-18 14:06 UTC by Richard Biener
Modified: 2019-07-04 13:56 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-06-25 00:00:00


Description Richard Biener 2019-06-18 14:06:08 UTC
There's a 4% regression in 456.hmmer observed on https://gcc.opensuse.org/gcc-old/SPEC/CINT/sb-czerny-head-64-2006/recent.html with r272239 at -Ofast -march=haswell [-flto].
Comment 1 Richard Biener 2019-06-18 14:46:09 UTC
So I see

fast_algorithms.c.158t.vect:fast_algorithms.c:133:5: note:  reusing loop version created by if conversion
fast_algorithms.c.158t.vect:fast_algorithms.c:133:5: note:  reusing loop version created by if conversion
fast_algorithms.c.158t.vect:fast_algorithms.c:133:5: note:  reusing loop version created by if conversion
histogram.c.158t.vect:histogram.c:702:3: note:  reusing loop version created by if conversion

and no outer loop versioning is even attempted.

It's obviously going to be fast_algorithms.c:133, since that's the P7Viterbi
function (which is now split and distributed, thus it appears three times).

Possibly the cold profile for the non-vector path hurts here.  I have to
check whether the non-vector path gets any cycles.  So this may be
vect_loop_versioning, where it says

      /* ???  if-conversion uses profile_probability::always () but
         prob below is profile_probability::likely ().  */

referring to the alternate path using loop_versioning.
Comment 2 Richard Biener 2019-06-25 10:56:31 UTC
So looking at perf the reason seems obvious:

Samples: 717K of event 'cycles:pu', Event count (approx.): 586330968682
Overhead  Command          Shared Object                  Symbol
  60.74%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] P7Viterbi
  31.35%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] P7Viterbi.cold
   2.32%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] FChoose
   2.02%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] sre_random

and caused by

      /* ???  if-conversion uses profile_probability::always () but
         prob below is profile_probability::likely ().  */

thus we keep the profile from if-conversion, which uses always (), whereas
previously we versioned the loop ourselves with the vectorized path being
only likely ().  The inconsistent profile from if-conversion persists and
seems to confuse us later:

  /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
     is re-merged in the vectorizer.  */
  new_loop = loop_version (loop, cond, &cond_bb,
                           profile_probability::always (),
                           profile_probability::always (),
                           profile_probability::always (),
                           profile_probability::always (), true);

Just updating the edge probability of the guard seems to avoid creating
the bogus hot/cold partitioning and should not affect the later copying
from the scalar loop done by prologue/epilogue peeling.
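
To illustrate why the guard probability matters, here is a minimal toy model.
It is a sketch only and does not use GCC's profile_probability or
profile_count classes: GuardProfile, split_guard, placed_in_cold_section and
the per-mille constants are hypothetical names, the value chosen for kLikely
is an arbitrary illustrative number, and the "count == 0 means cold" rule is
a deliberate simplification of the real hot/cold partitioning heuristics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, NOT GCC's profile API.
struct GuardProfile {
    std::uint64_t vector_count;  // executions attributed to the vectorized loop
    std::uint64_t scalar_count;  // executions attributed to the scalar loop
};

// Probabilities in 1/1000 units.  kLikely is an illustrative value only.
constexpr unsigned kAlways = 1000;
constexpr unsigned kLikely = 800;

// Split the count entering the versioning guard across the two loop copies,
// given the probability of the edge into the vectorized copy.
GuardProfile split_guard(std::uint64_t entry_count, unsigned prob_vector_permille) {
    assert(prob_vector_permille <= 1000);
    std::uint64_t vec = entry_count * prob_vector_permille / 1000;
    return {vec, entry_count - vec};
}

// Simplified stand-in for hot/cold partitioning: code whose profile count is
// zero is assumed never to run and is moved to the cold section.
bool placed_in_cold_section(std::uint64_t count) {
    return count == 0;
}
```

With kAlways on the guard the scalar copy ends up with a count of zero and,
in this model, lands in the cold section even if it actually runs at run
time.  With kLikely it keeps a non-zero count and stays with the hot code.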
Comment 3 Richard Biener 2019-07-04 13:55:46 UTC
Author: rguenth
Date: Thu Jul  4 13:55:15 2019
New Revision: 273082

URL: https://gcc.gnu.org/viewcvs?rev=273082&root=gcc&view=rev
Log:
2019-07-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/90911
	* tree-vectorizer.h (_loop_vec_info::scalar_loop_scaling): New field.
	(LOOP_VINFO_SCALAR_LOOP_SCALING): New.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
	scalar_loop_scaling.
	(vect_transform_loop): Scale scalar loop profile if needed.
	* tree-vect-loop-manip.c (vect_loop_versioning): When re-using
	the loop copy from if-conversion adjust edge probabilities
	and scale the vectorized loop body profile, queue the scalar
	profile for updating after peeling.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-loop-manip.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vectorizer.h
Comment 4 Richard Biener 2019-07-04 13:56:28 UTC
Fixed.