There's a 4% regression in 456.hmmer observed on https://gcc.opensuse.org/gcc-old/SPEC/CINT/sb-czerny-head-64-2006/recent.html with r272239 at -Ofast -march=haswell [-flto].
So I see

fast_algorithms.c.158t.vect:fast_algorithms.c:133:5: note: reusing loop version created by if conversion
fast_algorithms.c.158t.vect:fast_algorithms.c:133:5: note: reusing loop version created by if conversion
fast_algorithms.c.158t.vect:fast_algorithms.c:133:5: note: reusing loop version created by if conversion
histogram.c.158t.vect:histogram.c:702:3: note: reusing loop version created by if conversion

and no outer loop versioning is even attempted. It's obviously going to be fast_algorithms.c:133, since that's the P7Viterbi function (which is now split and distributed, so it appears three times). Possibly the cold profile for the non-vector path hurts here; I have to check whether the non-vector path gets any cycles. So this may be vect_loop_versioning, where it says

  /* ??? if-conversion uses profile_probability::always () but
     prob below is profile_probability::likely ().  */

referring to the alternate path using loop_versioning.
So looking at perf, the reason seems obvious:

Samples: 717K of event 'cycles:pu', Event count (approx.): 586330968682
Overhead  Command          Shared Object                  Symbol
  60.74%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] P7Viterbi
  31.35%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] P7Viterbi.cold
   2.32%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] FChoose
   2.02%  hmmer_base.amd6  hmmer_base.amd64-m64-gcc42-nn  [.] sre_random

and it is caused by

  /* ??? if-conversion uses profile_probability::always () but
     prob below is profile_probability::likely ().  */

Thus we keep the profile from if-conversion, which uses always (), whereas previously the loop was versioned with the vectorized path being only likely (). The inconsistent profile from if-conversion persists and seems to confuse us later:

  /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
     is re-merged in the vectorizer.  */
  new_loop = loop_version (loop, cond, &cond_bb,
                           profile_probability::always (),
                           profile_probability::always (),
                           profile_probability::always (),
                           profile_probability::always (), true);

Just updating the edge probability of the guard seems to avoid creating the bogus hot/cold partitioning and should not affect further copying from the scalar loop during prologue/epilogue peeling.
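To make the effect concrete, here is a toy C model, not GCC code. It assumes each loop copy's body count is derived from the guard probability p (the vector copy gets a fraction p of the executions, the scalar copy 1 - p) and uses a hypothetical cold threshold COLD_FRACTION; split_counts, is_cold and version_counts are invented names. Under those assumptions, p = 1.0 (always ()) leaves the scalar copy with a zero count, so a partitioner would move it to a .cold section even though, as the perf profile shows, it still runs a lot. A value below 1.0, such as the 0.8 used below as a stand-in for likely (), keeps it hot; 0.8 is an assumption, not the exact value of profile_probability::likely ().

```c
#include <assert.h>

/* Toy model (hypothetical, not GCC's profile-count.h): probabilities are
   doubles in [0, 1] and block counts are doubles. */

/* Hypothetical threshold: a block whose estimated count is below this
   fraction of the function entry count is treated as "cold". */
#define COLD_FRACTION 0.001

typedef struct {
  double vector_count;   /* estimated executions of the vectorized loop body */
  double scalar_count;   /* estimated executions of the scalar fallback body */
} version_counts;

/* Split LOOP_COUNT loop-body executions between the two loop versions
   according to the probability P that the versioning guard selects the
   vectorized loop. */
static version_counts split_counts (double loop_count, double p)
{
  assert (p >= 0.0 && p <= 1.0);
  version_counts c = { loop_count * p, loop_count * (1.0 - p) };
  return c;
}

/* Return nonzero if COUNT would be classified as cold relative to the
   function's ENTRY_COUNT under the toy threshold. */
static int is_cold (double count, double entry_count)
{
  return count < entry_count * COLD_FRACTION;
}
```

For example, split_counts (1e5, 1.0).scalar_count is 0.0 and is_cold reports it as cold for an entry count of 1e3, while split_counts (1e5, 0.8) leaves roughly 2e4 executions on the scalar path, which is not cold.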
Author: rguenth
Date: Thu Jul  4 13:55:15 2019
New Revision: 273082

URL: https://gcc.gnu.org/viewcvs?rev=273082&root=gcc&view=rev
Log:
2019-07-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/90911
	* tree-vectorizer.h (_loop_vec_info::scalar_loop_scaling): New field.
	(LOOP_VINFO_SCALAR_LOOP_SCALING): new.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
	scalar_loop_scaling.
	(vect_transform_loop): Scale scalar loop profile if needed.
	* tree-vect-loop-manip.c (vect_loop_versioning): When re-using
	the loop copy from if-conversion adjust edge probabilities
	and scale the vectorized loop body profile, queue the scalar
	profile for updating after peeling.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-loop-manip.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vectorizer.h
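The shape of the fix described in the ChangeLog (adjust the guard probability, scale the vectorized body now, queue the scalar scaling until after peeling) can be sketched as follows. This is a hypothetical toy in C: loop_profile, versioned_loop, reuse_ifcvt_version and finish_transform are invented for illustration, and the scalar_loop_scaling field only mirrors the idea of LOOP_VINFO_SCALAR_LOOP_SCALING, not its real type or semantics. It assumes both loop copies still carry the full, unscaled counts left by if-conversion's always () versioning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy structures, not GCC's loop_vec_info. */
typedef struct {
  double *counts;    /* per-block execution counts of one loop copy */
  size_t n_blocks;
} loop_profile;

typedef struct {
  loop_profile vector_loop;
  loop_profile scalar_loop;
  double guard_prob;           /* probability the guard selects the vector loop */
  double scalar_loop_scaling;  /* pending factor for the scalar copy; 1.0 = none */
} versioned_loop;

/* Multiply every block count of LP by FACTOR. */
static void scale_profile (loop_profile *lp, double factor)
{
  for (size_t i = 0; i < lp->n_blocks; i++)
    lp->counts[i] *= factor;
}

/* At versioning time: both copies still hold the unscaled counts from
   if-conversion.  Set the guard probability, scale the vector copy now,
   and only record the factor for the scalar copy. */
static void reuse_ifcvt_version (versioned_loop *vl, double prob)
{
  assert (prob >= 0.0 && prob <= 1.0);
  vl->guard_prob = prob;
  scale_profile (&vl->vector_loop, prob);
  vl->scalar_loop_scaling = 1.0 - prob;
}

/* After prologue/epilogue peeling has copied blocks from the scalar
   loop: apply the queued scaling once and clear it. */
static void finish_transform (versioned_loop *vl)
{
  if (vl->scalar_loop_scaling != 1.0)
    {
      scale_profile (&vl->scalar_loop, vl->scalar_loop_scaling);
      vl->scalar_loop_scaling = 1.0;
    }
}
```

Deferring the scalar scaling presumably matters because peeling copies blocks from the scalar loop; applying the queued factor once at the end keeps those copies from inheriting an already-scaled profile.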
Fixed.