Why vectorization didn't turn on by -O2

Mon May 17 18:56:45 GMT 2021

Jan Hubicka <hubicka@ucw.cz> writes:
> Hi,
> here are updated scores.  
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> compares
>   base:  mainline
>   1st column: mainline with very cheap vectorization at -O2 and -O3
>   2nd column: mainline with cheap vectorization at -O2 and -O3.
>
> The short story is:
>
> 1) -O2 generic performance
>     kabylake (Intel):
>                                   very cheap   cheap
>         SPEC/SPEC2006/FP/total    ~            8.32%
>         SPEC/SPEC2006/total       -0.38%       4.74%
>         SPEC/SPEC2006/INT/total   -0.91%       -0.14%
>
>         SPEC/SPEC2017/INT/total   4.71%        7.11%
>         SPEC/SPEC2017/total       2.22%        6.52%
>         SPEC/SPEC2017/FP/total    0.34%        6.06%
>     zen:
>         SPEC/SPEC2006/FP/total    0.61%        10.23%
>         SPEC/SPEC2006/total       0.26%        6.27%
>         SPEC/SPEC2006/INT/total   -0.24%       0.90%
>
>         SPEC/SPEC2017/INT/total   5.34%        7.80%
>         SPEC/SPEC2017/total       3.02%        6.55%
>         SPEC/SPEC2017/FP/total    1.26%        5.60%
>
>  2) -O2 size:
>      -0.78% (very cheap) 6.51% (cheap) for spec2k2006 
>      -0.32% (very cheap) 6.75% (cheap) for spec2k2017 
>  3) build times:
>      0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
>      0.39% 0.57% 0.71%       (very cheap) 5.40% 6.23% 8.44%       (cheap) for spec2k2017
>     here I simply copied data from different configurations
>
> So for SPEC I would say that most of the compile-time cost is derived
> from code size growth, which is a problem with the cheap model but not
> with very cheap.  Very cheap indeed results in code size improvements,
> and its compile-time impact is probably somewhere around 0.5%.
>
> So from these scores alone, vectorization at -O2 with the very cheap
> model seems to make sense to me (I am sure we have other optimizations
> with a worse benefit to compile-time tradeoff).
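
For readers less familiar with the two cost models, here is a minimal
sketch of the kind of loops they tend to treat differently.  It assumes
the documented behaviour of -fvect-cost-model=very-cheap, namely that a
loop is only vectorized when the vector code can entirely replace the
scalar code (no scalar epilogue, no runtime versioning).  The names N,
add_fixed and add_unknown are made up for illustration, and what a
particular GCC version and target actually does may differ.

```c
#include <stddef.h>

#define N 1024  /* hypothetical size, a multiple of common vector widths */

float a[N], b[N], c[N];

/* Constant trip count that is a multiple of typical vectorization
   factors, and distinct global arrays: no epilogue or alias check
   should be needed, so this is a candidate under very-cheap.  */
void add_fixed(void)
{
  for (size_t i = 0; i < N; i++)
    a[i] = b[i] + c[i];
}

/* Unknown trip count and possibly aliasing pointers: vectorizing this
   typically needs a runtime alias check plus a scalar epilogue, which
   is extra code.  The cheap model may accept that; very-cheap, as
   documented, should not.  */
void add_unknown(float *dst, const float *x, const float *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = x[i] + y[i];
}
```

One way to see the difference is to compile the file (called sketch.c
here) once with each model and compare the vectorizer reports:

   gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec -c sketch.c
   gcc -O2 -ftree-vectorize -fvect-cost-model=cheap -fopt-info-vec -c sketch.c

The second kind of loop is where the code size growth of the cheap
model would be expected to come from.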

Thanks for running these.

The biggest issue I know of for enabling very-cheap at -O2 is:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089

Perhaps we could get around that by (hopefully temporarily) disabling
BB SLP within loop vectorisation for the very-cheap model.  This would
purely be a workaround and we should remove it once the PR is fixed.
(It would even be a compile-time win in the meantime :-))
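
To illustrate what basic-block SLP means here, a minimal sketch follows.
The function name scale4 is made up for this example.  It shows
straight-line code that BB SLP may turn into vector operations; it is
not a reproducer for the PR.

```c
/* Four independent scalar statements on adjacent elements, with no
   loop involved.  Basic-block SLP can pack these into a single vector
   multiply if the target and the cost model allow it.  */
void scale4(float *restrict out, const float *restrict in, float k)
{
  out[0] = in[0] * k;
  out[1] = in[1] * k;
  out[2] = in[2] * k;
  out[3] = in[3] * k;
}
```

The nearest existing user-visible switch I know of is
-fno-tree-slp-vectorize, for example:

   gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fno-tree-slp-vectorize -fopt-info-vec -c sketch.c

Note that this is broader than the workaround suggested above: it turns
off basic-block SLP everywhere, not only the part done during loop
vectorisation, so it is only a rough way to experiment.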

Thanks,
Richard

> However, there are the usual arguments against:
>
>   1) Vectorizer being tuned for SPEC.  I think the only way to overcome
>      that argument is to enable it by default :)
>   2) The workloads that improve are mostly -Ofast-type workloads
>
> Here are non-spec benchmarks we track:
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>
> I also tried to run Firefox some time ago. Results are not surprising -
> vectorization helps rendering benchmarks, which are those compiled with
> aggressive flags anyway.
>
> Honza