Why vectorization didn't turn on by -O2

Jan Hubicka hubicka@ucw.cz
Mon May 17 16:03:09 GMT 2021

Here are the updated scores.
  base:  mainline
  1st column: mainline with very cheap vectorization at -O2 and -O3
  2nd column: mainline with cheap vectorization at -O2 and -O3.

The short story is:

1) -O2 generic performance
    kabylake (Intel):
                                         very cheap     cheap
        SPEC/SPEC2006/FP/total                    ~     8.32%
        SPEC/SPEC2006/total                  -0.38%     4.74%
        SPEC/SPEC2006/INT/total              -0.91%    -0.14%

        SPEC/SPEC2017/INT/total               4.71%     7.11%
        SPEC/SPEC2017/total                   2.22%     6.52%
        SPEC/SPEC2017/FP/total                0.34%     6.06%

        SPEC/SPEC2006/FP/total                0.61%    10.23%
        SPEC/SPEC2006/total                   0.26%     6.27%
        SPEC/SPEC2006/INT/total  34.006      -0.24%     0.90%

        SPEC/SPEC2017/INT/total  3.937        5.34%     7.80%
        SPEC/SPEC2017/total                   3.02%     6.55%
        SPEC/SPEC2017/FP/total                1.26%     5.60%

2) -O2 size:
     -0.78% (very cheap) 6.51% (cheap) for SPEC2006
     -0.32% (very cheap) 6.75% (cheap) for SPEC2017
3) build times:
     0%    0.16% 0.71% 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for SPEC2006
     0.39% 0.57% 0.71%       (very cheap) 5.40% 6.23% 8.44%       (cheap) for SPEC2017
    Here I simply copied the data from the different configurations.

So for SPEC I would say that most of the compile-time cost is derived
from code size growth, which is a problem with the cheap model but not
with the very cheap one.  Very cheap indeed results in code size
improvements, and the compile-time impact is probably somewhere around
0.5%.

So from these scores alone, vectorization at -O2 with the very cheap
model seems to make sense to me (I am sure we have other optimizations
with worse benefit to compile-time tradeoffs).
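
To make the difference between the two cost models concrete, here is a
minimal C sketch.  It is not one of the measured benchmarks, and the
function names and FIXED_N are made up for illustration.  Going by the
GCC documentation for -fvect-cost-model, very-cheap only vectorizes
when the vector code entirely replaces the scalar loop, so a loop with
a constant trip count that is a multiple of the vector width is the
kind of candidate it can accept.  A loop with a run-time trip count
would typically need a scalar epilogue, which is one plausible source
of the extra code size seen with the cheap model.  Whether a given GCC
release and target actually vectorizes either loop is an assumption;
-fopt-info-vec reports what the vectorizer did.

```c
/* Build sketch.  The options are real GCC options, but what gets
 * vectorized depends on the GCC release and the target:
 *   gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec -c kernels.c
 *   gcc -O2 -ftree-vectorize -fvect-cost-model=cheap      -fopt-info-vec -c kernels.c
 */
#include <assert.h>
#include <stddef.h>

#define FIXED_N 1024  /* hypothetical size, a multiple of common vector widths */

/* Constant trip count and no aliasing (restrict): no scalar epilogue or
 * runtime alias check should be needed, so this is the kind of loop the
 * very-cheap model is expected to be able to accept. */
void scale_fixed(float *restrict dst, const float *restrict src, float k)
{
    for (size_t i = 0; i < FIXED_N; i++)
        dst[i] = src[i] * k;
}

/* Trip count known only at run time: a vectorized version would usually
 * keep a scalar tail for the leftover iterations, which adds code. */
void scale_any(float *restrict dst, const float *restrict src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hypothetical self-check: both kernels must produce identical values,
 * whether or not the compiler vectorized them. */
void self_check(void)
{
    static float in[FIXED_N], a[FIXED_N], b[FIXED_N];

    for (size_t i = 0; i < FIXED_N; i++)
        in[i] = (float)i;

    scale_fixed(a, in, 2.0f);
    scale_any(b, in, 2.0f, FIXED_N);

    for (size_t i = 0; i < FIXED_N; i++)
        assert(a[i] == b[i] && a[i] == 2.0f * (float)i);
}
```

Comparing the object size (for example with the size utility) and the
-fopt-info-vec output for the two builds shows the size versus
vectorization tradeoff on a small scale.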

However, there are the usual arguments against:

  1) The vectorizer being tuned for SPEC.  I think the only way to
     overcome that argument is to enable it by default :)
  2) The workloads that improve are mostly -Ofast-type workloads.

Here are the non-SPEC benchmarks we track:

I also tried to run Firefox some time ago.  The results are not
surprising: vectorization helps the rendering benchmarks, which are the
ones compiled with aggressive flags anyway.


