Why vectorization didn't turn on by -O2
Richard Sandiford
richard.sandiford@arm.com
Wed Aug 4 08:22:43 GMT 2021
Hongtao Liu <crazylht@gmail.com> writes:
> On Tue, May 18, 2021 at 4:27 AM Richard Sandiford via Gcc-help
> <gcc-help@gcc.gnu.org> wrote:
>>
>> Jan Hubicka <hubicka@ucw.cz> writes:
>> > Hi,
>> > here are updated scores.
>> > https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>> > compares
>> > base: mainline
>> > 1st column: mainline with very cheap vectorization at -O2 and -O3
>> > 2nd column: mainline with cheap vectorization at -O2 and -O3.
>> >
>> > The short story is:
>> >
>> > 1) -O2 generic performance
>> > kabylake (Intel):
>> >                                   very cheap   cheap
>> > SPEC/SPEC2006/FP/total            ~            8.32%
>> > SPEC/SPEC2006/total               -0.38%       4.74%
>> > SPEC/SPEC2006/INT/total           -0.91%       -0.14%
>> >
>> > SPEC/SPEC2017/INT/total           4.71%        7.11%
>> > SPEC/SPEC2017/total               2.22%        6.52%
>> > SPEC/SPEC2017/FP/total            0.34%        6.06%
>> > zen
>> > SPEC/SPEC2006/FP/total            0.61%        10.23%
>> > SPEC/SPEC2006/total               0.26%        6.27%
>> > SPEC/SPEC2006/INT/total 34.006    -0.24%       0.90%
>> >
>> > SPEC/SPEC2017/INT/total 3.937     5.34%        7.80%
>> > SPEC/SPEC2017/total               3.02%        6.55%
>> > SPEC/SPEC2017/FP/total            1.26%        5.60%
>> >
>> > 2) -O2 size:
>> > -0.78% (very cheap) 6.51% (cheap) for spec2k2006
>> > -0.32% (very cheap) 6.75% (cheap) for spec2k2017
>> > 3) build times:
>> > 0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
>> > 0.39% 0.57% 0.71% (very cheap) 5.40% 6.23% 8.44% (cheap) for spec2k2017
>> > here I simply copied data from different configurations
>> >
>> > So for SPEC I would say that most of the compile time cost is derived
>> > from code size growth, which is a problem with the cheap model but not
>> > with very cheap. Very cheap indeed results in code size improvements, and
>> > the compile time impact is probably somewhere around 0.5%.
>> >
>> > So from these scores alone it would seem to me that vectorization makes
>> > sense at -O2 with the very cheap model (I am sure we have other
>> > optimizations with worse benefit to compile time tradeoffs).
>>
>> Thanks for running these.
>>
>> The biggest issue I know of for enabling very-cheap at -O2 is:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
>>
>> Perhaps we could get around that by (hopefully temporarily) disabling
>> BB SLP within loop vectorisation for the very-cheap model. This would
>> purely be a workaround and we should remove it once the PR is fixed.
>> (It would even be a compile-time win in the meantime :-))
>>
>> Thanks,
>> Richard
>>
>> > However, there are the usual arguments against:
>> >
>> > 1) Vectorizer being tuned for SPEC. I think the only way to overcome
>> > that argument is to enable it by default :)
>> > 2) The workloads that improve are more -Ofast type workloads
>> >
>> > Here are non-spec benchmarks we track:
>> > https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>> >
>> > I also tried to run Firefox some time ago. Results are not surprising -
>> > vectorization helps rendering benchmarks, which are the ones compiled with
>> > aggressive flags anyway.
>> >
>> > Honza
>
> Hi:
> I would like to ask if we can turn on O2 vectorization now?
I think we still need to deal with the PR100089 issue that I mentioned above.
Like I say, “dealing with” it could be as simple as disabling:

	/* If we applied if-conversion then try to vectorize the
	   BB of innermost loops.
	   ???  Ideally BB vectorization would learn to vectorize
	   control flow by applying if-conversion on-the-fly, the
	   following retains the if-converted loop body even when
	   only non-if-converted parts took part in BB vectorization.  */
	if (flag_tree_slp_vectorize != 0
	    && loop_vectorized_call
	    && ! loop->inner)

for the very-cheap vector cost model until the PR is fixed properly.
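
As a rough illustration of that workaround, the condition can be modelled in
standalone C. This is a sketch and not a patch against GCC: enum cost_model,
struct loop_state and try_bb_slp_p are hypothetical names invented here, and
the only point is that the three existing tests gain one extra cost-model test.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vectorizer cost models.  The names and
   values are illustrative and are not copied from GCC.  */
enum cost_model
{
  MODEL_UNLIMITED,
  MODEL_DYNAMIC,
  MODEL_CHEAP,
  MODEL_VERY_CHEAP
};

/* Hypothetical summary of the state tested by the quoted condition.  */
struct loop_state
{
  bool slp_enabled;           /* models flag_tree_slp_vectorize != 0 */
  bool loop_vectorized_call;  /* models loop_vectorized_call being set */
  bool has_inner_loop;        /* models loop->inner being non-null */
};

/* Return true if BB vectorization of the if-converted loop body should
   be attempted.  The only difference from the quoted condition is the
   leading cost-model test, which skips the attempt for very-cheap.  */
static bool
try_bb_slp_p (enum cost_model model, const struct loop_state *loop)
{
  return (model != MODEL_VERY_CHEAP
	  && loop->slp_enabled
	  && loop->loop_vectorized_call
	  && !loop->has_inner_loop);
}
```

In this model an innermost, if-converted loop still gets the BB SLP attempt
under the cheap model and no longer gets it under very-cheap; every other
combination behaves as before.
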
Thanks,
Richard
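
For anyone wanting to try the configuration discussed in this thread on a
small case, the two loops below are a minimal sketch. The function and array
names are made up for illustration. With a GCC that supports the very-cheap
model, a command along the lines of

	gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec -c loops.c

should report which loops were vectorized. The expectation (an assumption, not
something measured in this thread, and dependent on the target) is that the
first loop is a candidate under very-cheap, while the second is not, because
it would need a runtime alias check and a scalar epilogue.

```c
#include <stddef.h>

#define N 1024	/* compile-time trip count, chosen for illustration */

static float a[N], b[N], c[N];

/* Trip count is the constant N and the arrays are distinct globals, so
   the loop needs no runtime alias check to be vectorized.  */
void
add_fixed (void)
{
  for (size_t i = 0; i < N; i++)
    a[i] = b[i] + c[i];
}

/* Trip count is only known at run time and the pointers may alias, so
   vectorizing this loop would normally involve versioning or an
   epilogue in addition to the vector loop.  */
void
add_unknown (float *x, const float *y, const float *z, size_t n)
{
  for (size_t i = 0; i < n; i++)
    x[i] = y[i] + z[i];
}
```

Changing -fvect-cost-model=very-cheap to -fvect-cost-model=cheap in the
command above gives the second configuration from the comparison.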