This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/65492] Bad optimization in -O3 due to if-conversion and/or unrolling


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492

--- Comment #10 from Allan Jensen <linux at carewolf dot com> ---
Just to make things more complicated, I tried the test on a Haswell, and
surprisingly, disabling if-convert or tree-vectorize at -O3 has no effect on
performance, but enabling tree-vectorize at -O2 does.

In conclusion, this test is slower at -O3 than at -O2 on all tested CPUs
(Phenom, SandyBridge and Haswell), but for different reasons.

On Phenom, it is slower due to if-convert, but not unroll (unrolled might even
be slightly faster, but only by a small amount).
On SandyBridge, it is slower due to both if-convert and unroll, and even slower
when both are active.
On Haswell, it is slower due to both if-convert and unroll, but if-convert on
top of unroll is no slower than unroll on its own.

In general it is probably safe to try to avoid or undo the if-convert. There
appear to be special if-conversions that are only performed when vectorization
is active. Presumably they are only used in that case because they are known to
likely be slower when the loop is not vectorized. In this case the
if-conversion is done, but the loop is not vectorized in the end, which just
slows it down (on non-Haswell).
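
To illustrate the kind of transformation being discussed, here is a minimal
sketch in C. It is not the test case from this bug; the function names and the
clamp loop are hypothetical and only show, by hand, what an if-converted loop
body looks like next to the branchy original. Whether GCC actually applies this
to a given loop depends on the version, target and flags.

```c
#include <stddef.h>

/* Hypothetical example, not the loop from this bug report. */

/* Branchy form: the store only happens when the condition holds. */
static void clamp_branchy(int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            a[i] = limit;
    }
}

/* Hand-written if-converted form: the branch is replaced by a select,
 * so every iteration loads, selects and stores unconditionally. */
static void clamp_ifconverted(int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        int v = a[i];
        a[i] = (v > limit) ? limit : v;
    }
}
```

Both functions compute the same result. The second form is easier to vectorize
because it has no control flow in the loop body, but if the loop ends up scalar
anyway, it does the extra select and store on every iteration, which is one
plausible reason for the slowdown described above.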

The unroll issue could perhaps be handled by controlling some optimization
params with tuning profiles. Where is trivial unrolling like this even
performed?

