This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/68920] New: [6 Regression] Undesirable if-conversion for a rarely taken branch
- From: "afomin.mailbox at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 15 Dec 2015 16:00:44 +0000
- Subject: [Bug rtl-optimization/68920] New: [6 Regression] Undesirable if-conversion for a rarely taken branch
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68920
Bug ID: 68920
Summary: [6 Regression] Undesirable if-conversion for a rarely taken branch
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: afomin.mailbox at gmail dot com
CC: izamyatin at gmail dot com, jgreenhalgh at gcc dot gnu.org, ysrumyan at gmail dot com
Target Milestone: ---
Target: i686-*-*
Created attachment 37042
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37042&action=edit
A reproducer
When the attached reproducer is compiled with -m32 -O2 -march=core-avx2, performance drops sharply between r229821 and r229822: real time goes from 1.176s to 2.109s, roughly 1.8x slower.

        r229821    r229822
real    0m1.176s   0m2.109s
user    0m1.175s   0m2.106s
sys     0m0.000s   0m0.000s
The problem is that, since r229822, the ce1 RTL pass applies if-conversion to the write() routine:

                                 r229821   r229822
possible IF blocks searched      46        57
IF blocks converted              4         6
true changes made                6         9
As a result, two statements are hoisted into a half-hammock and turned into conditional moves.

r229821:
  ...
  8048469: lea    0x0(%edi,%eiz,1),%edi
  8048470: mov    -0x1c(%ebp),%eax
  8048473: add    %ecx,%eax
  8048475: cmp    -0x20(%ebp),%eax
  8048478: jbe    8048480 <main+0x150>
  804847a: mov    -0x48(%ebp),%eax
  804847d: mov    -0x40(%ebp),%ecx
  8048480: mov    %edi,(%ecx)
  8048482: mov    -0x1c(%ebp),%ecx
  8048485: add    %eax,%ecx
  ...

r229822:
  ...
  804846d: lea    0x0(%esi),%esi
  8048470: mov    -0x24(%ebp),%ecx
  8048473: mov    -0x48(%ebp),%ebx
  8048476: mov    -0x28(%ebp),%edx
  8048479: add    %eax,%ecx
  804847b: cmp    %esi,%ecx
  804847d: cmova  %ebx,%eax
  8048480: cmovbe %ecx,%edx
  ...
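A minimal C sketch of the transformation, for illustration only. The names (`wrap_branchy`, `wrap_selects`, `pos`, `len`, `limit`, `a`, `b`, `a_wrap`, `b_wrap`) are hypothetical and are not taken from the attached reproducer; they only mimic the shape of the cmp/jbe sequence in the r229821 listing. The first function is a half-hammock, an if with no else arm that assigns two variables. The second is the equivalent branchless form that if-conversion effectively produces, one select per hoisted statement, which is roughly what the cmova/cmovbe pair in the r229822 listing computes.

```c
/* Hypothetical illustration of if-conversion; the names are not
 * taken from the attached reproducer. */

/* Branchy form: a half-hammock (an if without an else arm) that
 * updates two variables only when the rarely true test holds. */
void wrap_branchy(unsigned pos, unsigned len, unsigned limit,
                  unsigned *a, unsigned *b,
                  unsigned a_wrap, unsigned b_wrap)
{
    if (pos + len > limit) {
        *a = a_wrap;
        *b = b_wrap;
    }
}

/* If-converted form: both assignments are hoisted above the test
 * and become selects, so they execute on every call whether or not
 * the condition holds. Each select is a candidate for a cmov. */
void wrap_selects(unsigned pos, unsigned len, unsigned limit,
                  unsigned *a, unsigned *b,
                  unsigned a_wrap, unsigned b_wrap)
{
    int cond = pos + len > limit;
    *a = cond ? a_wrap : *a;
    *b = cond ? b_wrap : *b;
}
```

Both functions compute the same results; the difference is cost. The branchy form does no extra work on the common path, while the select form evaluates both assignments (and any loads feeding them) on every call. Conditional moves generally pay off only when a branch is hard to predict, which a rarely taken wrap-around test is not.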
This branch should be taken very rarely, because the if statement is only used to wrap around a buffer in memory.
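A minimal sketch of the kind of routine described, assuming a simple circular byte buffer. `struct ring`, `ring_write` and the `wraps` counter are hypothetical and are not the code from the attached reproducer. Because the write position wraps only once per full pass over the buffer, the wrap branch is taken about once every `size / len` calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical circular buffer; not the code from the reproducer. */
struct ring {
    unsigned char *data;
    size_t size;          /* capacity in bytes */
    size_t pos;           /* next write offset */
    unsigned long wraps;  /* how often the wrap branch was taken */
};

/* Append len bytes (assumes 0 < len <= size). The if statement fires
 * only when the write would run past the end of the buffer, so for
 * small len it is rarely taken. For simplicity, the tail that does
 * not fit is skipped rather than split across the end. */
void ring_write(struct ring *r, const void *src, size_t len)
{
    if (r->pos + len > r->size) {   /* rare: wrap to the start */
        r->pos = 0;
        r->wraps++;
    }
    memcpy(r->data + r->pos, src, len);
    r->pos += len;
}
```

With a 1024-byte buffer and 4-byte writes, 1024 calls take the wrap branch only 3 times, about 0.3% of calls.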
However, the CFG dump after the ce1 RTL pass shows:
* branch taken = 61%
* branch not taken = 39%
For comparison, compiling the reproducer with IA32 ICC using -O2 -xCORE-AVX2 gives:
* branch taken = 5%
* branch not taken = 95%
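When the static estimate is this far from the real behavior, one source-level way to state the expected direction is GCC's `__builtin_expect` builtin. The sketch below is illustrative only: the `unlikely` macro and the `next_pos` helper are hypothetical names, and it has not been checked whether this hint changes ce1's if-conversion decision for the attached reproducer at r229822.

```c
#include <stddef.h>

/* Common idiom: tell GCC the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: return the offset at which to write len bytes,
 * wrapping to 0 when the write would overflow a buffer of the given
 * size. The wrap is marked as the unlikely path. */
size_t next_pos(size_t pos, size_t len, size_t size)
{
    if (unlikely(pos + len > size))
        return 0;   /* rare wrap-around */
    return pos;
}
```

`__builtin_expect` only adjusts the probability GCC assigns to the branch; profile feedback via -fprofile-generate and -fprofile-use is the measured alternative.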