This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/68920] New: [6 Regression] Undesirable if-conversion for a rarely taken branch


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68920

            Bug ID: 68920
           Summary: [6 Regression] Undesirable if-conversion for a rarely
                    taken branch
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: afomin.mailbox at gmail dot com
                CC: izamyatin at gmail dot com, jgreenhalgh at gcc dot gnu.org,
                    ysrumyan at gmail dot com
  Target Milestone: ---
            Target: i686-*-*

Created attachment 37042
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37042&action=edit
A reproducer

When the attached reproducer is compiled with -m32 -O2 -march=core-avx2,
run time jumps from about 1.18 s to 2.11 s between r229821 and r229822:

        r229821         r229822
real    0m1.176s        0m2.109s
user    0m1.175s        0m2.106s
sys     0m0.000s        0m0.000s

The problem is that, since r229822, the ce1 RTL pass applies if-conversion to
the write() routine:

r229821                                r229822
46 possible IF blocks searched.        57 possible IF blocks searched.
4 IF blocks converted.                 6 IF blocks converted.
6 true changes made.                   9 true changes made.

As a result, the two moves in a half-hammock are now executed unconditionally
and selected with cmova/cmovbe instead of being skipped by the jbe branch:
r229821                                  r229822
...                                      ...
8048469: lea    0x0(%edi,%eiz,1),%edi    804846d: lea    0x0(%esi),%esi
8048470: mov    -0x1c(%ebp),%eax         8048470: mov    -0x24(%ebp),%ecx
8048473: add    %ecx,%eax                8048473: mov    -0x48(%ebp),%ebx
8048475: cmp    -0x20(%ebp),%eax         8048476: mov    -0x28(%ebp),%edx
8048478: jbe    8048480 <main+0x150>     8048479: add    %eax,%ecx
804847a: mov    -0x48(%ebp),%eax         804847b: cmp    %esi,%ecx
804847d: mov    -0x40(%ebp),%ecx         804847d: cmova  %ebx,%eax
8048480: mov    %edi,(%ecx)              8048480: cmovbe %ecx,%edx
8048482: mov    -0x1c(%ebp),%ecx
8048485: add    %eax,%ecx
...                                      ...
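
The transformation can be pictured at the source level with the hypothetical C
sketch below. The `pair` type, the function names and the parameters are
illustrative assumptions, not code from the attached reproducer.
`half_hammock` keeps two assignments behind a rarely true condition, like the
r229821 code, while `if_converted` evaluates the condition once and selects
each value separately, which is the shape ce1 produces at r229822. Whether GCC
actually emits cmov for this particular snippet depends on the version, flags
and cost model.

```c
/* Hypothetical illustration only; not taken from the attached reproducer. */
typedef struct { int x; int y; } pair;

/* Branchy form (r229821-like): the two assignments run only when the
   rarely true condition holds; otherwise the old values are returned. */
pair half_hammock(pair cur, unsigned pos, unsigned len, unsigned end,
                  pair wrap)
{
    if (pos + len > end) {        /* rarely true in a buffer-wrap loop */
        cur.x = wrap.x;
        cur.y = wrap.y;
    }
    return cur;
}

/* If-converted form (r229822-like): the condition is computed once and
   each value gets its own conditional select, which x86 can lower to
   one cmov per value instead of a conditional jump. */
pair if_converted(pair cur, unsigned pos, unsigned len, unsigned end,
                  pair wrap)
{
    int c = pos + len > end;      /* '+' binds tighter than '>' */
    pair r;
    r.x = c ? wrap.x : cur.x;
    r.y = c ? wrap.y : cur.y;
    return r;
}
```

Both functions return the same result; the difference is that the selected
form needs the values for both operands on every call, whereas the branchy
form only touches the `wrap` values when the condition holds.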

The branch is very rarely taken here, because the if statement is only used to
wrap around a buffer in memory.
However, the CFG dump after the ce1 RTL pass shows:
  * branch taken     = 61%
  * branch not taken = 39%
For comparison, compiling the reproducer with IA32 ICC using -O2 -xCORE-AVX2
gives:
  * branch taken     = 5%
  * branch not taken = 95%
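
To illustrate how rare such a wrap branch can be at run time, here is a
hypothetical circular-buffer writer that counts how often the wrap is taken.
The `ring` struct, `ring_write`, the 1024-element capacity and the 4-element
write size are assumptions for illustration; the attached reproducer may be
organized differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical circular buffer; names and layout are assumptions. */
typedef struct {
    int   *data;
    size_t size;    /* capacity in elements */
    size_t pos;     /* next write position */
    size_t wraps;   /* number of times the wrap branch was taken */
} ring;

/* Append len elements; caller must ensure len <= r->size.
   When the remaining room is too small, restart at the beginning. */
void ring_write(ring *r, const int *src, size_t len)
{
    if (r->pos + len > r->size) {   /* rarely taken: buffer wraps */
        r->pos = 0;
        r->wraps++;
    }
    memcpy(r->data + r->pos, src, len * sizeof *src);
    r->pos += len;
}
```

With these assumed sizes, 25600 calls take the wrap branch 99 times, i.e. once
every 1024 / 4 = 256 writes in steady state (about 0.4% of calls), which is
far more skewed than a 61%/39% split.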
