[Bug rtl-optimization/65698] New: Non-optimal code for simple compare function for x86 32-bit target

Wed Apr 8 11:27:00 GMT 2015

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698

            Bug ID: 65698
           Summary: Non-optimal code for simple compare function for x86
                    32-bit target
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ysrumyan at gmail dot com

For the attached test-case, the inner loop shows the following deficiencies:
1. Two redundant fills and one spill in the comparison part of the loop. I assume
that only 4 registers are needed to keep the bases of 'v1' and 'v2' and the
indexes 'i1' and 'i2'; one more register is required to keep 'c1' or 's1'.
2. Two redundant lea instructions that perform multiplication by 2.
Here is the optimal binary produced by the icc compiler (with the increment
part deleted):
  2e:    8a 04 3b                 mov    (%ebx,%edi,1),%al
  31:    3a 04 3e                 cmp    (%esi,%edi,1),%al
  34:    75 53                    jne    89 <my_cmp+0x89>
  36:    0f b7 04 5a              movzwl (%edx,%ebx,2),%eax
  3a:    0f b7 2c 72              movzwl (%edx,%esi,2),%ebp
  3e:    3b c5                    cmp    %ebp,%eax
  40:    75 47                    jne    89 <my_cmp+0x89>
  42:    8a 44 3b 01              mov    0x1(%ebx,%edi,1),%al
  46:    3a 44 3e 01              cmp    0x1(%esi,%edi,1),%al
  4a:    75 3d                    jne    89 <my_cmp+0x89>
  4c:    0f b7 44 5a 02           movzwl 0x2(%edx,%ebx,2),%eax
  51:    0f b7 6c 72 02           movzwl 0x2(%edx,%esi,2),%ebp
  56:    3b c5                    cmp    %ebp,%eax
  58:    75 2f                    jne    89 <my_cmp+0x89>
  5a:    83 c3 02                 add    $0x2,%ebx
...
  7b:    7f b1                    jg     2e <my_cmp+0x2e>
Note also that if we comment out the 2 lines
      if (i1 > n) i1 -= n;
      if (i2 > n) i2 -= n;
we get optimal code with the gcc compiler.
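
The attached test-case is not reproduced in this message. As a rough,
hypothetical sketch of the loop shape that the description and the icc listing
suggest (a compare loop unrolled by 2 over a byte array and a 16-bit array,
with both indexes wrapped at 'n'), something like the following could be used.
The signature, the element types, the return convention, the trip count, and
the array padding are all assumptions inferred from the listing, not taken
from the attachment:

```c
#include <stdint.h>

/* Hypothetical reconstruction of the loop shape, NOT the attached test-case.
   Assumptions: v1 is a byte array, v2 is a 16-bit array, both hold at least
   n + 2 elements, and 0 <= i1, i2 <= n on entry. */
int my_cmp(const uint8_t *v1, const uint16_t *v2, int i1, int i2, int n)
{
    int k = n;                          /* assumed trip count */
    do {
        uint8_t c1, c2;
        uint16_t s1, s2;

        /* First unrolled step: byte compare, then 16-bit compare. */
        c1 = v1[i1]; c2 = v1[i2];
        if (c1 != c2) return c1 > c2;
        s1 = v2[i1]; s2 = v2[i2];
        if (s1 != s2) return s1 > s2;

        /* Second unrolled step: same compares at offset +1. */
        c1 = v1[i1 + 1]; c2 = v1[i2 + 1];
        if (c1 != c2) return c1 > c2;
        s1 = v2[i1 + 1]; s2 = v2[i2 + 1];
        if (s1 != s2) return s1 > s2;

        i1 += 2; i2 += 2;

        /* The two lines the report says change gcc's code when removed. */
        if (i1 > n) i1 -= n;
        if (i2 > n) i2 -= n;

        k -= 2;
    } while (k > 0);
    return 0;                           /* no difference found */
}
```

Compiling such a function with 'gcc -O2 -m32 -S' and inspecting the inner loop
is one way to look for the spills and lea instructions described above; whether
this sketch reproduces them exactly has not been verified.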