[Bug middle-end/93634] New: Improving modular calculations (e.g. divisibility tests).

Sat Feb 8 15:43:00 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634

            Bug ID: 93634
           Summary: Improving modular calculations (e.g. divisibility
                    tests).
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider:

bool f(unsigned n) { return n % 6 == 4; }

at -O3 the code generated for x86_64 is

mov    %edi,%eax
mov    $0xaaaaaaab,%edx
imul   %rdx,%rax
shr    $0x22,%rax
lea    (%rax,%rax,2),%eax
add    %eax,%eax
sub    %eax,%edi
cmp    $0x4,%edi
sete   %al
retq   

whereas it could be

sub    $0x4,%edi
imul   $0xaaaaaaab,%edi,%edi
ror    %edi
cmp    $0x2aaaaaa9,%edi
setbe  %al
retq   

Notice the later is quite similar to what gcc generates for n % 6 == 3:

imul   $0xaaaaaaab,%edi,%edi
sub    $0x1,%edi
ror    %edi
cmp    $0x2aaaaaaa,%edi
setbe  %al
retq   

It's true that there's a small mathematical difference for the cases r <= 3 and
r >= 4 but not enough to throw away the faster algorithm. I reckon this is not
obvious and I refer to
https://accu.org/var/uploads/journals/Overload154.pdf#page=13 which presents
the overall idea and some benchmarks. In addition, it makes some comments on
gcc's generated code for other cases of n % d == r. References therein provide
mathematical proofs and extra benchmarks.

FWIW:

1) This relates to bug 82853 and bug 12849 and to a lesser extend bug 89845.

2) Specifically, it confirms the idea (for unsigned integers) described by Orr
Shalom Dvory in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853#c33