[Bug middle-end/93634] New: Improving modular calculations (e.g. divisibility tests).
cassio.neri at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sat Feb 8 15:43:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634
Bug ID: 93634
Summary: Improving modular calculations (e.g. divisibility
tests).
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: cassio.neri at gmail dot com
Target Milestone: ---
Consider:
bool f(unsigned n) { return n % 6 == 4; }
at -O3 the code generated for x86_64 is
mov %edi,%eax
mov $0xaaaaaaab,%edx
imul %rdx,%rax
shr $0x22,%rax
lea (%rax,%rax,2),%eax
add %eax,%eax
sub %eax,%edi
cmp $0x4,%edi
sete %al
retq
whereas it could be
sub $0x4,%edi
imul $0xaaaaaaab,%edi,%edi
ror %edi
cmp $0x2aaaaaa9,%edi
setbe %al
retq
Notice the later is quite similar to what gcc generates for n % 6 == 3:
imul $0xaaaaaaab,%edi,%edi
sub $0x1,%edi
ror %edi
cmp $0x2aaaaaaa,%edi
setbe %al
retq
It's true that there's a small mathematical difference for the cases r <= 3 and
r >= 4 but not enough to throw away the faster algorithm. I reckon this is not
obvious and I refer to
https://accu.org/var/uploads/journals/Overload154.pdf#page=13 which presents
the overall idea and some benchmarks. In addition, it makes some comments on
gcc's generated code for other cases of n % d == r. References therein provide
mathematical proofs and extra benchmarks.
FWIW:
1) This relates to bug 82853 and bug 12849 and to a lesser extend bug 89845.
2) Specifically, it confirms the idea (for unsigned integers) described by Orr
Shalom Dvory in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853#c33
More information about the Gcc-bugs
mailing list