Summary: | testing divisibility by constant | ||
---|---|---|---|
Product: | gcc | Reporter: | user42 |
Component: | middle-end | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | gcc-bugs, jakub |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 3.3.2 | ||
Target Milestone: | --- | ||
See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 | ||
Host: | i486-pc-linux-gnu | Target: | i486-pc-linux-gnu |
Build: | i486-pc-linux-gnu | Known to work: | |
Known to fail: | Last reconfirmed: | 2005-12-24 19:46:25 | |
Bug Depends on: | 82853 | ||
Bug Blocks: |
Description
user42
2003-10-30 22:06:22 UTC
If anybody wants to look into this, the material at http://www.hackersdelight.org/ might be useful. Nice optimizations. Subject: Re: testing divisibility by constant
I wrote:
>
> if (n % 3 == 0)
Actually, the asm code I included was for a divisor of 171717. Small
values get strength reduction on the multiplies, obscuring what gets
done.
A simple mathematical proof that the algorithm works is found here: http://clomont.com/efficient-divisibility-testing/ See also https://stackoverflow.com/a/49264279/1137388. Should be fixed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 Fixed in GCC9. Thanks for implementing the modular inverse algorithm in gcc. However, the implementation has an issue. In some cases, for no obvious reason, the compiler falls back to the old algorithm. For instance, bool f1(unsigned n) { return n % 10 == 5; } as expected, uses the modular inverse algorithm and translates to f1(unsigned int): imull $-858993459, %edi, %edi subl $1, %edi rorl %edi cmpl $429496729, %edi setbe %al ret whereas bool f2(unsigned n) { return n % 10 == 6; } doesn't use the modular inverse algorithm and is the same as in older versions of gcc: f2(unsigned int): movl %edi, %eax movl $3435973837, %edx imulq %rdx, %rax shrq $35, %rax leal (%rax,%rax,4), %eax addl %eax, %eax subl %eax, %edi cmpl $6, %edi sete %al ret See on godbolt: https://godbolt.org/z/u-C54I I would like make another observation. For some divisors (e.g. 7, 19, 21) the modular inverse algorithm seems to be faster than the traditional one even when the remainder r (in n % d == r) is not a compile time constant. In general this happens in cases where the "magic number" M used by the traditional algorithm to replace the division "n / d" with "n * M >> k" is such that M doesn't fit in a register and extra operations are required to overcome this problem. In other words, these are the divisors for which '"Add" indicator' in https://www.hackersdelight.org/magic.htm shows 1. I made some measurements and I hope to make my results available for your consideration soon. |