Created attachment 34280 [details] Test case The following is a fairly typical implementation of exponentiation modulo m: $ cat ipowm.c // Computes (b**e) % m unsigned int ipowm(unsigned int b, unsigned int e, unsigned int m) { unsigned int ret; b %= m; for (ret = m > 1; e; e >>= 1) { if ((e & 1) == 1) { ret = (unsigned long long)ret * b % m; } b = (unsigned long long)b * b % m; } return ret; } Unfortunately, GCC emits a 64-bit multiply and divide for both "... * b % m" expressions on x86 and x86-64, where a 32-bit multiply and divide would be equivalent and faster. $ gcc -std=c11 -O3 -Wall -S -save-temps ipowm.c $ cat ipowm.s ... imulq %rdi, %rax divq %rcx ... imulq %rdi, %rax divq %rcx ... The pattern mull %edi divl %ecx would be much faster. They're equivalent because b is always reduced mod m, so b < m and therefore (for any unsigned int x), x * b / m <= x * m / m == x, thus the quotient will always fit in 32 bits.
Not sure if your analysis is correct, but value-range propagation computes ret_4: VARYING ... _12: [0, 4294967295] _13: [0, 4294967295] _14: [0, 18446744065119617025] _15: [0, 4294967295] _16: [0, +INF] for Folding statement: _12 = (long long unsigned int) ret_4; Not folded Folding statement: _13 = (long long unsigned int) b_1; Not folded Folding statement: _14 = _12 * _13; Not folded Folding statement: _15 = (long long unsigned int) m_6(D); Not folded Folding statement: _16 = _14 % _15; so its analysis needs to be improved (didn't investigate exactly where it fails, but IIRC modulo is not handled here because m may be zero).
You would need symbolic ranges, b and ret are in [0,m-1]. And then you are using that very specific x86 instruction that divides 64 bits by 32 bits but only works if the result fits in 32 bits. It works here because (m-1)*(m-1)/m<m is small enough, but that's very hard for the compiler to prove, and I don't know of another architecture with a similar instruction. (Related: PR58897, PR53100)
@Richard Biener: Yes the range for _16 could be [0, 4294967294]. Why can't VRP can't assume division by zero doesn't occur? If it can then it could say anything mod [a, b] fits in [0, b - 1]. That's a reasonable improvement by itself but it's not enough to optimize this PR, because to use divl for (ret * b % m), you need (ret * b / m) to fit in [0, 4294967295] as well. And to know that that, as Marc Glisse suggests, you'd need symbolic ranges. @Marc Glisse: Is there currently no support at all for symbolic ranges? If you can infer that b < m is an invariant then that's all you need. Formally it's something like this: If x, y, and z are 32-bit unsigned integers, and x <= z || y <= z, then (uint64_t)x * (uint64_t)y % z can be computed with mull and divl because x * y / z is always <= max(x, y) which fits in 32 bits.
(In reply to Marc Glisse from comment #2) > You would need symbolic ranges, b and ret are in [0,m-1]. And then you are > using that very specific x86 instruction that divides 64 bits by 32 bits but > only works if the result fits in 32 bits. It works here because > (m-1)*(m-1)/m<m is small enough, but that's very hard for the compiler to > prove, and I don't know of another architecture with a similar instruction. On SH a 64/32 -> 32 bit unsigned division can be done by repeating rotcl,div1 32 times (once for each result bit). It's one of the examples for the 1-step division insn 'div1' in the manuals. Effectively it's the same as the x86 divl insn.