This snippet illustrates the problem: <https://godbolt.org/g/TssrYv> ``` bool mul_and_check(unsigned *dst, unsigned src){ return __builtin_mul_overflow(src, 2, dst); } ``` With `g++ -O3`, this compiles to: ``` mul_and_check(unsigned int*, unsigned int): mov eax, esi mov edx, 2 xor ecx, ecx mul edx mov esi, eax jo .L5 .L2: mov eax, ecx mov DWORD PTR [rdi], esi and eax, 1 ret .L5: mov ecx, 1 jmp .L2 ``` This is very suboptimal. GCC could have used a bitwise shift operation instead, as follows: ``` mul_and_check(unsigned int*, unsigned int): xor eax, eax # EAX = 0 shl esi, 1 # CF = MSB(src) # src = src * 2; rol eax, 1 # EAX = CF mov dword ptr[rdi], esi # *dst = src ret ```
FWIW, it can still be improved when the constant is something other than 2. For example: ``` bool mul_8_and_check(unsigned *dst, unsigned src){ return __builtin_mul_overflow(src, 8, dst); } ``` can be rewritten as: bool mul_8_and_check(unsigned *dst, unsigned src){ unsigned res = src << 3; *dst = res; return (res >> 3) != src; // The result will have been truncated if // dividing the result by 8 does not yield // the original value. } ```
(res >> 3) != src; Why not just (src>>(sizeof (res)*8-3))!=0. Seems shorter and might be faster.
(In reply to Andrew Pinski from comment #2) > (res >> 3) != src; > > Why not just (src>>(sizeof (res)*8-3))!=0. > > Seems shorter and might be faster. And for the original case Src>>31!=0 Just becomes src>>31. :)
(In reply to Andrew Pinski from comment #2) > (res >> 3) != src; > > Why not just (src>>(sizeof (res)*8-3))!=0. > > Seems shorter and might be faster. What if the second operand is not a power of 2? `(res * 5 / 5 != src)` will always work, but bitwise shifting might not.
(In reply to Liu Hao from comment #4) > (In reply to Andrew Pinski from comment #2) > > (res >> 3) != src; > > > > Why not just (src>>(sizeof (res)*8-3))!=0. > > > > Seems shorter and might be faster. > > What if the second operand is not a power of 2? `(res * 5 / 5 != src)` will > always work, but bitwise shifting might not. Division is something we need to avoid. If any of the * 5 or / 5 ends up being actual multiplication, it doesn't make sense either. And otherwise it will just be very long. So the only thing that IMHO makes sense is unsigned overflow multiply with constant power of two. I can handle that. No plans to do anything else.
Created attachment 42745 [details] gcc8-pr83210.patch Untested patch.
Author: jakub Date: Thu Nov 30 10:29:58 2017 New Revision: 255269 URL: https://gcc.gnu.org/viewcvs?rev=255269&root=gcc&view=rev Log: PR target/83210 * internal-fn.c (expand_mul_overflow): Optimize unsigned multiplication by power of 2 constant into two shifts + comparison. * gcc.target/i386/pr83210.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr83210.c Modified: trunk/gcc/ChangeLog trunk/gcc/internal-fn.c trunk/gcc/testsuite/ChangeLog
Fixed.