83210 – __builtin_mul_overflow() generates suboptimal code when exactly one argument is the constant 2

Bug 83210 - __builtin_mul_overflow() generates suboptimal code when exactly one argument is the constant 2

Summary: __builtin_mul_overflow() generates suboptimal code when exactly one argument ...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	7.2.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Jakub Jelinek

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2017-11-29 09:39 UTC by LIU Hao
Modified:	2017-11-30 10:47 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2017-11-29 00:00:00

Attachments
gcc8-pr83210.patch (1.22 KB, patch) 2017-11-29 17:01 UTC, Jakub Jelinek	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description LIU Hao 2017-11-29 09:39:30 UTC

This snippet illustrates the problem:

<https://godbolt.org/g/TssrYv>

```
bool mul_and_check(unsigned *dst, unsigned src){
    return __builtin_mul_overflow(src, 2, dst);
}
```

With `g++ -O3`, this compiles to:

```
mul_and_check(unsigned int*, unsigned int):
  mov eax, esi
  mov edx, 2
  xor ecx, ecx
  mul edx
  mov esi, eax
  jo .L5
.L2:
  mov eax, ecx
  mov DWORD PTR [rdi], esi
  and eax, 1
  ret
.L5:
  mov ecx, 1
  jmp .L2
```

This is very suboptimal. GCC could have used a bitwise shift operation instead, as follows:

```
mul_and_check(unsigned int*, unsigned int):
  xor eax, eax             # EAX  = 0
  shl esi, 1               # CF   = MSB(src)
                           # src  = src * 2;
  rol eax, 1               # EAX  = CF
  mov dword ptr[rdi], esi  # *dst = src
  ret
```

Comment 1 LIU Hao 2017-11-29 09:51:46 UTC

FWIW, it can still be improved when the constant is something other than 2.

For example:

```
bool mul_8_and_check(unsigned *dst, unsigned src){
    return __builtin_mul_overflow(src, 8, dst);
}
```

can be rewritten as:

bool mul_8_and_check(unsigned *dst, unsigned src){
    unsigned res = src << 3;
    *dst = res;
    return (res >> 3) != src; // The result will have been truncated if
                              // dividing the result by 8 does not yield
                              // the original value.
}
```

Comment 2 Andrew Pinski 2017-11-29 09:56:49 UTC

(res >> 3) != src;

Why not just (src>>(sizeof (res)*8-3))!=0.

Seems shorter and might be faster.

Comment 3 Andrew Pinski 2017-11-29 09:58:19 UTC

(In reply to Andrew Pinski from comment #2)
> (res >> 3) != src;
> 
> Why not just (src>>(sizeof (res)*8-3))!=0.
> 
> Seems shorter and might be faster.

And for the original case
Src>>31!=0 Just becomes src>>31. :)

Comment 4 LIU Hao 2017-11-29 10:01:44 UTC

(In reply to Andrew Pinski from comment #2)
> (res >> 3) != src;
> 
> Why not just (src>>(sizeof (res)*8-3))!=0.
> 
> Seems shorter and might be faster.

What if the second operand is not a power of 2? `(res * 5 / 5 != src)` will always work, but bitwise shifting might not.

Comment 5 Jakub Jelinek 2017-11-29 10:50:53 UTC

(In reply to Liu Hao from comment #4)
> (In reply to Andrew Pinski from comment #2)
> > (res >> 3) != src;
> > 
> > Why not just (src>>(sizeof (res)*8-3))!=0.
> > 
> > Seems shorter and might be faster.
> 
> What if the second operand is not a power of 2? `(res * 5 / 5 != src)` will
> always work, but bitwise shifting might not.

Division is something we need to avoid.  If any of the * 5 or / 5 ends up being actual multiplication, it doesn't make sense either.  And otherwise it will just be very long.  So the only thing that IMHO makes sense is unsigned overflow multiply with constant power of two.  I can handle that.  No plans to do anything else.

Comment 6 Jakub Jelinek 2017-11-29 17:01:26 UTC

Created attachment 42745 [details]
gcc8-pr83210.patch

Untested patch.

Comment 7 Jakub Jelinek 2017-11-30 10:30:30 UTC

Author: jakub
Date: Thu Nov 30 10:29:58 2017
New Revision: 255269

URL: https://gcc.gnu.org/viewcvs?rev=255269&root=gcc&view=rev
Log:
	PR target/83210
	* internal-fn.c (expand_mul_overflow): Optimize unsigned
	multiplication by power of 2 constant into two shifts + comparison.

	* gcc.target/i386/pr83210.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr83210.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/internal-fn.c
    trunk/gcc/testsuite/ChangeLog

Comment 8 Jakub Jelinek 2017-11-30 10:47:23 UTC

Fixed.