Bug 82858 - __builtin_add_overflow() generates suboptimal code with unsigned types on x86
Summary: __builtin_add_overflow() generates suboptimal code with unsigned types on x86
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 7.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: cmov
Reported: 2017-11-06 11:35 UTC by LIU Hao
Modified: 2023-11-17 22:23 UTC

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work: 8.0
Known to fail: 6.4.1, 7.2.1
Last reconfirmed: 2017-11-06 00:00:00


Description LIU Hao 2017-11-06 11:35:37 UTC
The following function:

```
unsigned saturated_add(unsigned a, unsigned b){
    unsigned c;
    if(__builtin_add_overflow(a, b, &c)){
        return -1;
    }
    return c;
}
```

when compiled with `gcc -O3 -Wall -Wextra -pedantic -pedantic-errors -masm=intel`, results in:

```
saturated_add(unsigned int, unsigned int):
  add edi, esi
  jc .L3
  mov eax, edi
  ret
.L3:
  or eax, -1
  ret
```

This is suboptimal, as the branch can be eliminated entirely:

```
saturated_add(unsigned int, unsigned int):
  add edi, esi
  sbb eax, eax  // EAX = -1 if CF is set and 0 otherwise.
  or eax, edi
  ret
```
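
For reference, the same branchless idea can be written in portable C (a sketch; the name `saturated_add_branchless` and the `c < a` carry test are my additions, not taken from the report):

```
unsigned saturated_add_branchless(unsigned a, unsigned b){
    unsigned c = a + b;                 /* wraps modulo 2^32 on overflow */
    unsigned mask = -(unsigned)(c < a); /* all-ones if the add carried, else 0 */
    return c | mask;                    /* saturate to UINT_MAX on overflow */
}
```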
Comment 1 Markus Trippelsdorf 2017-11-06 12:01:46 UTC
Confirmed. Trunk is fine.
Comment 2 LIU Hao 2017-11-06 12:10:55 UTC
Trunk on <https://gcc.godbolt.org/> produces the following code:

```
saturated_add(unsigned int, unsigned int):
  add edi, esi
  mov eax, -1
  cmovnc eax, edi
  ret
```

Conditional moves, however, are in my opinion no better than branches, and can even be worse: <http://yarchive.net/comp/linux/cmov.html>
Comment 3 Markus Trippelsdorf 2017-11-06 12:27:17 UTC
(In reply to Liu Hao from comment #2)
> Trunk on <https://gcc.godbolt.org/> produces the following code:
> 
> ```
> saturated_add(unsigned int, unsigned int):
>   add edi, esi
>   mov eax, -1
>   cmovnc eax, edi
>   ret
> ```
> 
> Conditional moves, however, are in my opinion no better than branches, and
> can even be worse: <http://yarchive.net/comp/linux/cmov.html>

Linus is talking about the Pentium 4. Hardware has evolved since then.
On today's CPUs CMOV is fine.
Comment 4 Marc Glisse 2017-11-06 12:33:23 UTC
    unsigned c;
    unsigned d = __builtin_add_overflow(a, b, &c)?-1:0;
    return c|d;

gives the expected asm. Ideally phiopt would recognize a saturating-add pattern, but we have nothing to model it in gimple. We could turn it into the branchless BIT_IOR form, though.

(the problem isn't with __builtin_add_overflow but with what comes afterwards)
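
Spelled out as a complete function (a sketch combining the body above with the signature from the original report):

```
unsigned saturated_add(unsigned a, unsigned b){
    unsigned c;
    /* d is all-ones when the add overflows, zero otherwise,
       so ORing it in saturates the result to UINT_MAX. */
    unsigned d = __builtin_add_overflow(a, b, &c) ? -1 : 0;
    return c | d;
}
```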
Comment 5 Uroš Bizjak 2017-11-06 13:17:12 UTC
Actually, -m32 -march=i386 -mtune=generic generates the expected code:

        movl    8(%esp), %eax
        addl    4(%esp), %eax
        sbbl    %edx, %edx
        orl     %edx, %eax
        ret

-march=i386 does not have cmove.
Comment 6 Andrew Pinski 2023-04-27 22:42:53 UTC
The GIMPLE for the reported function still contains the branch:

  if (_2 != 0)
    goto <bb 4>; [21.72%]
  else
    goto <bb 3>; [78.28%]

  <bb 3> [local count: 840525096]:

  <bb 4> [local count: 1073741824]:
  # _3 = PHI <4294967295(2), _1(3)>

In general that could be written as:
_t = -(type)(_2 != 0);
_3 = _1 | _t;

That is:
unsigned b(unsigned a, unsigned b){
    if(b){
        return -1;
    }
    return a;
}
unsigned b1(unsigned a, unsigned b){
    unsigned t = b ? -1 : 0;
    return a | t;
}
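
A quick spot-check that the two forms agree (a test harness of my own, not part of the comment; it assumes `b` and `b1` above are in scope):

```
#include <assert.h>

int main(void){
    /* Boundary values that exercise the all-ones mask. */
    unsigned cases[] = { 0u, 1u, 0x7FFFFFFFu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++)
        for (unsigned j = 0; j < sizeof cases / sizeof cases[0]; j++)
            assert(b(cases[i], cases[j]) == b1(cases[i], cases[j]));
    return 0;
}
```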