Bug 98977 - Failure to optimize consecutive sub flags usage
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 11.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 3507
Blocks:
Reported: 2021-02-05 14:20 UTC by Gabriel Ravier
Modified: 2023-09-25 22:25 UTC (History)

See Also:
Host:
Target: x86_64-*-* i?86-*-* aarch64*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-23 00:00:00


Description Gabriel Ravier 2021-02-05 14:20:19 UTC
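#include <stdbool.h>
#include <stdint.h>

extern bool z, c;

uint8_t f(uint8_t dest, uint8_t src)
{
    uint8_t res = dest - src;
    z = !res;       /* zero flag of the subtraction */
    c = src > dest; /* carry (borrow) flag of the subtraction */
    return res;
}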

With -O3, LLVM outputs this:

f(unsigned char, unsigned char):
  mov eax, edi
  sub al, sil
  sete byte ptr [rip + z]
  setb byte ptr [rip + c]
  ret

GCC outputs this:

f(unsigned char, unsigned char):
  mov eax, edi
  sub al, sil
  sete BYTE PTR z[rip]
  cmp dil, sil
  setb BYTE PTR c[rip]
  ret

It seems desirable to eliminate the `cmp`, unless there's some weird flag stall thing I'm not aware of.
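
For reference, the `cmp` is redundant because an unsigned `dest - src` borrows exactly when `src > dest`, so the carry flag left by `sub` already holds `c`. Below is a minimal sketch of the same computation written as a single operation, assuming the `__builtin_sub_overflow` builtin available in GCC and Clang. `f_overflow` is a hypothetical name, and `z` and `c` are defined here rather than declared `extern` only so that the snippet links on its own. Whether this form changes the generated code is not something this report covers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Defined (not extern) here only so the sketch is self-contained. */
bool z, c;

/* Hypothetical variant of f(): one subtraction yields both flags. */
uint8_t f_overflow(uint8_t dest, uint8_t src)
{
    uint8_t res;
    /* Returns true when the exact difference does not fit in uint8_t,
       which for unsigned operands means src > dest (a borrow). */
    c = __builtin_sub_overflow(dest, src, &res);
    z = !res; /* zero flag: the wrapped result is 0 */
    return res;
}
```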
Comment 1 Andrew Pinski 2021-12-23 21:13:50 UTC
Confirmed. PR 3507 is part of it (maybe all of it), as shown by:
#include <stdbool.h>
#include <stdint.h>

extern bool z, c;

uint8_t f(uint8_t dest, uint8_t src)
{
    uint8_t res = dest - src;
    //z = !res;
    c = src > dest;
    return res;
}
Comment 2 Andrew Pinski 2021-12-23 21:16:23 UTC
Here is a testcase which shows the issue on other targets (aarch64) too:
#include <stdbool.h>
#include <stdint.h>

extern bool z, c;

uint32_t f(uint32_t dest, uint32_t src)
{
    uint32_t res = dest - src;
    z = !res;
    c = src > dest;
    return res;
}
Comment 3 Hongtao.liu 2021-12-30 09:34:14 UTC
LLVM has a separate module to merge sub and cmp; GCC could do something similar.
An alternative is to canonicalize cmp patterns to the same form as sub, with an unused dest (the result of the sub). CSE/PRE would then be able to do the elimination work?
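
Both suggestions rest on `cmp` setting the flags exactly as `sub` does, only without writing the difference. A small sketch in C that models just the two flags used in the testcase and checks the equivalence over all 8-bit operand pairs; `struct flags`, `model_sub`, `model_cmp` and `flags_agree_everywhere` are hypothetical names for illustration and have nothing to do with GCC's internal RTL representation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two flags the testcase reads. */
struct flags { bool zf, cf; };

/* Model of `sub`: writes the wrapped difference, derives flags from it. */
struct flags model_sub(uint8_t dest, uint8_t src, uint8_t *res)
{
    int wide = (int)dest - (int)src;  /* exact difference */
    *res = (uint8_t)wide;             /* wrapped to 8 bits */
    return (struct flags){ .zf = (*res == 0), .cf = (wide < 0) };
}

/* Model of `cmp`: flags from comparing the operands, no result written. */
struct flags model_cmp(uint8_t dest, uint8_t src)
{
    return (struct flags){ .zf = (dest == src), .cf = (dest < src) };
}

/* Returns true if both models agree for every pair of 8-bit operands. */
bool flags_agree_everywhere(void)
{
    for (unsigned d = 0; d < 256; d++) {
        for (unsigned s = 0; s < 256; s++) {
            uint8_t res;
            struct flags a = model_sub((uint8_t)d, (uint8_t)s, &res);
            struct flags b = model_cmp((uint8_t)d, (uint8_t)s);
            if (a.zf != b.zf || a.cf != b.cf)
                return false;
        }
    }
    return true;
}
```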