Bug 101806 - Extra zero extends for some arguments in some cases
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 12.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2021-08-06 21:10 UTC by Andrew Pinski
Modified: 2022-12-25 05:30 UTC (History)

See Also:
Host:
Target: aarch64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Andrew Pinski 2021-08-06 21:10:50 UTC
Take:
bool g(bool a, bool b)
{
  return ~a & b;
}
---- CUT ---
Currently we produce:
        and     w1, w1, 255
        and     w0, w0, 255
        bic     w0, w1, w0
        and     w0, w0, 1

---- CUT ---
But we should produce:
        bic     w0, w1, w0
        and     w0, w0, 1

The zero extends are not needed.
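As a sanity check on that claim, here is a minimal sketch that models the two instruction sequences in portable C and compares them. It does not run AArch64 code; the helper names (`bic`, `g_current`, `g_wanted`, `count_mismatches`) and the chosen upper-bit patterns are made up for illustration. The point is that the final `and w0, w0, 1` keeps only bit 0, and bit 0 of a bitwise and-not depends only on bit 0 of each input.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `bic w0, w1, w0` on 32-bit registers: w1 AND NOT w0. */
uint32_t bic(uint32_t w1, uint32_t w0)
{
    return w1 & ~w0;
}

/* The four-instruction sequence quoted as "currently we produce". */
uint32_t g_current(uint32_t w0, uint32_t w1)
{
    w1 &= 255;          /* and w1, w1, 255 */
    w0 &= 255;          /* and w0, w0, 255 */
    w0 = bic(w1, w0);   /* bic w0, w1, w0  */
    return w0 & 1;      /* and w0, w0, 1   */
}

/* The two-instruction sequence quoted as "we should produce". */
uint32_t g_wanted(uint32_t w0, uint32_t w1)
{
    w0 = bic(w1, w0);   /* bic w0, w1, w0  */
    return w0 & 1;      /* and w0, w0, 1   */
}

/* Compare both sequences for every low byte combined with a few
   arbitrary (hypothetical) patterns in the bits above bit 7. */
unsigned long count_mismatches(void)
{
    static const uint32_t junk[] = {
        0x00000000u, 0xFFFFFF00u, 0xA5A5A500u, 0x80000000u, 0x00000100u
    };
    const unsigned n = sizeof junk / sizeof junk[0];
    unsigned long mismatches = 0;

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (uint32_t a = 0; a < 256; a++)
                for (uint32_t b = 0; b < 256; b++)
                    if (g_current(junk[i] | a, junk[j] | b)
                        != g_wanted(junk[i] | a, junk[j] | b))
                        mismatches++;
    return mismatches;
}

/* Convenience wrapper: aborts if the two sequences ever disagree. */
void check_sequences_agree(void)
{
    assert(count_mismatches() == 0);
}
```

Calling `check_sequences_agree()` from a small `main` is enough to run the comparison; it says nothing about what combine does, only that dropping the two masks does not change the result.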
This happens because combine does the right thing until it has to pick the split point:
Trying 2, 8 -> 16:
    2: r98:SI=zero_extend(x0:QI)
      REG_DEAD x0:QI
    8: r102:SI=~r98:SI&r99:SI
      REG_DEAD r98:SI
      REG_DEAD r99:SI
   16: x0:SI=r102:SI&0x1
      REG_DEAD r102:SI
Failed to match this instruction:
(set (reg:SI 0 x0)
    (and:SI (and:SI (not:SI (reg:SI 0 x0 [ a ]))
            (reg/v:SI 99 [ b ]))
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (reg:SI 102)
    (not:SI (reg:SI 0 x0 [ a ])))
Failed to match this instruction:
(set (reg:SI 0 x0)
    (and:SI (and:SI (reg:SI 102)
            (reg/v:SI 99 [ b ]))
        (const_int 1 [0x1])))

If we had chosen (and:SI (not:SI (reg:SI 0 x0 [ a ])) (reg/v:SI 99 [ b ])) as the split point instead, we would have gotten the right code.
Comment 1 Andrew Pinski 2021-08-06 21:13:23 UTC
It happens to work on x86-64 (with -march=skylake-avx512) because we get a zero_extend instead of an and there. I still don't understand how x86 is able to figure out the &1 part.

Trying 11, 9 -> 12:
   11: r94:SI=zero_extend(r97:SI#0)
      REG_DEAD r97:SI
    9: r92:SI=zero_extend(r96:SI#0)
      REG_DEAD r96:SI
   12: {r95:SI=~r92:SI&r94:SI;clobber flags:CC;}
      REG_DEAD r92:SI
      REG_UNUSED flags:CC
      REG_DEAD r94:SI
Failed to match this instruction:
(parallel [
        (set (reg:SI 95)
            (zero_extend:SI (and:QI (not:QI (subreg:QI (reg:SI 96) 0))
                    (subreg:QI (reg:SI 97) 0))))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:SI 95)
    (zero_extend:SI (and:QI (not:QI (subreg:QI (reg:SI 96) 0))
            (subreg:QI (reg:SI 97) 0))))
Successfully matched this instruction:
(set (reg:QI 94 [ b ])
    (and:QI (not:QI (subreg:QI (reg:SI 96) 0))
        (subreg:QI (reg:SI 97) 0)))
Successfully matched this instruction:
(set (reg:SI 95)
    (zero_extend:SI (reg:QI 94 [ b ])))
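The rewrite in this dump (two zero_extends followed by an SImode and-not becoming a QImode and-not followed by one zero_extend) can be checked with a small model in C. This is only a sketch of the arithmetic identity, not of combine itself; the function names are hypothetical, and the parameter names simply reuse the pseudo register numbers from the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Before combine: insns 9 and 11 zero-extend the low bytes to SImode,
   then insn 12 does the and-not in SImode. */
uint32_t andnot_extend_first(uint32_t r96, uint32_t r97)
{
    uint32_t r92 = (uint8_t)r96;    /* insn 9:  zero_extend */
    uint32_t r94 = (uint8_t)r97;    /* insn 11: zero_extend */
    return ~r92 & r94;              /* insn 12: and-not     */
}

/* After combine: and-not in QImode on the low bytes, then a single
   zero_extend of the QImode result. */
uint32_t andnot_extend_last(uint32_t r96, uint32_t r97)
{
    uint8_t r94 = (uint8_t)(~(uint8_t)r96 & (uint8_t)r97);
    return (uint32_t)r94;
}

/* Compare both forms for every low byte, with a few arbitrary
   (hypothetical) patterns in the upper 24 bits. */
unsigned long count_extend_mismatches(void)
{
    static const uint32_t junk[] = { 0x00000000u, 0xFFFFFF00u, 0x12345600u };
    const unsigned n = sizeof junk / sizeof junk[0];
    unsigned long mismatches = 0;

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (uint32_t a = 0; a < 256; a++)
                for (uint32_t b = 0; b < 256; b++)
                    if (andnot_extend_first(junk[i] | a, junk[j] | b)
                        != andnot_extend_last(junk[i] | a, junk[j] | b))
                        mismatches++;
    return mismatches;
}

/* Convenience wrapper: aborts if the two forms ever disagree. */
void check_extend_forms_agree(void)
{
    assert(count_extend_mismatches() == 0);
}
```

The two forms agree because the zero-extended second operand already clears every bit above bit 7, so the upper bits set by the SImode `not` never reach the result.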
Comment 2 Andrew Pinski 2022-10-27 03:21:50 UTC
I think this will be fixed/improved by https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602089.html .
Comment 3 Andrew Pinski 2022-12-25 05:30:58 UTC
Even a simple:
unsigned char g(unsigned char a, unsigned char b)
{
  return ((~a) & b)&1;
}

Produces the extra zero extend.

But it is ok with:
unsigned char g(unsigned char *a, unsigned char *b)
{
  return ((~*a) & *b)&1;
}

It looks like this is related to hard registers too, since the version whose operands are loaded from memory is fine.
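For comparing the generated code side by side, the three test cases from this report can be collected into one C file. This is a sketch for convenience only: the function names `g_bool`, `g_uchar`, `g_uchar_mem` and `variants_agree` are made up here (the report calls each one `g`), and `<stdbool.h>` is added on the assumption that the file is compiled as C rather than C++.

```c
#include <assert.h>
#include <stdbool.h>

/* Original test case from the description. */
bool g_bool(bool a, bool b)
{
    return ~a & b;
}

/* Register-argument variant from comment 3 (reported to show the
   extra zero extend). */
unsigned char g_uchar(unsigned char a, unsigned char b)
{
    return ((~a) & b) & 1;
}

/* Memory-operand variant from comment 3 (reported to be ok). */
unsigned char g_uchar_mem(unsigned char *a, unsigned char *b)
{
    return ((~*a) & *b) & 1;
}

/* Returns 1 if all three variants compute the same value for every
   combination of 0/1 inputs, 0 otherwise. */
int variants_agree(void)
{
    for (unsigned a = 0; a < 2; a++)
        for (unsigned b = 0; b < 2; b++) {
            unsigned char ua = (unsigned char)a;
            unsigned char ub = (unsigned char)b;
            unsigned expect = g_bool(a != 0, b != 0);
            if (g_uchar(ua, ub) != expect)
                return 0;
            if (g_uchar_mem(&ua, &ub) != expect)
                return 0;
        }
    return 1;
}
```

Compiling this for an aarch64 target with something like `gcc -O2 -S` should make the differences visible in the assembly; the optimization level is an assumption, since the report does not state which options were used.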