This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/46091] missed optimization: x86 bt/btc/bts instructions


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46091

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Avi Kivity from comment #9)
> I believe the comment is wrong. Here's what the manual says:
> 
> "This instruction can be used with a LOCK prefix to allow the instruction to
> be executed atomically."
> 
> Implying that without the LOCK prefix, it is not atomic. XCHG is the only
> instruction that asserts LOCK implicitly.
> 
> Agner lists BTC reciprocal throughput as 1 for imm, mem case and 5 for reg,
> mem. The latter is slow, but perhaps still worthwhile as a replacement for
> the code in the first comment (but probably not when addressing a single
> word).
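
For reference, the reg,mem form of BTC treats memory as a bit string: the register operand is a bit offset that may select a bit outside the word at the given address. A minimal sketch of the equivalent C for a bit array stored in 64-bit words follows. The function name is hypothetical, and it only shows the semantics; it makes no claim about what GCC currently emits for this source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: complement bit N of a multi-word bit array.
   This is the operation "btc reg, mem" performs in one instruction
   (for a non-negative bit offset); whether GCC uses BTC here is the
   subject of this PR and is not assumed.  */
static void
bitarray_complement (uint64_t *words, size_t n)
{
  words[n / 64] ^= (uint64_t) 1 << (n % 64);
}
```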

BTC/BTR/BTS with a memory operand (RMW) is indeed slower, but so are the RMW
forms of other logic instructions. The following testcase:

--cut here--
extern unsigned long long a;

void
test (void)
{
  a &= ~(1ull << 55);
}
--cut here--

should generate an RMW BTR instruction.
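
A minimal sketch of the single-word idioms that correspond to BTS, BTR and BTC follows; the helper names are hypothetical and the code only pins down the C semantics, not GCC's output. The testcase above is the reset case with a constant bit position of 55, presumably chosen because the mask ~(1ull << 55) does not fit in the sign-extended 32-bit immediate of a 64-bit AND, so the AND form needs the mask loaded into a register first, while BTR takes the bit number as an 8-bit immediate.

```c
#include <stdint.h>

/* Hypothetical helpers for the set/reset/complement idioms.
   BIT must be in the range [0, 63].  */
static inline uint64_t
bit_set (uint64_t x, unsigned bit)        /* BTS-like */
{
  return x | ((uint64_t) 1 << bit);
}

static inline uint64_t
bit_reset (uint64_t x, unsigned bit)      /* BTR-like, as in the testcase */
{
  return x & ~((uint64_t) 1 << bit);
}

static inline uint64_t
bit_complement (uint64_t x, unsigned bit) /* BTC-like */
{
  return x ^ ((uint64_t) 1 << bit);
}
```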

I'll look into this some more. However, these insns should be rare, so do not
expect any noticeable application speed-up.

> Note there is also the BT instruction (with reciprocal throughput of 0.5!)

Yes, we already emit this.
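
For illustration, a bit test with a variable bit position is the kind of source pattern that can be compiled to BT followed by SETC or a conditional jump. The helper name is hypothetical, and the exact instruction selection depends on the GCC version and -mtune setting, so treat this as a sketch of the idiom rather than a guaranteed code-generation result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: test bit BIT (0..63) of X.  */
static inline bool
bit_test (uint64_t x, unsigned bit)
{
  return (x >> bit) & 1;
}
```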
