This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug target/46091] missed optimization: x86 bt/btc/bts instructions

From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 14 Aug 2017 17:25:07 +0000
Subject: [Bug target/46091] missed optimization: x86 bt/btc/bts instructions
Auto-submitted: auto-generated
References: <bug-46091-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46091

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Avi Kivity from comment #9)
> I believe the comment is wrong. Here's what the manual says:
> 
> "This instruction can be used with a LOCK prefix to allow the instruction to
> be executed atomically."
> 
> Implying that without the LOCK prefix, it is not atomic. XCHG is the only
> instruction that asserts LOCK implicitly.
> 
> Agner lists BTC reciprocal throughput as 1 for imm, mem case and 5 for reg,
> mem. The latter is slow, but perhaps still worthwhile as a replacement for
> the code in the first comment (but probably not when addressing a single
> word).

BTC/BTR/BTS with a memory operand (RMW) is indeed slower, but so are the RMW
forms of other logic instructions. The following testcase:

--cut here--
extern unsigned long long a;

void
test (void)
{
  a &= ~(1ull << 55);
}
--cut here--

should generate an RMW BTR instruction.

I'll look into this a bit more. However, these insns should be rare, so do
not expect any noticeable application speed-up...

> Note there is also the BT instruction (with reciprocal throughput of 0.5!)

Yes, we already emit this.