This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
Re: Using bt,bts
On Fri, Sep 28, 2012 at 8:40 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Sep 27, 2012 at 10:52:48AM -0700, Ian Lance Taylor wrote:
>> On Thu, Sep 27, 2012 at 12:35 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
>> > On Wed, Sep 26, 2012 at 04:20:52PM -0700, Ian Lance Taylor wrote:
>> >> On Wed, Sep 26, 2012 at 10:34 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
>> >>
>> >> > is there a reason why for example
>> >> > x=x|(1<<11);
>> >> > is not expanded into
>> >> > bts rax,11
>> >> > ?
>> >>
>> >> The bts instruction is never faster than the corresponding or
>> >> instruction. There's no reason to use it when setting a bit in the
>> >> low 32 bits.
>> >>
>> >> Ian
>> > The following benchmark shows otherwise. On Ivy Bridge the bts variant is
>> > twice as fast as the or variant.
>> >
>> > I used
>> >
>> > for(i=0;i<1000000;i++)
>> > x=x|(1<<i);
>>
>> That is a rather odd benchmark. Almost all of the loop iterations
>> will do nothing because the 1 will be left shifted into nothingness.
> From intel reference manual:
Sure, I know. But I don't see why it is relevant. This is C. If you
want to test machine instructions, write assembly code.
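
The distinction matters here because in C the expression 1 << i shifts an
int, and the result is undefined once i reaches the width of int, which the
loop above does almost immediately. A minimal sketch of two well-defined
alternatives follows; the helper names set_bit_wrapped and set_bit_checked
are hypothetical and do not come from the thread.

```c
#include <stdint.h>

/* Hypothetical helper, not from the thread: sets bit (i mod 64) of x.
   The 1 is widened to 64 bits and the count is masked explicitly, so the
   shift is well defined for every i. */
static uint64_t set_bit_wrapped(uint64_t x, unsigned i)
{
    return x | (UINT64_C(1) << (i & 63));
}

/* Hypothetical helper: sets bit i only if it exists in a 64-bit value,
   and returns x unchanged otherwise. */
static uint64_t set_bit_checked(uint64_t x, unsigned i)
{
    return i < 64 ? x | (UINT64_C(1) << i) : x;
}
```

Which of the two matches the intent of the original loop is a design choice:
the first wraps the bit position around, the second ignores positions that
do not fit.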
>> And if you look back at what I said, I said they were equivalent when
>> setting one of the low order 32 bits, which is what was happening in
>> your original code.
> I did not say that I set the lower 32 bits, nor did I say that the position
> I set is constant.
Well, I tried to answer the question you posed. You now seem to be
asking a different question. Perhaps it has a different answer. But
I'm not sure exactly what question you are asking.
>> Those loops are not equivalent even apart from bts vs. ori. One has
>> four instructions, the other has six.
> Two functions are equivalent if and only if they produce the same output
> for every input. Whether one consists of 10 instructions and the other of 8
> is irrelevant.
I thought the point of your example was a micro-benchmark to show that
bts is faster than ori. For a micro-benchmark of a single
instruction, it's highly relevant whether other instructions are being
executed. I apologize if I misunderstood the point of your test case.
Ian
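
One way to keep the surrounding instructions out of such a comparison is to
wrap each variant in a function with the same signature, so that only the
bit-setting step differs. The following is a rough sketch, assuming
GCC-style extended asm on x86-64; set_bit_or and set_bit_bts are
hypothetical names, and on other targets the second function simply falls
back to the first.

```c
#include <stdint.h>

/* Plain C version: the compiler chooses the instructions. */
static uint64_t set_bit_or(uint64_t x, uint64_t pos)
{
    return x | (UINT64_C(1) << (pos & 63));
}

/* Hypothetical bts wrapper, assuming GCC-style extended asm on x86-64. */
static uint64_t set_bit_bts(uint64_t x, uint64_t pos)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* With a register destination, bts takes the bit offset modulo 64,
       which matches the explicit mask in set_bit_or. */
    __asm__("btsq %1, %0" : "+r"(x) : "r"(pos) : "cc");
    return x;
#else
    return set_bit_or(x, pos);
#endif
}
```

Both functions return the same value for every input, so a timing loop can
call either one without changing anything else. Whether the compiler keeps
the rest of the loop identical still has to be checked in the generated
assembly.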