This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: Using bt,bts


On Wed, Sep 26, 2012 at 04:20:52PM -0700, Ian Lance Taylor wrote:
> On Wed, Sep 26, 2012 at 10:34 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> 
> > is there a reason why for example
> > x=x|(1<<11);
> > is not expanded into
> > bts rax,11
> > ?
> 
> The bts instruction is never faster than the corresponding or
> instruction.  There's no reason to use it when setting a bit in the
> low 32 bits.
> 
> Ian
The following benchmark suggests otherwise: on Ivy Bridge, the bts variant
is twice as fast as the or variant.

I used

 for (i = 0; i < 100000000; i++)
    x = x | (1u << (i & 31));  /* 32-bit shl and bts both use the count mod 32 */

implemented in assembly, first with bts:

.globl main
  .type main, @function
main:
.LFB0:
  .cfi_startproc
  xorl  %eax, %eax
  xorl  %ecx, %ecx
  movl  $1, %edx
  .p2align 4,,10
  .p2align 3
.L2:
  bts %ecx, %edx          # set bit (ecx mod 32) of edx
  addl  $1, %ecx          # i++
  cmpl  $100000000, %ecx  # run 100000000 iterations
  jne .L2
  rep                     # rep prefix + ret ("rep ret")
  ret
  .cfi_endproc
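Part of the question is what bts actually does beyond an or. Per the Intel SDM, with a register bit base the bit offset is taken modulo the operand size (32 here), the old value of the selected bit is copied into CF, and then the bit is set; with a memory bit base the offset is not masked. The C function below is a minimal reference model of the 32-bit register form only; the name `bts32_model` and the `carry_flag` out-parameter are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model of "btsl %reg, %reg" (32-bit operands).
   Assumption: register bit base, so the offset wraps modulo 32. */
uint32_t bts32_model(uint32_t dest, uint32_t bit_offset, int *carry_flag)
{
    uint32_t mask = UINT32_C(1) << (bit_offset & 31); /* offset mod 32 */
    *carry_flag = (dest & mask) != 0;                  /* CF = old bit */
    return dest | mask;                                /* bit is set */
}
```

For example, `bts32_model(0x800, 43, &cf)` selects bit 11 (43 mod 32), leaves the value at `0x800` and sets `cf` to 1, which is the same wrap-around the `sall %cl` sequence gets for free.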

and then with sall and orl:

.globl main
  .type main, @function
main:
.LFB0:
  .cfi_startproc
  xorl  %eax, %eax
  xorl  %ecx, %ecx
  movl  $1, %edx
  .p2align 4,,10
  .p2align 3
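.L2:
  movl  %edx, %esi        # esi = 1
  sall  %cl, %esi         # esi = 1 << (ecx mod 32)
  addl  $1, %ecx          # i++
  orl %esi, %eax          # x |= esi
  cmpl  $100000000, %ecx  # run 100000000 iterations
  jne .L2
  rep                     # rep prefix + ret ("rep ret")
  ret
  .cfi_endproc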
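To repeat the comparison from a single C file, one option is to put each loop body in GCC/Clang extended inline asm and time both with `clock_gettime(CLOCK_MONOTONIC)`. This is a rough sketch under stated assumptions, not the harness used for the numbers above: the names `loop_bts`, `loop_shl_or` and `time_loop` are hypothetical, the inline asm is used only on x86 (other targets fall back to plain C, which measures nothing about bts), and both accumulators start at 0 rather than the listing's `edx = 1`, which makes no difference once bit 0 has been set.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Loop 1: set bit (i mod 32) of x with bts, as in the first listing. */
static uint32_t loop_bts(uint32_t n)
{
    uint32_t x = 0;
    for (uint32_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(__i386__)
        __asm__("btsl %1, %0" : "+r"(x) : "r"(i) : "cc");
#else
        x |= UINT32_C(1) << (i & 31);   /* portable fallback, not bts */
#endif
    }
    return x;
}

/* Loop 2: build 1 << (i mod 32) with shl, then or it in, as in the second listing. */
static uint32_t loop_shl_or(uint32_t n)
{
    uint32_t x = 0;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t t = 1;
#if defined(__x86_64__) || defined(__i386__)
        __asm__("shll %%cl, %0" : "+r"(t) : "c"(i) : "cc"); /* i in ecx */
#else
        t <<= (i & 31);                 /* portable fallback */
#endif
        x |= t;
    }
    return x;
}

/* Time one call of f(n). The result is stored through out so the loop
   cannot be discarded; returns elapsed wall-clock nanoseconds. */
static double time_loop(uint32_t (*f)(uint32_t), uint32_t n, uint32_t *out)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *out = f(n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling `time_loop(loop_bts, 100000000, &r)` and `time_loop(loop_shl_or, 100000000, &r)` and comparing the returned times gives one data point per variant; a single run is noisy, so repeating each measurement several times and pinning the process to one core makes the comparison more trustworthy.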
