This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Generated ASM of a typical clamp

I have been studying the asm generated by a typical clamping function,
and I am confused about the results.  This is done on an Opteron 6k
series compiled with -fverbose-asm, -O3 and -march=native.

float clamp(float const x, float const min, float const max) {
#if defined (BRANCH)
  if ( x > max )
    return max;
  else if ( x < min )
    return min;
    return x;
#elif defined (BRANCH2)
  return x > max ? max : ( x < min ? min : x );
#elif defined (CALL)
  return __builtin_fminf(__builtin_fmaxf(x, min), max);
  float const t = x < min ? min : x;
  return t> max ? max : t;

The first two approaches are obviously identical, and produce:

        vucomiss        %xmm2, %xmm0    # max, x
        ja      .L3     #,
        vmaxss  %xmm0, %xmm1, %xmm0     # x, min, D.2214
        .p2align 4,,7
        .p2align 3
        vmovaps %xmm2, %xmm0    # max, D.2214

This one I figured would be great, given the use of builtins:

        subq    $24, %rsp       #,
        .cfi_def_cfa_offset 32
        vmovss  %xmm2, 12(%rsp) # max, %sfp
        call    fmaxf   #
        vmovss  12(%rsp), %xmm2 # %sfp, max
        addq    $24, %rsp       #,
        .cfi_def_cfa_offset 8
        vmovaps %xmm2, %xmm1    # max,
        jmp     fminf   #

But then we have what appears to be the best of them all....  just a
couple instructions, no branches, no calls, nothing:

        vmaxss  %xmm0, %xmm1, %xmm0     # x, min, D.2219
        vminss  %xmm0, %xmm2, %xmm0     # D.2219, max, D.2219

So I'm curious.... why is the last approach optimized better than the
naive approach of some nested if statements?

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]