This is the mail archive of the mailing list for the GCC project.


Missed optimization case

Hi all,

While digging into some GCC-generated code, I noticed a missed
optimization opportunity that Clang and ICC seem to take advantage of.
All versions of GCC (up to 4.9.0) seem to have the same trouble. The
following source (for x86_64) demonstrates the problem:

#include <cstdint>

#define add_carry32(sum, v)  __asm__("addl %1, %0 ;"  \
"adcl $0, %0 ;"  \
: "=r" (sum)  \
: "g" ((uint32_t) v), "0" (sum))

unsigned sorta_checksum(const void* src, int n, unsigned sum)
{
  const uint32_t *s4 = (const uint32_t*) src;
  const uint32_t *es4 = s4 + (n >> 2);

  while( s4 != es4 ) {
    add_carry32(sum, *s4++);
  }

  add_carry32(sum, *(const uint16_t*) s4);
  return sum;
}

(the example is a contrived version of the original code, which comes
from Solarflare's OpenOnload project).
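
For anyone who wants to check the arithmetic without inline asm, here is a rough portable sketch of what the macro computes: a 32-bit add whose carry-out is folded back into the result (the addl followed by adcl $0). The names add_carry32_portable and sorta_checksum_portable are made up for illustration and are not from OpenOnload. The sketch assumes n >= 0, and it is not expected to reproduce the code generation issue itself, since that may depend on the asm statement.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical portable stand-in for the add_carry32 macro: add v to sum,
// then fold the carry-out back into the low 32 bits.
inline uint32_t add_carry32_portable(uint32_t sum, uint32_t v) {
  const uint64_t wide = static_cast<uint64_t>(sum) + v;
  // The second addition cannot overflow: when the carry is 1, the low
  // word is at most 0xFFFFFFFE.
  return static_cast<uint32_t>(wide) + static_cast<uint32_t>(wide >> 32);
}

// Same control flow as sorta_checksum, minus the inline asm. Like the
// original, it reads two bytes past the last whole 32-bit word, so the
// buffer must be at least that long. Assumes n >= 0.
inline unsigned sorta_checksum_portable(const void* src, int n, unsigned sum) {
  const unsigned char* p = static_cast<const unsigned char*>(src);
  const unsigned char* end = p + 4 * static_cast<std::size_t>(n >> 2);

  while (p != end) {
    uint32_t word;
    std::memcpy(&word, p, sizeof word);  // alias-safe 32-bit load
    sum = add_carry32_portable(sum, word);
    p += sizeof word;
  }

  uint16_t tail;
  std::memcpy(&tail, p, sizeof tail);    // alias-safe 16-bit load
  return add_carry32_portable(sum, tail);
}
```

With a buffer of three 32-bit words {0xFFFFFFFF, 2, 0}, n = 8 and sum = 0, the loop consumes the first two words and the carry from the second addition is folded back in.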

GCC optimizes the loop but then recalculates the 's4' variable
outside of the loop, before the last add_carry32.  ICC and Clang both
realise that the 's4' value from the loop is fine to re-use. GCC emits
four extra instructions to compute a value that is already known to be
in a register upon loop exit.

Compiler explorer links:
GCC 4.9.0:
ICC 13.0.1:
Clang 3.4.1:

I'll happily file a bug if necessary, but it's not clear to me in which
phase the optimization opportunity is being missed.

Thanks all, Matt
