This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713

--- Comment #9 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-01 17:49:00 UTC ---
Indeed. And now this other example also vectorizes on corei7 (with "yesterday's" build it vectorized only with AVX):

/* assumed definition: GCC vector extension, 4 x double */
typedef double float64x4_t __attribute__((vector_size(32)));

float64x4_t cross_product(float64x4_t x, float64x4_t y) {
  // yz - zy, zx - xz, xy - yx, 0
  float64x4_t x1200 = (float64x4_t){ x[1], x[2], x[0], x[0] };
  float64x4_t y1200 = (float64x4_t){ y[1], y[2], y[0], y[0] };
  float64x4_t x2010 = (float64x4_t){ x[2], x[0], x[1], x[0] };
  float64x4_t y2010 = (float64x4_t){ y[2], y[0], y[1], y[0] };
  return x1200 * y2010 - x2010 * y1200;
}
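For reference, the lane pattern above can be written as a plain scalar loop: lane i computes x[a[i]]*y[b[i]] - x[b[i]]*y[a[i]] with a = {1,2,0,0} (the "1200" shuffle) and b = {2,0,1,0} (the "2010" shuffle), so lane 3 is always x[0]*y[0] - x[0]*y[0] = 0. This is only a sketch useful for checking results; the function name cross_product_ref is hypothetical, and nothing here claims how GCC vectorizes this loop form.

```c
#include <assert.h>

/* Hypothetical scalar reference for the 4-lane cross product above.
   a[] holds the "1200" lane indices, b[] the "2010" lane indices. */
void cross_product_ref(const double x[4], const double y[4], double out[4]) {
  static const int a[4] = { 1, 2, 0, 0 };
  static const int b[4] = { 2, 0, 1, 0 };
  assert(x && y && out);
  for (int i = 0; i < 4; ++i)
    out[i] = x[a[i]] * y[b[i]] - x[b[i]] * y[a[i]];  /* lane 3 is always 0 */
}
```

For example, x = {1, 2, 3, 0} and y = {4, 5, 6, 0} give {-3, 6, -3, 0}.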

The generated code (particularly with AVX) looks comparable to a hand-crafted version (for float32x4 it is even nicer).
Congrats.
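The float32x4 case mentioned above is not shown in the comment; the following is a minimal sketch of what it might look like, written with explicit shuffles via GCC's __builtin_shuffle (available in C since GCC 4.7) to illustrate a "hand-crafted" variant. The typedef names float32x4_t and int32x4_t and the function name cross_product_f32 are assumptions, not the author's code.

```c
#include <assert.h>

/* Assumed typedefs (GCC vector extensions): 4 x float and 4 x int, 16 bytes each.
   __builtin_shuffle requires a mask with the same element count and width. */
typedef float float32x4_t __attribute__((vector_size(16)));
typedef int   int32x4_t   __attribute__((vector_size(16)));

/* Hypothetical float32x4 cross product: lanes 0..2 hold the 3-D cross
   product of the first three lanes of x and y; lane 3 is always 0. */
float32x4_t cross_product_f32(float32x4_t x, float32x4_t y) {
  const int32x4_t m1200 = { 1, 2, 0, 0 };
  const int32x4_t m2010 = { 2, 0, 1, 0 };
  float32x4_t x1200 = __builtin_shuffle(x, m1200);
  float32x4_t y1200 = __builtin_shuffle(y, m1200);
  float32x4_t x2010 = __builtin_shuffle(x, m2010);
  float32x4_t y2010 = __builtin_shuffle(y, m2010);
  return x1200 * y2010 - x2010 * y1200;
}
```

Whether GCC emits the same instructions for this explicit-shuffle form as for the element-wise form above is not claimed here; comparing the assembly (e.g. with -S) is the way to check.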
