This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)

From: "vincenzo.innocente at cern dot ch" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Sat, 01 Dec 2012 17:49:00 +0000
Subject: [Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)
Auto-submitted: auto-generated
References: <bug-50713-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713

--- Comment #9 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-01 17:49:00 UTC ---
Indeed.
And now this other example also vectorizes on corei7 ("yesterday" it vectorized only with AVX):

typedef double float64x4_t __attribute__((vector_size(32))); // assumed: 4 x double (GCC vector extension)

float64x4_t cross_product(float64x4_t x, float64x4_t y) {
  // result lanes: x1*y2 - x2*y1, x2*y0 - x0*y2, x0*y1 - x1*y0, 0
  float64x4_t x1200 = (float64x4_t){ x[1], x[2], x[0], x[0] };
  float64x4_t y1200 = (float64x4_t){ y[1], y[2], y[0], y[0] };
  float64x4_t x2010 = (float64x4_t){ x[2], x[0], x[1], x[0] };
  float64x4_t y2010 = (float64x4_t){ y[2], y[0], y[1], y[0] };
  return x1200 * y2010 - x2010 * y1200;
}

The generated code (particularly on AVX) looks comparable to a hand-crafted version
(and for float32x4 it is even nicer).
Congrats.
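To check that the shuffled-lane formulation really computes the 3-D cross product, here is a minimal sketch in C. It assumes `float64x4_t` is a GCC/Clang generic vector of four doubles declared with `vector_size(32)` (the report does not show its typedef). `cross_product_ref` and `check_cross_product` are hypothetical helpers added only for illustration; they compare the vector version against the textbook scalar formula.

```c
#include <assert.h>

/* Assumption: float64x4_t is a GCC/Clang generic vector of four doubles
   (vector_size(32) = 4 lanes x 8 bytes). The report does not show this typedef. */
typedef double float64x4_t __attribute__((vector_size(32)));

/* The function from the report: lanes 0..2 hold x cross y, lane 3 is x0*y0 - x0*y0. */
static float64x4_t cross_product(float64x4_t x, float64x4_t y) {
  float64x4_t x1200 = (float64x4_t){ x[1], x[2], x[0], x[0] };
  float64x4_t y1200 = (float64x4_t){ y[1], y[2], y[0], y[0] };
  float64x4_t x2010 = (float64x4_t){ x[2], x[0], x[1], x[0] };
  float64x4_t y2010 = (float64x4_t){ y[2], y[0], y[1], y[0] };
  return x1200 * y2010 - x2010 * y1200;
}

/* Hypothetical scalar reference: the textbook 3-D cross product a x b. */
static void cross_product_ref(const double a[3], const double b[3], double out[3]) {
  out[0] = a[1] * b[2] - a[2] * b[1];
  out[1] = a[2] * b[0] - a[0] * b[2];
  out[2] = a[0] * b[1] - a[1] * b[0];
}

/* Hypothetical check: packs a and b into vectors (lane 3 = 0), runs both versions,
   and asserts they agree. Exact equality is only safe when every product and
   difference is exactly representable, e.g. small integer inputs. */
static void check_cross_product(const double a[3], const double b[3]) {
  float64x4_t x = { a[0], a[1], a[2], 0.0 };
  float64x4_t y = { b[0], b[1], b[2], 0.0 };
  float64x4_t v = cross_product(x, y);
  double r[3];
  cross_product_ref(a, b, r);
  assert(v[0] == r[0] && v[1] == r[1] && v[2] == r[2]);
  assert(v[3] == 0.0);
}
```

For example, `check_cross_product((const double[]){1, 2, 3}, (const double[]){4, 5, 6})` passes, since both versions give (-3, 6, -3). For general floating-point inputs a tolerance would be needed instead, because the compiler may contract `a*b - c*d` into fused multiply-adds differently in the scalar and vector code.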
