This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)
- From: "vincenzo.innocente at cern dot ch" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 01 Dec 2012 17:49:00 +0000
- Subject: [Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)
- Auto-submitted: auto-generated
- References: <bug-50713-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713
--- Comment #9 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-01 17:49:00 UTC ---
Indeed.
And now this other example also vectorizes on corei7 ("yesterday" it vectorized only with AVX):
typedef double float64x4_t __attribute__((vector_size(32))); /* assumed definition: four doubles in one GCC generic vector */
float64x4_t cross_product(float64x4_t x, float64x4_t y) {
  // result lanes: yz - zy, zx - xz, xy - yx, 0 (first letter from x, second from y)
  float64x4_t x1200 = (float64x4_t){ x[1], x[2], x[0], x[0] };
  float64x4_t y1200 = (float64x4_t){ y[1], y[2], y[0], y[0] };
  float64x4_t x2010 = (float64x4_t){ x[2], x[0], x[1], x[0] };
  float64x4_t y2010 = (float64x4_t){ y[2], y[0], y[1], y[0] };
  return x1200 * y2010 - x2010 * y1200;
}
The generated code (particularly with AVX) looks comparable to hand-written code; for float32x4 it is even nicer.
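For reference, a float32x4 version of the same function might look like the sketch below. The typedef and the name cross_product_f32 are assumptions for illustration (not taken from the bug report); the lane pattern is identical to the float64x4_t version, only the element type and vector width (16 bytes, four floats) change.

```c
/* Assumed typedef: GCC generic vector of four floats (16 bytes). */
typedef float float32x4_t __attribute__((vector_size(16)));

/* Hypothetical float variant of cross_product.
   Result lanes: yz - zy, zx - xz, xy - yx, 0 (first letter from x, second from y). */
float32x4_t cross_product_f32(float32x4_t x, float32x4_t y) {
  /* Build the permuted operands lane by lane; the vectorizer is expected
     to turn these element-wise constructions into shuffles. */
  float32x4_t x1200 = (float32x4_t){ x[1], x[2], x[0], x[0] };
  float32x4_t y1200 = (float32x4_t){ y[1], y[2], y[0], y[0] };
  float32x4_t x2010 = (float32x4_t){ x[2], x[0], x[1], x[0] };
  float32x4_t y2010 = (float32x4_t){ y[2], y[0], y[1], y[0] };
  /* Lane 3 is x[0]*y[0] - x[0]*y[0], i.e. 0 for finite inputs. */
  return x1200 * y2010 - x2010 * y1200;
}
```

To compare the generated code yourself, compile with something like `gcc -O3 -S`, with and without `-mavx`, and inspect the assembly.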
Congrats.
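For comparison with the "handcrafted" code mentioned above, the same lane permutations can also be spelled out explicitly with GCC's `__builtin_shuffle` (a GCC generic-vector builtin, available since GCC 4.7; Clang uses a different builtin). This is only a sketch of one possible hand-written formulation, not the author's code: the mask typedef and the name cross_product_shuffle are hypothetical. The mask must be an integer vector with the same number and width of elements as the data vector, hence 64-bit integers for doubles.

```c
/* Assumed typedefs: four doubles, and four 64-bit integers for the shuffle masks. */
typedef double float64x4_t __attribute__((vector_size(32)));
typedef long long int64x4_t __attribute__((vector_size(32)));

/* Hypothetical explicit-shuffle variant: same result lanes as cross_product
   (yz - zy, zx - xz, xy - yx, 0), with the permutations written as masks. */
float64x4_t cross_product_shuffle(float64x4_t x, float64x4_t y) {
  const int64x4_t m1200 = {1, 2, 0, 0};  /* picks lanes 1, 2, 0, 0 */
  const int64x4_t m2010 = {2, 0, 1, 0};  /* picks lanes 2, 0, 1, 0 */
  float64x4_t x1200 = __builtin_shuffle(x, m1200);
  float64x4_t y1200 = __builtin_shuffle(y, m1200);
  float64x4_t x2010 = __builtin_shuffle(x, m2010);
  float64x4_t y2010 = __builtin_shuffle(y, m2010);
  return x1200 * y2010 - x2010 * y1200;
}
```

Without `-mavx`, GCC may emit a `-Wpsabi` note about passing 32-byte vectors by value; it does not affect the result.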