This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Auto-vectorization: need to know what to expect
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "Benoît Jacob" <jacob at math dot jussieu dot fr>
- Cc: gcc at gcc dot gnu dot org, g dot gael at free dot fr
- Date: Mon, 17 Mar 2008 15:59:21 +0100
- Subject: Re: Auto-vectorization: need to know what to expect
- References: <200803171545.55905.jacob@math.jussieu.fr>
On Mon, Mar 17, 2008 at 3:45 PM, Benoît Jacob <jacob@math.jussieu.fr> wrote:
> Dear All,
>
> I am currently (co-)developing a Free (GPL/LGPL) C++ library for vector/matrix
> math.
>
> A major decision that we need to take is, what to do regarding vectorization
> instructions (SSE). Either we rely on GCC to auto-vectorize, or we control
> explicitly the vectorization using GCC's special primitives. The latter
> solution is of course more difficult, and would to some degree obfuscate our
> source code, so we wish to know whether or not it's really necessary.
>
> GCC 4.3.0 does auto-vectorize our loops, but the resulting code has worse
> performance than a version with unrolled loops and no vectorization. By
> contrast, ICC auto-vectorizes the same loops in a way that makes them
> significantly faster than the unrolled-loops non-vectorized version.
>
> If you want to know, the loops in question typically look like:
> for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
> {
>     // some abstract c++ code with deep recursive templates and
>     // deep recursive inline functions, but resulting in only a
>     // few assembly instructions
>     a().b().c().d(i) = x().y().z(i);
> }
>
> As said above, it's crucial for us to be able to get an idea of what to
> expect, because design decisions depend on that. Should we expect large
> improvements regarding autovectorization in 4.3.x, in 4.4 or 4.5?

In general GCC's autovectorization capabilities are quite good, though
cases where we miss opportunities do of course exist. Autovectorization
has improved in every GCC release and I expect that to continue in
future releases (though I cannot promise anything, as GCC is a
volunteer-driven project). Testcases where we miss optimizations are
certainly welcome; often we don't know about all the corner cases.

If you need to get the absolute most out of your CPU, I recommend
providing special routines tuned for the different CPU families, using
the standard intrinsics headers (*mmintrin.h). Of course this comes at
a high cost in maintenance (and initial work), so autovectorization
might prove good enough. Often tuning the source for a given compiler
has an effect similar to writing vectorized code manually. Looking at
GCC tree dumps and knowing a bit about GCC internals helps you here ;)

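
To make the two options concrete, here is a minimal sketch, not taken
from your library: the names add_plain, add_sse and N are made up for
illustration, and N stands in for your COMPILE_TIME_CONSTANT (assumed
here to be a multiple of 4). The first function is a plain loop written
to be easy on the autovectorizer; the second does the same work by hand
with SSE intrinsics from xmmintrin.h, falling back to the plain loop
when SSE is not available.

```cpp
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, one of the *mmintrin.h headers
#endif

// Hypothetical fixed size standing in for COMPILE_TIME_CONSTANT.
// Assumed to be a multiple of 4 (four floats per SSE register).
const int N = 8;

// Plain loop, left to the autovectorizer. __restrict__ is a GCC
// extension promising that dst, a and b do not overlap, which removes
// one common reason for the vectorizer to give up or add runtime checks.
void add_plain(float* __restrict__ dst,
               const float* __restrict__ a,
               const float* __restrict__ b)
{
    for (int i = 0; i < N; i++)
        dst[i] = a[i] + b[i];
}

// Hand-vectorized version: processes four floats per iteration.
// Unaligned loads/stores are used so no alignment is assumed.
void add_sse(float* dst, const float* a, const float* b)
{
#if defined(__SSE__)
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#else
    // No SSE on this target: fall back to the scalar loop.
    add_plain(dst, a, b);
#endif
}
```

Both functions should produce identical results, so the plain version
can serve as the reference while you compare timings. To see what the
vectorizer did with add_plain, something like -O3 together with
-ftree-vectorizer-verbose=2 or -fdump-tree-vect-details should help
(option names as I remember them for the 4.3 series; check the manual
for your version).
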
> A roadmap or a GCC developer sharing his thoughts would be very helpful.
Thanks,
Richard.