This is the mail archive of the
mailing list for the GCC project.
Re: Writing a dot product that vectorizes without -fassociative-math -fno-signed-zeros -fno-trapping-math
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Thomas Koenig <tkoenig at netcologne dot de>
- Cc: gcc mailing list <gcc at gcc dot gnu dot org>, "fortran at gcc dot gnu dot org" <fortran at gcc dot gnu dot org>
- Date: Mon, 8 Jun 2015 10:03:03 +0200
- Subject: Re: Writing a dot product that vectorizes without -fassociative-math -fno-signed-zeros -fno-trapping-math
- References: <557033D2 dot 5040400 at netcologne dot de>
On Thu, Jun 4, 2015 at 1:17 PM, Thomas Koenig <email@example.com> wrote:
> Hello world,
> Assume I want to calculate a dot product,
> s = sum(a[i]*b[i], i=1..n)
> The order of summation in this case should be arbitrary.
> Currently, the way to do this is to write out an explicit loop
> (either by the user or by the compiler, such as for DOT_PRODUCT)
> and specify the options (for the whole translation unit) that
> allow associative math.
> Could there be a more fine-grained approach, one which can
> set a 'yes, you can use associative math on this particular expression'
> flag to enable automatic vectorization of, for example, DOT_PRODUCT?
There isn't currently a way to do this (apart from a hack to outline
the loop to a function and stick a -fassociative-math option attribute
on it...). A full middle-end solution would be to either have alternate
tree codes for associatable ops or flags on the expression tree
(see the undefined-overflow branch work / discussion on both alternatives -
they both have downsides). Of course there are very many more
options that would need similar handling...
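
To illustrate the outlining hack mentioned above, here is a rough sketch,
not a recommendation. It assumes GCC's function-level optimize attribute,
which takes option names without the -f prefix; the function name
dot_reassoc and the macro REASSOC_ATTR are made up for this example. The
GCC documentation describes the optimize attribute as a debugging aid,
so treat this as a workaround at best.

```c
#include <stddef.h>

/* Hypothetical helper macro: apply the per-function options only when
   compiling with GCC; other compilers see a plain function. */
#if defined(__GNUC__) && !defined(__clang__)
#define REASSOC_ATTR \
    __attribute__((noinline, \
                   optimize("associative-math", \
                            "no-signed-zeros", \
                            "no-trapping-math")))
#else
#define REASSOC_ATTR
#endif

/* The reduction loop, outlined into its own function so that the
   relaxed floating-point options apply to this loop only and not to
   the rest of the translation unit. */
REASSOC_ATTR
double dot_reassoc(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];   /* summation order may be changed here */
    return s;
}
```

Whether the loop is actually vectorized still depends on the GCC version
and the optimization level (for example -O3 or -ftree-vectorize);
-fopt-info-vec reports what the vectorizer did.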
An alternative would be to make the option changing more localized
by say, a new wrapping GENERIC tree (like we have PAREN_EXPR
for the reverse effect). So
REGION_EXPR (reassoc, .... GENERIC code ...)
and then lower the 'reassoc' (or whatever bits we invent later) during
gimplification to flags on the GIMPLE stmt (avoiding the flags on the
GENERIC expression trees) or perform the outlining to a function.
Another alternative is to allow this kind of flag-changing only on
loops and use ANNOTATE_EXPR to say (this loop has ops
that are associatable across iterations) and thus only have an
effect on loop optimizations (in this case vectorization).
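
For comparison, a user-level annotation in a similar spirit already
exists in the form of the OpenMP simd reduction clause. The sketch below
is not the ANNOTATE_EXPR mechanism described above, only an analogous
per-loop way of saying that the sum may be reordered across iterations.
The function name dot_simd is made up for this example, and it assumes
compiling with -fopenmp-simd (without it the pragma is ignored and the
loop stays a plain scalar reduction).

```c
#include <stddef.h>

/* Dot product with a per-loop annotation: the reduction clause states
   that partial sums of s may be combined in any order, which is what
   the vectorizer needs in order to split the sum across vector lanes. */
double dot_simd(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    #pragma omp simd reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```

The annotation is scoped to the one loop, so the rest of the translation
unit keeps strict floating-point semantics.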
All of this is quite some work (see the unfinished no-undefined-overflow