This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH][RFC] Detect and use implementations of BLAS routines
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Richard Biener <rguenther at suse dot de>
- Cc: gcc-patches at gcc dot gnu dot org, fortran at gcc dot gnu dot org
- Date: Wed, 2 Oct 2013 19:46:34 +0400 (MSK)
- Subject: Re: [PATCH][RFC] Detect and use implementations of BLAS routines
- Authentication-results: sourceware.org; auth=none
- References: <alpine dot LNX dot 2 dot 00 dot 1310021550410 dot 5759 at zhemvz dot fhfr dot qr>
You probably want to disable this transformation when the number of iterations
is predicted to be small, right?
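To make the concern concrete, here is a minimal sketch of a run-time guard that keeps short loops inline and hands only long ones to a library. Everything named here is hypothetical: `SMALL_TRIP_COUNT` is an arbitrary cutoff, and `dot_library` is a stand-in for an external routine, not a real BLAS entry point. The patch itself could equally rely on compile-time trip-count estimates instead of a run-time check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff; a real value would have to come from measurement. */
enum { SMALL_TRIP_COUNT = 16 };

/* Counts calls to the stand-in library routine, for demonstration only. */
int library_calls = 0;

/* Stand-in for an external BLAS-style dot product (not a real BLAS symbol). */
double dot_library(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    library_calls++;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Keep short loops inline; hand only long ones to the library. */
double dot_guarded(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    if (n < SMALL_TRIP_COUNT) {
        double acc = 0.0;
        for (size_t i = 0; i < n; i++)
            acc += x[i] * y[i];
        return acc;
    }
    return dot_library(x, y, n);
}
```

With this guard, a call with n = 4 never reaches `dot_library`, so the call overhead is paid only where the library has a chance to win it back.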
Shouldn't dot product transform be predicated on -fassociative-math?
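A small sketch of why reassociation matters for a dot product: the two functions below compute the same mathematical sum but accumulate in different orders. The function names are illustrative only, and the two-lane version is just one possible ordering a vectorized or library implementation might use, not a description of any particular BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left-to-right accumulation: the order the source loop specifies. */
float dot_sequential(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Two independent partial sums, combined at the end (a reassociated order). */
float dot_two_lanes(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    float even = 0.0f, odd = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += x[i] * y[i];
        odd += x[i + 1] * y[i + 1];
    }
    if (i < n)  /* leftover element when n is odd */
        even += x[i] * y[i];
    return even + odd;
}
```

Assuming strict IEEE single-precision evaluation (e.g. SSE on x86-64), x = {1e8, 1, -1e8, 1} against y = {1, 1, 1, 1} gives 1 from the sequential version and 2 from the two-lane version, because 1e8 + 1 rounds back to 1e8 in float. Replacing the loop with a library call can change results in the same way.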
Do you have a vision of a generalized pattern matcher to allow adding other
I'm curious what the gap is between GCC's vectorizer output and fine-tuned BLAS
libraries. [*] Or is the intention here to enable use of accelerated BLAS on
HSA-like architectures? Or using BLAS when the vectorizer can't possibly
match it (matmult -- but then again it's not easy to pattern-match in the
first place; or non-trivial strides -- but what can a BLAS lib do in that case?)
[*] The gap is definitely huge on something like ia64 (IIRC vectorization
is not important there, but you need to unroll and schedule carefully), but
I presume you're mostly interested in x86-64.
GCC currently has a somewhat similar in spirit feature for the vectorizer --
-mveclibabi. Is it known how it is used in practice?