This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)


On Tue, 11 Sep 2012, Richard Guenther wrote:

On Tue, Sep 11, 2012 at 10:41 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
On Mon, Sep 10, 2012 at 6:37 PM, Richard Henderson <rth@redhat.com> wrote:
Whether or not the compiler creates a clone COULD BE totally up to the
compiler, based on whether or not vectorization is enabled, whether the
loop has been analyzed such that vectorization may proceed, or indeed
the phase of the moon.

But in order for that to happen, the clone must be totally private to
the module for which we are generating code (in the LTO sense, this is
the entire program or dll; without LTO, this is just the object file).
It means that we never attempt to generate clones for functions for
which the body of the function is not visible.

On the other hand, if you insist on assuming a clone exists merely
because a declaration bears an attribute, then you must address ALL
of the problems with respect to defining a stable ABI in the face of
different cpu revisions, different ISAs, and different vector lengths.

I've not seen you address ANY of these problems, despite having the
problem pointed out multiple times.

Indeed, if the definition of an elemental function is always visible to the vectorizer, the vectorizer itself can instruct the creation of the clone if it does not already exist (just make those clones managed by the callgraph). Then the clones are visible to the current TU only and no ABI issues exist. (Though you could say that the vectorizer or the inliner could as well force inlining of elemental functions into places it wants to vectorize. One complication even with local clones is that the x86 ABI has no callee-saved XMM registers, which makes function calls inside loops especially expensive.)

I thought gcc wouldn't use the x86 ABI for those private calls. I guess what I remember were vague discussions and not a description of the current status...
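
To make the local-clone idea above concrete, here is a rough hand-written sketch of what such a clone could look like. It is only an illustration, not what the Cilk Plus patch or the vectorizer actually generates: the names elem, elem_v2 and apply are hypothetical, and the 2-lane type uses the generic GCC vector extension (vector_size) rather than any Cilk Plus attribute. The point is that both the scalar function and its vector clone are static, so nothing outside this translation unit can rely on the clone existing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical elemental function: works on one element at a time. */
static double elem(double x) { return x * x + 1.0; }

/* 2-lane vector of doubles, via the GCC/Clang vector extension. */
typedef double v2df __attribute__((vector_size(16)));

/* Hand-written stand-in for a clone the compiler might create.
   It is static, hence private to this translation unit, so no
   cross-module ABI is implied by its existence. */
static v2df elem_v2(v2df x)
{
    const v2df one = {1.0, 1.0};
    return x * x + one;
}

/* Apply elem to n values: the clone handles full pairs,
   the scalar function handles the leftover element. */
static void apply(const double *in, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df v;
        memcpy(&v, in + i, sizeof v);   /* unaligned-safe load */
        v = elem_v2(v);
        memcpy(out + i, &v, sizeof v);  /* unaligned-safe store */
    }
    for (; i < n; i++)
        out[i] = elem(in[i]);
}
```

Note that the call to elem_v2 inside the loop is still a real call unless it gets inlined, which is where the point about caller-saved XMM registers comes in.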


Btw, this then happily fits into my suggestion that the "elementalness"
can be autodetected by the compiler simply by means of a proper IPA
pass and thus be fully LTO / whole-program aware.  No need for an
attribute (where you'd need to handle the case that the attribute was placed
there by error).

Note that, apart from preventing external calls, it removes this use case:


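/* Elemental function declared with a vector length of 4. */
__attribute__((vector(4))) double mysqrt(double x) { return sqrt(x); }

/* Direct call with a 4 x double AVX value. */
__m256d var;
mysqrt(var);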

I am not sure it is the best way to achieve this, but it is one way. I am also planning a patch to turn {sqrt(a),sqrt(b)} into sqrt({a,b}) when the target likes it. And there is a PR asking for a __builtin_math_sqrt.

--
Marc Glisse
