This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libstdc++/31000] std::valarray should be annotated with OpenMP directives



------- Comment #6 from fang at csl dot cornell dot edu  2007-03-19 18:51 -------
Subject: Re:  std::valarray should be annotated with
 OpenMP directives

> "bangerth at dealii dot org" <gcc-bugzilla@gcc.gnu.org> writes:
>
> | (In reply to comment #3)
> | > I suspect that parallelizing for SSE/Altivec might be more beneficial
> | > in most cases than for OpenMP -- OpenMP is a 1,000-pound gorilla.
> |
> | I certainly agree. The beauty is that one may have both: SSE/Altivec/... if
> | the template argument of std::valarray is float/double/int (in which case one
> | would have to have explicit specializations of the member functions), and
> | OpenMP if it is anything else.
>
> On my single-node AMD64 machine, I would rather have the compiler
> generate code that takes advantage of SSE than launch OpenMP.  On
> the other hand, if I had multiple nodes, I might consider OpenMP
> for some of the valarray<double>s, so I'm not sure the issue is
> that clear-cut...
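A rough sketch of the split proposed in the quoted reply (explicit specializations of member functions for built-in element types, OpenMP for everything else). It uses a hypothetical my_valarray class, not std::valarray or the libstdc++ implementation, and specializes only double; float and int would follow the same pattern. It assumes C++11 and that the plain loop in the double specialization gets auto-vectorized (e.g. at -O3), which is not guaranteed; hand-written SSE/AltiVec intrinsics could go there instead. The OpenMP pragma only takes effect when compiling with -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for std::valarray; NOT the libstdc++ implementation.
template <typename T>
class my_valarray {
public:
    explicit my_valarray(std::size_t n, const T& v = T()) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    my_valarray& operator+=(const my_valarray& rhs);

private:
    std::vector<T> data_;
};

// Generic member, used for any T without an explicit specialization.
// The pragma splits the loop across threads when built with -fopenmp
// and is ignored otherwise.
template <typename T>
my_valarray<T>& my_valarray<T>::operator+=(const my_valarray& rhs) {
    assert(rhs.size() == size());
    const long n = static_cast<long>(data_.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i)
        data_[i] += rhs.data_[i];
    return *this;
}

// Explicit specialization for double: a simple contiguous loop that the
// compiler may auto-vectorize for SSE/AltiVec. Intrinsics could replace it.
template <>
my_valarray<double>& my_valarray<double>::operator+=(const my_valarray<double>& rhs) {
    assert(rhs.size() == size());
    double* a = data_.data();
    const double* b = rhs.data_.data();
    const std::size_t n = data_.size();
    for (std::size_t i = 0; i < n; ++i)
        a[i] += b[i];
    return *this;
}
```

With this layout, my_valarray<double>::operator+= picks up the specialized body, while any element type left unspecialized in this sketch (such as my_valarray<int>) falls back to the OpenMP-annotated generic loop.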

Thinking out loud...

Is there any interest/effort in placing vectorizable operations somewhere
outside of valarray so that other STL algorithms/containers could
leverage them?  For example, I'd like to be able to use tr1/array on
basic numeric types and get the benefits of valarray operations without
first having to copy into a valarray, which uses heap-allocated memory.
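A minimal sketch of valarray-style element-wise operations applied directly to a fixed-size array. It assumes C++11's std::array (the standardized form of tr1::array), and the helper names elementwise and array_sum are hypothetical, not part of any library. The result stays in automatic storage, so no copy into a heap-allocated valarray is involved.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>

// Hypothetical helper: apply a binary operation element by element, the way
// valarray's arithmetic operators do. Matching sizes are enforced by N.
template <typename T, std::size_t N, typename BinaryOp>
std::array<T, N> elementwise(const std::array<T, N>& a,
                             const std::array<T, N>& b, BinaryOp op) {
    std::array<T, N> r{};  // value-initialized; no heap allocation
    for (std::size_t i = 0; i < N; ++i)
        r[i] = op(a[i], b[i]);
    return r;
}

// Hypothetical analogue of valarray::sum().
template <typename T, std::size_t N>
T array_sum(const std::array<T, N>& a) {
    return std::accumulate(a.begin(), a.end(), T());
}
```

For example, elementwise(x, y, std::plus<int>()) adds two std::array<int, 4> objects directly, whereas using valarray's operator+ would first require copying both into valarrays.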

I'm imagining something like a vectorize_traits that would combine the
operation's vectorizability (e.g. std::plus) with the vectorizability of
the value_type (e.g. _Integral).  Then a subset of algorithms (<numeric>
among others) would have an additional level of template wrapping to
dispatch to the appropriate __algorithm() based on vectorize_traits and
iterator_traits.
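One possible shape for that dispatch, sketched under several assumptions: vectorize_traits, is_random_access, transform_impl and dispatch_transform are hypothetical names (libstdc++ itself would use reserved __-prefixed helpers), only std::plus on arithmetic types is marked vectorizable, and the "vectorizable" path is just a counted, indexed loop left to the compiler's auto-vectorizer. It uses C++11 <type_traits> for brevity.

```cpp
#include <functional>
#include <iterator>
#include <type_traits>

// Hypothetical trait: may operation Op be vectorized for value type T?
// Default: no.
template <typename Op, typename T>
struct vectorize_traits : std::false_type {};

// Assumption for this sketch: std::plus<T> is vectorizable when T is an
// arithmetic (integral or floating-point) type.
template <typename T>
struct vectorize_traits<std::plus<T>, T>
    : std::integral_constant<bool, std::is_arithmetic<T>::value> {};

template <typename It>
struct is_random_access
    : std::is_base_of<std::random_access_iterator_tag,
                      typename std::iterator_traits<It>::iterator_category> {};

// Generic path: any input/output iterators, any binary operation.
template <typename In1, typename In2, typename Out, typename Op>
Out transform_impl(In1 first1, In1 last1, In2 first2, Out out, Op op,
                   std::false_type) {
    for (; first1 != last1; ++first1, ++first2, ++out)
        *out = op(*first1, *first2);
    return out;
}

// "Vectorizable" path: a counted, indexed loop over random-access
// iterators, a typical auto-vectorization candidate.
template <typename In1, typename In2, typename Out, typename Op>
Out transform_impl(In1 first1, In1 last1, In2 first2, Out out, Op op,
                   std::true_type) {
    typedef typename std::iterator_traits<In1>::difference_type diff_t;
    const diff_t n = last1 - first1;
    for (diff_t i = 0; i < n; ++i)
        out[i] = op(first1[i], first2[i]);
    return out + n;
}

// Front end: pick a path at compile time from vectorize_traits and
// iterator_traits.
template <typename In1, typename In2, typename Out, typename Op>
Out dispatch_transform(In1 first1, In1 last1, In2 first2, Out out, Op op) {
    typedef typename std::iterator_traits<In1>::value_type value_type;
    typedef std::integral_constant<bool,
        vectorize_traits<Op, value_type>::value &&
        is_random_access<In1>::value &&
        is_random_access<In2>::value &&
        is_random_access<Out>::value> tag;
    return transform_impl(first1, last1, first2, out, op, tag());
}
```

Because std::true_type and std::false_type are themselves integral_constant<bool, ...>, the computed tag selects exactly one transform_impl overload at compile time, with no run-time cost for the dispatch itself.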
One issue, however, might be assumptions about the aliasing of input/output
iterators... we're aware that many optimizations rely on non-aliasing
assumptions, whereas the standard algorithms make no such assumptions
(except for valarray's operations).  A run-time overlap check on
random-access iterators would incur a slight penalty.
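A minimal sketch of such a run-time check, assuming contiguous storage addressed through raw pointers (general random-access iterators would first have to be turned into addresses, which only makes sense for contiguous containers). ranges_overlap and add_arrays are hypothetical names, and __restrict__ is a GCC extension used here to promise the compiler that the fast path has no aliasing. The cost is a handful of pointer comparisons per call.

```cpp
#include <cstddef>
#include <functional>

// True if [a, a + n) and [b, b + n) share any element. std::less is used
// because it gives a total order even for pointers into unrelated arrays.
template <typename T>
bool ranges_overlap(const T* a, const T* b, std::size_t n) {
    std::less<const T*> lt;
    return lt(a, b + n) && lt(b, a + n);
}

// Hypothetical: out[i] = a[i] + b[i], choosing a path at run time.
// Returns true if the no-alias fast path was taken.
template <typename T>
bool add_arrays(const T* a, const T* b, T* out, std::size_t n) {
    if (ranges_overlap<T>(a, out, n) || ranges_overlap<T>(b, out, n)) {
        // Safe path: plain element-by-element loop, no aliasing assumption.
        for (std::size_t i = 0; i < n; ++i)
            out[i] = a[i] + b[i];
        return false;
    }
    // Fast path: restrict-qualified pointers (GCC extension) tell the
    // compiler the output does not alias either input.
    const T* __restrict__ ra = a;
    const T* __restrict__ rb = b;
    T* __restrict__ ro = out;
    for (std::size_t i = 0; i < n; ++i)
        ro[i] = ra[i] + rb[i];
    return true;
}
```

An in-place call such as add_arrays(x, y, x, n) is detected as overlapping and takes the safe loop, whereas distinct buffers take the restrict-qualified loop.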

But yes, having STL take advantage of low-level acceleration through
abstraction and compile-time polymorphism is a good thing, IMHO.

Fang


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31000

