This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libstdc++/31000] std::valarray should be annotated with OpenMP directives
- From: "fang at csl dot cornell dot edu" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 19 Mar 2007 18:51:59 -0000
- Subject: [Bug libstdc++/31000] std::valarray should be annotated with OpenMP directives
- References: <bug-31000-102@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #6 from fang at csl dot cornell dot edu 2007-03-19 18:51 -------
Subject: Re: std::valarray should be annotated with
OpenMP directives
> "bangerth at dealii dot org" <gcc-bugzilla@gcc.gnu.org> writes:
>
> | (In reply to comment #3)
> | > I suspect that parallelizing for SSE/Altivec might be more beneficial
> | > in most cases than for OpenMP -- OpenMP is a 1,000-pound gorilla.
> |
> | I certainly agree. The beauty is that one may have both: SSE/Altivec/... if
> | the template argument of std::valarray is float/double/int (in which case one
> | would have to have explicit specializations of the member functions), and
> | OpenMP if it is anything else.
>
> on my single node AMD64 machine, I would prefer the compiler to
> generate code that takes advantage of SSE rather than launch OpenMP. On
> the other hand, if I had multiple nodes, I might be contemplating
> OpenMP for some of the valarray<double>s, so I'm not sure the issue is
> that clear-cut...
Thinking out loud...
Is there any interest/effort in placing vectorizable operations somewhere
outside of valarray so that other STL algorithms/containers might be able
to leverage them? For example, I'd like to be able to use
tr1/array on basic numeric types and have the benefits of valarray
operations without having to first copy to a valarray, which uses
heap-allocated memory.
I'm imagining something like vectorize_traits that would check the
vectorizability of the operation (std::plus) together with the
vectorizability of the value_type (_Integral). Then a subset of algorithms
(<numeric> among others) would have an additional level of
template-wrapping to dispatch the appropriate __algorithm() based on
vectorize_traits and iterator_traits.
One issue however might be assumptions about the aliasing of input/output
iterators... we're aware that many optimizations rely on non-aliasing
assumptions, whereas the standard algorithms make no such assumptions
(except valarray's ops). A run-time overlap check on
random_access_iterators would incur a slight penalty.
But yes, having STL take advantage of low-level acceleration through
abstraction and compile-time polymorphism is a good thing, IMHO.
Fang
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31000