This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: arch-specific template code


Hi,

On 08/28/2012 01:51 PM, Ulrich Drepper wrote:
> On Mon, Aug 27, 2012 at 6:26 PM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
>> My personal opinion is that a concrete example, small, but meaningful and
>> rather self-contained, would help. To be honest, at this stage it isn't clear
>> to me which kind of arch-specific optimizations you are thinking about.
> Here is a first example.  Note that for now I just added the code in
> the middle of random.tcc.  This is an implementation of the
> normal_distribution<double>::__generate<> function using SSE3.  The
> resulting code runs about 25% faster.  There is really no way to use
> the function on any other architecture because it depends heavily on
> the x86 intrinsics and hence on the x86 instructions.  But there is no
> reason why there couldn't be functions with the same interface but
> completely different implementations for other archs.  PPC has AltiVec,
> ARM has NEON.
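For illustration only (this is not the code from the patch): the portable routine that an SSE3, AltiVec or NEON version would replace behind one interface might look like the sketch below. It assumes a batch generator based on Marsaglia's polar method, which is, as far as I know, what the scalar libstdc++ normal_distribution uses; the name `generate_normal_batch` and its parameter list are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical portable baseline: fill [first, last) with samples from
// N(mean, stddev^2) using Marsaglia's polar method. An SSE3, AltiVec or
// NEON variant would keep this signature and replace only the body.
template<typename URNG>
void generate_normal_batch(double* first, double* last, URNG& urng,
                           double mean, double stddev)
{
  assert(stddev > 0.0);
  std::uniform_real_distribution<double> uni(-1.0, 1.0);
  while (first != last)
    {
      // Draw points in the square [-1, 1)^2 until one falls strictly
      // inside the unit circle and is not the origin.
      double x, y, r2;
      do
        {
          x = uni(urng);
          y = uni(urng);
          r2 = x * x + y * y;
        }
      while (r2 >= 1.0 || r2 == 0.0);

      // Each accepted point yields two independent standard normals.
      const double mult = std::sqrt(-2.0 * std::log(r2) / r2);
      *first++ = mean + stddev * x * mult;
      if (first != last) // odd-length range: the second value is dropped
        *first++ = mean + stddev * y * mult;
    }
}
```

Because an arch-specific variant would keep exactly this signature, a caller such as a `__generate`-style member would not need to know which body it gets.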
Good, good. I don't see any problem with integrating the code more or less as-is; we only have to figure out a suitable, neat scheme for the includes. Involving the cpu subdirectory, as you suggested, seems indeed a nice idea, but doing everything with a single header per cpu probably will not scale well in the future. I suppose it is better to add a whole subdirectory of specializations for each cpu, with one file for each std header, and obviously a generic fallback for the generic cpu, which essentially has just empty headers. I think something quite straightforward should do.
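A minimal single-file sketch of the include scheme described above, assuming each cpu directory may ship one optional file per std header while the generic cpu ships empty ones. The two "headers" are emulated here with preprocessor sections; `EXAMPLE_HAVE_ARCH_AFFINE`, `arch_affine` and `affine_transform` are hypothetical names, and the SSE2 body merely stands in for a real optimized routine.

```cpp
#include <cassert>
#include <cstddef>

// ===== stands in for a hypothetical per-cpu header (x86 flavor) =====
// Only an architecture with an optimized routine defines the hook macro.
#if defined(__SSE2__)
#  include <emmintrin.h>
#  define EXAMPLE_HAVE_ARCH_AFFINE 1
// Hypothetical SSE2 kernel: p[i] = p[i] * stddev + mean, two doubles at a time.
inline void arch_affine(double* p, std::size_t n, double mean, double stddev)
{
  const __m128d m = _mm_set1_pd(mean);
  const __m128d s = _mm_set1_pd(stddev);
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2)
    {
      const __m128d v = _mm_loadu_pd(p + i);
      _mm_storeu_pd(p + i, _mm_add_pd(_mm_mul_pd(v, s), m));
    }
  for (; i < n; ++i) // scalar tail for an odd element count
    p[i] = p[i] * stddev + mean;
}
#endif
// ===== the generic cpu's version of that header would be empty =====

// ===== generic library code, e.g. somewhere in random.tcc =====
// Uses the arch hook when the cpu header provided one, else a plain loop.
inline void affine_transform(double* p, std::size_t n,
                             double mean, double stddev)
{
  assert(p != nullptr || n == 0);
#if defined(EXAMPLE_HAVE_ARCH_AFFINE)
  arch_affine(p, n, mean, stddev);
#else
  for (std::size_t i = 0; i < n; ++i)
    p[i] = p[i] * stddev + mean;
#endif
}
```

Since the generic header defines nothing, architectures without an optimized routine fall through to the plain loop at no extra cost, and adding a new cpu only means adding files to its own subdirectory.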

Paolo.


