This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
On Mon, Aug 27, 2012 at 6:26 PM, Paolo Carlini <paolo.carlini@oracle.com> wrote:

Good, good. I don't see any problem with integrating the code more or less as-is; we only have to figure out a suitable, neat scheme for the includes. Involving the cpu subdirectory as you suggested seems indeed a nice idea, but doing everything with a single header per cpu will probably not scale well in the future. I suppose it is better to add a whole subdirectory of specializations for each cpu, with one file for each std header, and obviously to use a generic fallback for the generic cpu, which essentially has just empty headers. I think something quite straightforward should do.

My personal opinion is that a concrete example, small but meaningful and rather self-contained, would help. To be honest, at this stage it isn't clear to me which kind of arch-specific optimizations you are thinking about.

Here is a first example. Note that for now I just added the code in the middle of random.tcc. This is an implementation of the normal_distribution<double>::__generate<> function using SSE3. The resulting code runs about 25% faster. There is really no way to use the function on any other architecture because it depends heavily on the x86 intrinsics and hence on the x86 instructions. But there is no reason why there couldn't be functions with the same interface but a completely different implementation for other archs. PPC has AltiVec, ARM has NEON.
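To illustrate the "same interface, different implementation per arch" idea, here is a minimal sketch. It is not the SSE3 __generate<> code from this message; it uses a much simpler operation (scaling and shifting a range of doubles, as one would to turn standard-normal values into values with a given mean and stddev) and SSE2 intrinsics. The namespace sketch and the scale_shift* names are hypothetical, and the selection is done with a plain preprocessor check rather than the per-cpu header scheme discussed above.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics, x86 only
#endif

namespace sketch {  // hypothetical namespace, not part of libstdc++

// Generic fallback: plain scalar loop, valid on every architecture.
inline void scale_shift_generic(double* first, double* last,
                                double stddev, double mean)
{
  assert(first <= last);
  for (; first != last; ++first)
    *first = *first * stddev + mean;
}

#if defined(__SSE2__)
// x86 specialization: same interface, two doubles per iteration.
inline void scale_shift_sse2(double* first, double* last,
                             double stddev, double mean)
{
  assert(first <= last);
  const __m128d vs = _mm_set1_pd(stddev);
  const __m128d vm = _mm_set1_pd(mean);
  const std::size_t n = static_cast<std::size_t>(last - first);
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) {
    __m128d v = _mm_loadu_pd(first + i);     // unaligned load of 2 doubles
    v = _mm_add_pd(_mm_mul_pd(v, vs), vm);   // v * stddev + mean
    _mm_storeu_pd(first + i, v);
  }
  for (; i < n; ++i)                         // scalar tail for odd sizes
    first[i] = first[i] * stddev + mean;
}
#endif

// Single entry point; callers never see which version was selected.
inline void scale_shift(double* first, double* last,
                        double stddev, double mean)
{
#if defined(__SSE2__)
  scale_shift_sse2(first, last, stddev, mean);
#else
  scale_shift_generic(first, last, stddev, mean);
#endif
}

}  // namespace sketch
```

A caller would simply write sketch::scale_shift(buf, buf + n, stddev, mean). An AltiVec or NEON version could be added behind the same entry point without touching any caller.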