This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: arch-specific template code

From: Marc Glisse <marc dot glisse at inria dot fr>
To: Ulrich Drepper <drepper at gmail dot com>
Cc: libstdc++ at gcc dot gnu dot org, Paolo Carlini <paolo dot carlini at oracle dot com>
Date: Tue, 28 Aug 2012 18:02:24 +0200 (CEST)
Subject: Re: arch-specific template code
References: <CAOPLpQex7niqQ4VWoSheHpbb8-Jf9S+nULgxB3WNUp4r_xyriw@mail.gmail.com> <503BF427.1030300@oracle.com> <CAOPLpQdRUONpKZ9WfD1VP+YgmJVoJxp1L6OL=5hXH2LzjnZ8ww@mail.gmail.com> <alpine.DEB.2.02.1208281418540.20374@stedding.saclay.inria.fr> <CAOPLpQfgMoJjgwtZ6WMvpY41Vj0BD3SUQs3RNO1dbKQzp8KMaw@mail.gmail.com>
Reply-to: libstdc++ at gcc dot gnu dot org

On Tue, 28 Aug 2012, Ulrich Drepper wrote:

> On Tue, Aug 28, 2012 at 8:33 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Actually, it looks to me like most of it can be rewritten using gcc's vector
>> extensions. _mm_mul_pd(a,b) is just a*b, a[0] gives the first element,
>
> It does?  How?
>
> If I use
>
>   typedef double v2df __attribute__ ((vector_size (16)));
>   v2df a;
>
> then I cannot use a[0].

Then you are probably not using current mainline. It works in C since 4.7, I think, but C++ support was only added recently (and constexpr support is still missing).
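A minimal sketch of what "_mm_mul_pd(a,b) is just a*b" and "a[0] gives the first element" look like, assuming a GCC (or Clang) recent enough to accept subscripting of vector types in C++. The typedef is the one from the quoted message; the function names mul and first are made up for illustration.

```cpp
// GCC generic vector type: two doubles in 16 bytes, the same layout as
// SSE2's __m128d, but usable on any target GCC supports.
typedef double v2df __attribute__ ((vector_size (16)));

// Element-wise product with a plain operator. With SSE2 enabled GCC can
// emit mulpd here, the instruction _mm_mul_pd(a, b) maps to.
inline v2df mul(v2df a, v2df b)
{
  return a * b;
}

// Direct subscripting of a vector value; older C++ front ends reject this.
inline double first(v2df a)
{
  return a[0];
}
```

With x = {1.5, 2.0} and y = {4.0, 3.0}, mul(x, y) gives {6.0, 6.0} and first(x) gives 1.5.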

> gcc has an arch-specific builtin __builtin_ia32_vec_ext_v2df().  There is nothing else I have found which can take its place.

Pointer casts work too (and that's what [] is expanded to).
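A sketch of the pointer-cast route as a portable alternative to __builtin_ia32_vec_ext_v2df(). It assumes GCC's treatment of a vector type as aliasing its element type, which, as far as I know, is what makes reading through a double* legal there. The names lane and hsum are hypothetical; hsum shows that the missing horizontal add can at least be spelled with plain element access, though nothing guarantees it compiles to haddpd.

```cpp
#include <cassert>

typedef double v2df __attribute__ ((vector_size (16)));

// Read lane i through a pointer to the element type (roughly what a[i]
// expands to, per the message above). The index is not range-checked by
// the language, hence the assert.
inline double lane(const v2df& v, int i)
{
  assert(i >= 0 && i < 2);
  return reinterpret_cast<const double*>(&v)[i];
}

// Horizontal add of both lanes: the operation SSE3's _mm_hadd_pd provides,
// written here without any x86-specific builtin.
inline double hsum(const v2df& v)
{
  return lane(v, 0) + lane(v, 1);
}
```

For v = {1.25, 2.5}, lane(v, 1) is 2.5 and hsum(v) is 3.75.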

> The horizontal add you already noticed.
> What is also missing is control over memory accesses.  I.e., loads and
> stores from scalar pointers that are possibly unaligned.

Hmm, *mem=vec probably assumes alignment, and memcpy possibly loses too much information (although with __builtin_memcpy the compiler has the information, so it could optimize). Unions are always horrible. Using mem[0]=vec[0]; mem[1]=vec[1]; is the least bad option I could find, and indeed it does not use the unaligned vector instructions. Good point.
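The load/store options discussed above, sketched with hypothetical helper names. The memcpy versions assume the compiler treats memcpy as a builtin and may turn a fixed 16-byte copy into a single unaligned vector access; that is a hope, not a guarantee. The element-wise store never assumes alignment but, as noted above, does not become an unaligned vector store.

```cpp
#include <cstring>

typedef double v2df __attribute__ ((vector_size (16)));

// Unaligned load: copy 16 bytes from a double* that need not be
// 16-byte aligned. Well defined regardless of alignment.
inline v2df load_unaligned(const double* p)
{
  v2df v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Unaligned store through memcpy, with the same caveats as the load.
inline void store_unaligned(double* p, v2df v)
{
  std::memcpy(p, &v, sizeof v);
}

// Element-wise store: two scalar stores, no alignment assumption.
inline void store_elementwise(double* p, v2df v)
{
  p[0] = v[0];
  p[1] = v[1];
}
```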

> There are big gaps in the vector extensions.  In other functions sqrt
> is needed, masking is required, etc.  None of this works as well (at
> least not with the same efficiency) as using the Intel intrinsics.
> Waiting for the compiler to catch up isn't good IMO.

I can understand that, but it is nice to know exactly what is missing so we have a chance to get there eventually. __builtin_sqrt is a good example indeed. I thought masking was supposed to work (depending on what that means), although possibly only in C.
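A sketch of how sqrt and masking can be spelled with the generic vector extensions, assuming GCC's documented behavior that a vector comparison yields a same-size signed integer vector holding -1 (all bits set) for true and 0 for false, and that vector types of the same size can be cast to each other. The names vsqrt, v2di and keep_if_less are hypothetical, and whether this matches the efficiency of the Intel intrinsics is exactly the open question.

```cpp
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

// Element-wise square root via __builtin_sqrt on each lane. Whether GCC
// merges the two calls into one sqrtpd depends on flags (e.g.
// -fno-math-errno); this only shows it can be expressed.
inline v2df vsqrt(v2df a)
{
  v2df r = { __builtin_sqrt(a[0]), __builtin_sqrt(a[1]) };
  return r;
}

// Masking: keep the lanes of a where a[i] < b[i] and zero the others.
inline v2df keep_if_less(v2df a, v2df b)
{
  v2di mask = (v2di) (a < b);        // -1 where a[i] < b[i], else 0
  return (v2df) ((v2di) a & mask);   // reinterpret bits, AND, reinterpret back
}
```

For a = {1.0, 4.0} and b = {2.0, 2.0}, keep_if_less(a, b) gives {1.0, 0.0}.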

> As you have seen in the code, it should be easy enough for the SSE code to be converted to use the gcc vector extensions if and when they catch up.
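To illustrate how mechanical such a conversion could be, here is a hypothetical axpy-style helper written both ways; it is not the code from the thread. The SSE2 version is only compiled when __SSE2__ is defined and assumes GCC's __m128d is itself a 16-byte double vector, so casting between it and v2df is a plain reinterpretation.

```cpp
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef double v2df __attribute__ ((vector_size (16)));

// Generic version: plain operators on GCC vector types, no x86 headers.
inline v2df axpy_generic(double alpha, v2df x, v2df y)
{
  v2df a = { alpha, alpha };   // broadcast the scalar by hand
  return a * x + y;
}

#if defined(__SSE2__)
// Same computation spelled with SSE2 intrinsics.
inline v2df axpy_sse2(double alpha, v2df x, v2df y)
{
  __m128d a = _mm_set1_pd(alpha);
  __m128d r = _mm_add_pd(_mm_mul_pd(a, (__m128d) x), (__m128d) y);
  return (v2df) r;
}
#endif
```

The generic version compiles on any target; on x86 with SSE2 GCC can generate the same mulpd/addpd pair, so keeping the intrinsic variant is mainly a hedge against weaker code generation.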

Indeed. I just believe vector extensions should remain a goal to keep in mind for later. I hope it is clear that all of my questions are for my own education and are not meant to block your improvements in any way (besides, I am not a maintainer).

Thanks again for taking the time to answer,

--
Marc Glisse

