This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
> One option is for the user to use intrinsics. It's been claimed that results in worse code. There doesn't seem any obvious reason for that, but, if true, we should try to fix it; we don't want to penalize people who are using the intrinsics. So, let's assume using intrinsics is just as efficient, either because it already is, or because we make it so.

I maintain that empirical claim. If I compare a simple hybrid-SOA 3-coordinate type implemented three ways -- via intrinsics, via builtins, and via the generic vector extension -- when used as the basic component of a raytracer kernel, I get as many codegen variations: register allocation differs, stack footprints differ, branches and code organization differ, and so on, so it's not surprising that performance differs as well. The vector and builtin implementations (which use a straight v4sf rather than __m128) appear to be mostly on par, while the intrinsic-based version is slightly slower.
> We still have the problem that users now can't write machine-independent code to do this operation. Assuming the operations are useful for

That, and writing, say, a generic <int,float,double> version takes much, much more work.
> What are these operations used for? Can someone give an example of a kernel that benefits from this kind of thing?

There's of course what Paolo Bonzini described, but also all kinds of tricks that knowing such operations are extremely efficient encourages.