This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
> One option is for the user to use intrinsics. It's been claimed that results in worse code. There doesn't seem any obvious reason for that, but, if true, we should try to fix it; we don't want to penalize people who are using the intrinsics. So, let's assume using intrinsics is just as efficient, either because it already is, or because we make it so.

I maintain that empirical claim. If I compare a simple hybrid-SOA 3-coordinate type implemented three ways -- via intrinsics, via builtins, and via the generic vector extension -- when used as the basic component of a raytracer kernel, I get as many codegen variations: register allocation differs, stack footprints differ, branches and code organization differ, and so on, so it's not surprising that performance differs as well. The vector and builtin implementations (which use a straight v4sf rather than __m128) appear to be mostly on par, while the intrinsic-based version is slightly slower.
> We still have the problem that users now can't write machine-independent code to do this operation. Assuming the operations are useful for

That, and writing, say, a generic <int,float,double> version takes much, much more work.
> What are these operations used for? Can someone give an example of a kernel that benefits from this kind of thing?

There's of course what Paolo Bonzini described, but also all kinds of tricks that knowing such operations are extremely efficient encourages.