This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: MMX built-ins performance oddities
- vector version is about 3% faster than above instead of 10% slower - wow!
So why is gcc 4.0 producing worse code when using intel style intrinsics
and why isn't the union version using builtins as fast as using the vector
version?
I can answer why unions are slower: they are spilled to memory on every
assignment. GCC 4.0 knows how to replace a struct with separate scalar
variables (one per field), but it cannot do the same for unions. GCC 3.4
could do neither.
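
To make the two styles concrete, here is a minimal sketch. The original
test code is not shown in this thread, so the type and function names below
(v4hi, v4hi_union, sum_union, sum_vector) are assumptions for illustration
only. It uses GCC's generic vector extension rather than the MMX builtins,
and it only shows the source-level difference; it does not measure anything.

```c
#include <string.h>

/* GCC vector extension: four 16-bit lanes in 8 bytes (the MMX register width). */
typedef short v4hi __attribute__((vector_size(8)));

/* Union style: the vector is wrapped so its lanes can also be read as an
   array. Per the explanation above, GCC 4.0 does not split unions into
   scalars, so the accumulator may be stored to memory on each assignment. */
typedef union {
    v4hi  v;
    short s[4];
} v4hi_union;

/* Hypothetical helper: sum nvec groups of four shorts through the union. */
static void sum_union(const short *data, int nvec, short out[4])
{
    v4hi_union acc, tmp;
    int i;

    memset(&acc, 0, sizeof acc);
    for (i = 0; i < nvec; i++) {
        memcpy(&tmp.v, data + 4 * i, sizeof tmp.v);
        acc.v = acc.v + tmp.v;          /* assignment to a union member */
    }
    memcpy(out, acc.s, sizeof acc.s);
}

/* Hypothetical helper: the same sum using the vector type directly, which
   leaves the compiler free to keep the accumulator in a register. */
static void sum_vector(const short *data, int nvec, short out[4])
{
    v4hi acc = {0, 0, 0, 0};
    v4hi tmp;
    int i;

    for (i = 0; i < nvec; i++) {
        memcpy(&tmp, data + 4 * i, sizeof tmp);
        acc = acc + tmp;                /* plain vector variable */
    }
    memcpy(out, &acc, sizeof acc);
}
```

Both helpers compute the same lane-wise sums; comparing the assembly from
gcc -O2 -S for each is one way to check whether the union version really
goes through memory on a given compiler version.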
As for why vectors are faster: a lot of the vector support was rewritten
for GCC 4.0, so that may be the reason.
I do not know exactly why builtins are still slower, but you may want to
file a PR and add me to the CC list (bonzini@gcc.gnu.org).
Paolo