[Bug rtl-optimization/19680] sub-optimal register allocation with sse

Mon Jan 31 22:22:00 GMT 2005

------- Additional Comments From tbptbp at gmail dot com  2005-01-31 22:21 -------
Oops, my bad. I thought pshufd mixed both operands à la shufps; I'm obviously not
familiar with the integer side of SSE.
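
For reference, here is a rough scalar sketch of the difference I tripped over.
It is plain C that only models the lane selection, not real intrinsics, and the
names model_pshufd and model_shufps are made up for illustration. pshufd picks
all four result lanes from its single source, while shufps takes the low two
lanes from the first operand and the high two from the second.

```c
#include <stdint.h>

/* Scalar model of PSHUFD (hypothetical helper name): every result lane is
 * selected from the one source operand by a 2-bit field of the immediate. */
static void model_pshufd(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)   /* copy last so dst may alias src */
        dst[i] = tmp[i];
}

/* Scalar model of SHUFPS (hypothetical helper name): lanes 0 and 1 come from
 * the first operand, lanes 2 and 3 from the second. In the real instruction
 * the first operand is also the destination register. */
static void model_shufps(float dst[4], const float a[4], const float b[4],
                         unsigned imm)
{
    float tmp[4];
    tmp[0] = a[imm & 3];
    tmp[1] = a[(imm >> 2) & 3];
    tmp[2] = b[(imm >> 4) & 3];
    tmp[3] = b[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)   /* copy last so dst may alias a or b */
        dst[i] = tmp[i];
}
```

So with an immediate of 0x1B the pshufd model just reverses the four lanes of
its source, whereas the shufps model with 0x44 glues the low halves of two
different operands together.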

And yes, the combination is a loss, albeit a small one, around 3%. But I'm timing
the whole thing, not just that intersection code.

To give you an idea, here's the peak performance for a given tree.
intersection inlined:
ICC: 16.4 fps
gcc: 14.7 unpatched, 14.4 patched
gcc & -fno-gcse: 14.8 unpatched, 14.5 patched

intersection not inlined:
gcc: 14.9 unpatched, 13.7 patched
gcc & -fno-gcse: 14.8 unpatched, 14.0 patched

The problem is that the surrounding code (kdtree search) also changes a lot, and
gcc doesn't find an optimal way for that either.
What's reassuring is that ICC isn't much smarter than gcc on the tree search.
Each compiler comes up with nice tricks here and there, but neither gets the whole
picture right. Still, ICC doesn't suck as much on average.
I'm going to replace all that tree search with asm to settle the issue someday.
BTW the difference between ICC and GCC is around 5% on the scalar path.

I'm going to try each patch in turn as requested.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680