[Bug tree-optimization/42779] [C++0x] Variadic templates + lambdas = extremely poor code quality
piotr dot wyderski at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sun Jan 17 20:46:00 GMT 2010
------- Comment #3 from piotr dot wyderski at gmail dot com 2010-01-17 20:46 -------
This is generic code, as it covers two bug reports.
In fact, it will probably serve as the basis for two
additional missed-optimization reports, so I thought it
would be good to provide the code of the entire sandbox.
To be more specific: the vectors passed to
combine() are constant, so the compiler should
not re-evaluate the base addresses of the
m_Data arrays on every iteration, as it does above:
mov (%edi),%ecx
...
mov 0x0(%ebp),%ecx
...
mov (%esi),%ecx
A single base-address fetch before the loop,
followed by index-based addressing with an
induction variable scaled by a factor of 16,
would be more efficient, e.g.:
// esi = src1
// edi = src2
// ebx = dst
// edx = induction variable
L0: cmpl %edx, max_index
je L1
movdqa (%esi,%edx,1),%xmm0
por (%edi,%edx,1),%xmm0
pxor %xmm1, %xmm0
movdqa %xmm0, (%ebx, %edx, 1)
add $16, %edx
jmp L0
L1:
as I would have written it by hand in assembler.
An aggressively unrolled version (say, four-way)
with prefetching for longer blocks would also be welcome.
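
For reference, the loop shape requested above can be sketched at the
source level. This is only an illustration, not the sandbox code: Block,
Vec, m_Size, combine_one, combine_hoisted and combine_unrolled4 are
hypothetical names, and the operation (src1 | src2) ^ mask is inferred
from the por/pxor sequence in the hand-written assembly. Only m_Data
comes from the report. The base addresses are copied into locals once,
before the loop, and the second function shows a four-way unrolled
variant with a scalar tail. Prefetching is left out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one 16-byte element of an m_Data array.
struct Block {
    std::uint32_t w[4];
};

// Hypothetical stand-in for the vector type; only m_Data is named in the report.
struct Vec {
    Block*      m_Data;
    std::size_t m_Size;
};

// One element: d = (x | y) ^ m, the operation assumed from the assembly.
inline void combine_one(Block& d, const Block& x, const Block& y, const Block& m) {
    for (int k = 0; k < 4; ++k)
        d.w[k] = (x.w[k] | y.w[k]) ^ m.w[k];
}

// Base addresses are read once into locals, so the loop body only
// needs indexed accesses relative to d, s1 and s2.
void combine_hoisted(Vec& dst, const Vec& a, const Vec& b, const Block& mask) {
    Block* const       d  = dst.m_Data;
    const Block* const s1 = a.m_Data;
    const Block* const s2 = b.m_Data;
    const Block        m  = mask;        // local copy of the mask
    const std::size_t  n  = dst.m_Size;

    for (std::size_t i = 0; i != n; ++i)
        combine_one(d[i], s1[i], s2[i], m);
}

// Four-way unrolled variant: four elements per iteration, then a tail
// loop for the remaining 0..3 elements.
void combine_unrolled4(Vec& dst, const Vec& a, const Vec& b, const Block& mask) {
    Block* const       d  = dst.m_Data;
    const Block* const s1 = a.m_Data;
    const Block* const s2 = b.m_Data;
    const Block        m  = mask;
    const std::size_t  n  = dst.m_Size;

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        combine_one(d[i],     s1[i],     s2[i],     m);
        combine_one(d[i + 1], s1[i + 1], s2[i + 1], m);
        combine_one(d[i + 2], s1[i + 2], s2[i + 2], m);
        combine_one(d[i + 3], s1[i + 3], s2[i + 3], m);
    }
    for (; i != n; ++i)
        combine_one(d[i], s1[i], s2[i], m);
}
```

Whether a given compiler keeps d, s1 and s2 in registers for such a
loop still depends on its alias analysis and register allocation, so
the sketch shows the intended shape rather than a guaranteed result.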
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42779