[Bug tree-optimization/42779] [C++0x] Variadic templates + lambdas = extremely poor code quality

Sun Jan 17 20:46:00 GMT 2010

------- Comment #3 from piotr dot wyderski at gmail dot com  2010-01-17 20:46 -------
This is generic code, as it covers two bug reports.
In fact, it will probably also serve as the base for two
additional missed-optimization reports. So I thought it
would be good to provide the code of the entire sandbox.

To be more specific: the vectors passed to
combine() are constant. The compiler should
not re-evaluate the base addresses of the
m_Data arrays on every iteration, as it does above:

    mov    (%edi),%ecx
    ...
    mov    0x0(%ebp),%ecx
    ...
    mov    (%esi),%ecx

Fetching the base addresses once, before the
loop, and using indexed addressing with an
induction variable that advances by 16 bytes
per iteration would be better, e.g.:

   // esi = src1
   // edi = src2
   // ebx = dst
   // edx = induction variable

L0:cmpl   %edx, max_index
   je     L1
   movdqa (%esi,%edx,1),%xmm0
   por    (%edi,%edx,1),%xmm0
   pxor   %xmm1, %xmm0
   movdqa %xmm0, (%ebx, %edx, 1)
   add    $16, %edx
   jmp    L0

L1:

as I would have written it by hand in assembler.
An aggressively unrolled version (say, four-way)
with prefetching for longer blocks would also be welcome.
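
For illustration only, the same hoisting can be sketched at the source
level. This is not the sandbox code: the Vec type, its m_Size member,
the 32-bit element type and the mask parameter (standing in for
whatever %xmm1 holds in the loop above) are all assumptions; only the
m_Data and combine() names come from this report.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the sandbox vector type; only the
// m_Data name is taken from the report.
struct Vec {
    std::uint32_t* m_Data;
    std::size_t    m_Size;
};

// Computes dst[i] = (a[i] | b[i]) ^ mask.
// The three base addresses are loaded once into locals before the
// loop, so the loop body needs only indexed loads and stores.
void combine(Vec& dst, const Vec& a, const Vec& b, std::uint32_t mask) {
    std::uint32_t* const       d  = dst.m_Data;  // fetched once
    const std::uint32_t* const s1 = a.m_Data;    // fetched once
    const std::uint32_t* const s2 = b.m_Data;    // fetched once
    const std::size_t          n  = dst.m_Size;

    for (std::size_t i = 0; i != n; ++i)
        d[i] = (s1[i] | s2[i]) ^ mask;
}
```

Copying the pointers into locals by hand should not be necessary in
principle, but it shows the loop shape the optimizer is expected to
reach on its own.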

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42779