When moving a 32-bit quantity into an MMX register,
GCC first zero-extends it in general-purpose registers,
as if emulating 64-bit arithmetic, stores both halves
to the stack, and then uses movq to load the result
into the MMX register. The result is code like:
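    xorl  %edx, %edx          # high half = 0
    movl  %eax, -16(%ebp)     # store low half to the stack
    movl  %edx, -12(%ebp)     # store high half to the stack
    movq  -16(%ebp), %mm1     # reload all 64 bits into %mm1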
Instead of simply:
    movd  %eax, %mm1          # upper 32 bits of %mm1 are zeroed
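Both sequences leave the same 64-bit value in %mm1, which is why a single movd is sufficient. The portable C sketch below models the two sequences under the assumption of a little-endian host such as x86: one stores the low and high halves into an 8-byte buffer standing in for the stack slot and reloads it as one 64-bit value (like movq), the other simply zero-extends (like movd). The function names are hypothetical and only illustrate the equivalence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models GCC's sequence: zero %edx, store %eax and %edx to a stack
   slot, reload all 8 bytes with movq.  Assumes a little-endian host. */
static uint64_t via_stack_slot(uint32_t eax)
{
    unsigned char slot[8];      /* stands in for -16(%ebp) .. -9(%ebp) */
    uint32_t edx = 0;           /* xorl %edx, %edx */
    uint64_t mm1;

    memcpy(slot, &eax, 4);      /* movl %eax, -16(%ebp) */
    memcpy(slot + 4, &edx, 4);  /* movl %edx, -12(%ebp) */
    memcpy(&mm1, slot, 8);      /* movq -16(%ebp), %mm1 */
    return mm1;
}

/* Models movd %eax, %mm1: the upper 32 bits of the destination are zeroed. */
static uint64_t via_movd(uint32_t eax)
{
    return (uint64_t)eax;
}

/* Checks that both sequences leave the same 64-bit value in "%mm1". */
static void check_models(uint32_t v)
{
    assert(via_stack_slot(v) == via_movd(v));
}
```

Calling check_models with any 32-bit value (for example 0xdeadbeef) should pass on x86; on a big-endian host the stack-slot model would place the value in the high half instead.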
This, together with the associated overhead, causes a
significant performance hit for typical uses of MMX.
The attached demonstration patch improved one
alpha-compositing routine from 29 million to 51 million
pixels/sec. (With the patch, results for a range
of routines were comparable to hand-written assembly.)
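The alpha-compositing routine itself is not part of this report. As a hedged illustration of why the 32-bit-to-64-bit move sits on the hot path, here is a portable C sketch of a constant-alpha blend written in MMX style (SWAR on a uint64_t): each 32-bit pixel is zero-extended to 64 bits (the step movd performs), its four bytes are spread into 16-bit lanes (as punpcklbw against zero would), and the lanes are blended, divided by 255 with rounding, and repacked. The function names and the choice of blending all four channels with the same alpha are assumptions for illustration; a real MMX routine would use the corresponding instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the 4 bytes of a pixel into four 16-bit lanes of a 64-bit word. */
static uint64_t unpack_bytes(uint32_t px)
{
    uint64_t w = (uint64_t)px;  /* the zero-extension movd performs */
    return (w & 0x000000ffull)
         | ((w & 0x0000ff00ull) << 8)
         | ((w & 0x00ff0000ull) << 16)
         | ((w & 0xff000000ull) << 24);
}

/* Gather the low byte of each 16-bit lane back into a 32-bit pixel. */
static uint32_t pack_bytes(uint64_t lanes)
{
    return (uint32_t)((lanes & 0xffull)
         | ((lanes >> 8) & 0xff00ull)
         | ((lanes >> 16) & 0xff0000ull)
         | ((lanes >> 24) & 0xff000000ull));
}

/* Per channel: round((src * alpha + dst * (255 - alpha)) / 255). */
static uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t alpha)
{
    const uint64_t low_bytes = 0x00ff00ff00ff00ffull;
    uint64_t s = unpack_bytes(src);
    uint64_t d = unpack_bytes(dst);
    /* Each lane stays <= 255 * 255, so no carry crosses a lane boundary. */
    uint64_t x = s * alpha + d * (uint64_t)(255u - alpha);
    /* Exact rounded division by 255 per lane: t = x + 128,
       result = (t + (t >> 8)) >> 8. */
    uint64_t t = x + 0x0080008000800080ull;
    t += (t >> 8) & low_bytes;
    return pack_bytes((t >> 8) & low_bytes);
}
```

For example, blending an all-0xff pixel over zero at alpha 128 yields 0x80808080, and alpha 255 or 0 returns the source or destination pixel unchanged.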
The attached patch simply replaces the existing
patterns for zero_extendsidi2 with a pattern using
movd. This is clearly wrong as it stands, but my
minimal GCC hacking skills proved unequal to
integrating it.
Observed with CVS HEAD as of 7 December 2002.
A simple example demonstrating the problem:
    typedef int di __attribute__ ((mode (DI)));
    di foo (unsigned int a, unsigned int b)
    {
      /* a and b are each zero-extended to DImode before the por.  */
      return __builtin_ia32_por (a, b);
    }
Responsible-Changed-Why: Jan, you are probably best acquainted with the MMX patterns.
Jan, have you been able to look at the patch for this PR that was included
with the original report?
Might be related to bug 11628.
*** Bug 11628 has been marked as a duplicate of this bug. ***
Subject: Bug 8871
Module name: gcc
Changes by: email@example.com 2003-08-23 21:18:58
Modified files:
	gcc            : expr.c ChangeLog
	gcc/config/i386: i386.c i386.h i386.md
* i386.c (ix86_expand_carry_flag_compare): Validate operand.
* i386.c (const_0_to_3_operand, const_0_to_7_operand,
const_0_to_15_operand, const_0_to_255_operand): New predicates.
* i386.h (PREDICATE_CODES): Add these.
* i386.md (pinsrw and pextrw patterns): Use them.
* i386.c (ix86_expand_binop_builtin): Behave sanely for VOIDmodes.
* expr.c (convert_modes): Deal properly with integer to vector
* i386.md (zero_extendsidi2*): Add MMX and SSE alternatives.
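The new range predicates constrain immediate operands. The ChangeLog does not say which predicate each pattern uses; the sketch below only assumes what follows from register width: a 64-bit MMX register holds four 16-bit words, so a pinsrw/pextrw word selector is meaningful only in 0..3 (a 128-bit SSE2 register holds eight, giving 0..7). The functions are hypothetical C models of the instructions, not GCC code; the asserts play the role a predicate such as const_0_to_3_operand plays when matching a pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MMX pextrw: extract 16-bit word `sel` of `mm`.
   Only selectors 0..3 address a word within a 64-bit register. */
static uint16_t model_pextrw(uint64_t mm, unsigned sel)
{
    assert(sel <= 3);
    return (uint16_t)(mm >> (16 * sel));
}

/* Hypothetical model of MMX pinsrw: replace 16-bit word `sel` of `mm`
   with `val`, leaving the other three words unchanged. */
static uint64_t model_pinsrw(uint64_t mm, uint16_t val, unsigned sel)
{
    assert(sel <= 3);
    uint64_t mask = 0xffffull << (16 * sel);
    return (mm & ~mask) | ((uint64_t)val << (16 * sel));
}
```

For example, model_pextrw(0x4444333322221111, 2) returns 0x3333, and inserting 0xbeef at selector 1 into the same value gives 0x44443333beef1111.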
Fixed by the patch above.