This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)
Richard Henderson wrote:
> On 07/27/2011 06:21 AM, Georg-Johann Lay wrote:
>> +(define_insn_and_split "*mulsi3"
>> + [(set (match_operand:SI 0 "pseudo_register_operand" "=r")
>> + (mult:SI (match_operand:SI 1 "pseudo_register_operand" "r")
>> + (match_operand:SI 2 "pseudo_register_or_const_int_operand" "rn")))
>> + (clobber (reg:DI 18))]
>> + "AVR_HAVE_MUL && !reload_completed"
>> + { gcc_unreachable(); }
>> + "&& 1"
>> + [(set (reg:SI 18)
>> + (match_dup 1))
>
> That seems like it's guaranteed to force an unnecessary move.
It's the same as with the present implementation:
long mul (long a, long b)
{
return a*b;
}
long mul2 (long a, long b)
{
return b*a;
}
translates -Os -mmcu=atmega8 to
mul:
rcall __mulsi3
ret
mul2:
push r12
push r13
push r14
push r15
movw r12,r22
movw r14,r24
movw r24,r20
movw r22,r18
movw r20,r14
movw r18,r12
rcall __mulsi3
pop r15
pop r14
pop r13
pop r12
ret
> Have you tried defining special-purpose register classes to
> force reload to move the data into the right hard regs?
>
> E.g. "Y" prefix
> "QHS" size
> two digit starting register number, as needed.
I already thought about such register classes/constraints with
almost the same nomenclature, i.e. with prefix "R".
> You'll probably end up with quite a few register classes
> out of this, but hopefully reload can do a better job than
> you can manually...
Agreed.
insns that will benefit are insns with two input operands that
commute, i.e. mulsi3, umulhisi3, mulhisi3, mulhi3.
Maybe even other 2-input insns could benefit because there's no
predetermined order in which the moves are accomplished; e.g.
moving R24 before R22 in udivmodqi4. I don't know if register
allocator is smart enough to swap the assignments if that is
better.
Moreover, it would reduce the number of insns resp. split
patterns and help cleanup md.
I'd prefer to do that work in a separate patch. The current patch
behaves the same as the old code, so it's not a performance
regression of the current patch.
Johann