[Patch,AVR]: PR49687 (better widening 32-bit mul)

Wed Jul 27 16:12:00 GMT 2011

Richard Henderson wrote:
> On 07/27/2011 06:21 AM, Georg-Johann Lay wrote:
>> +(define_insn_and_split "*mulsi3"
>> +  [(set (match_operand:SI 0 "pseudo_register_operand"                      "=r")
>> +        (mult:SI (match_operand:SI 1 "pseudo_register_operand"              "r")
>> +                 (match_operand:SI 2 "pseudo_register_or_const_int_operand" "rn")))
>> +   (clobber (reg:DI 18))]
>> +  "AVR_HAVE_MUL && !reload_completed"
>> +  { gcc_unreachable(); }
>> +  "&& 1"
>> +  [(set (reg:SI 18)
>> +        (match_dup 1))
> 
> That seems like it's guaranteed to force an unnecessary move.

It's the same as with the present implementation:

long mul (long a, long b)
{
    return a*b;
}

long mul2 (long a, long b)
{
    return b*a;
}

translates -Os -mmcu=atmega8 to

mul:
	rcall __mulsi3
	ret

mul2:
	push r12
	push r13
	push r14
	push r15
	movw r12,r22
	movw r14,r24
	movw r24,r20
	movw r22,r18
	movw r20,r14
	movw r18,r12
	rcall __mulsi3
	pop r15
	pop r14
	pop r13
	pop r12
	ret

> Have you tried defining special-purpose register classes to
> force reload to move the data into the right hard regs?
> 
> E.g.  "Y" prefix
>       "QHS" size
>       two digit starting register number, as needed.

I already thought about such register classes/constraints with
almost the same nomenclature, i.e. with prefix "R".

> You'll probably end up with quite a few register classes 
> out of this, but hopefully reload can do a better job than
> you can manually...

Agreed.

insns that will benefit are insns with two input operands that
commute, i.e. mulsi3, umulhisi3, mulhisi3, mulhi3.

Maybe even other 2-input insns could benefit because there's no
predetermined order in which the moves are accomplished; e.g.
moving R24 before R22 in udivmodqi4.  I don't know if register
allocator is smart enough to swap the assignments if that is
better.

Moreover, it would reduce the number of insns resp. split
patterns and help cleanup md.

I'd prefer to do that work in a separate patch.  The current patch
behaves the same as the old code, so it's not a performance
regression of the current patch.

Johann