[Bug target/53949] [SH] Add support for mac.w / mac.l instructions
olegendo at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sun Jul 15 12:11:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53949
--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-07-15 12:11:20 UTC ---
Created attachment 27799
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27799
Proof of concept patch
This is a proof of concept patch just to probe around.
The idea is to allow the RA to allocate macl and mach registers in DImode, and
have mac insns that use the macl/mach regs as a pair in DImode.
With the patch applied, the following function ...
int64_t test01 (const int16_t* a, const int16_t* b)
{
  int64_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += (int64_t)(*a++) * (int64_t)(*b++);
  return sum;
}
compiled with -m4 -O2 results in ...
__Z6test01PKsS0_:
.LFB0:
.cfi_startproc
mov #16,r1 ! 88 movsi_ie/3 [length = 2]
clrmac ! 39 clrmac/1 [length = 2]
.align 2
.L3:
dt r1 ! 89 dect [length = 2]
bf/s .L3 ! 90 branch_false [length = 2]
mac.w @r4+,@r5+ ! 61 *macw [length = 2]
sts macl,r0 ! 82 movsi_ie/8 [length = 2]
rts ! 99 *return_i [length = 2]
sts mach,r1 ! 83 movsi_ie/8 [length = 2]
... which is not that bad already.
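
For readers less familiar with the SH MAC unit, here is a minimal software model of what the generated loop computes. It is only a sketch: the names MacRegs, clrmac, mac_w and test01_model are made up for illustration and are not part of GCC or the patch, and the model assumes the S bit is clear, so that mac.w accumulates into the full 64-bit mach:macl pair without saturation.

```cpp
#include <cstdint>

// Hypothetical model of the SH MAC register pair (illustration only).
struct MacRegs
{
  uint32_t mach = 0;  // upper 32 bits of the accumulator
  uint32_t macl = 0;  // lower 32 bits of the accumulator
};

// Models "clrmac": zero both halves of the accumulator.
inline void clrmac (MacRegs& m)
{
  m.mach = 0;
  m.macl = 0;
}

// Models "mac.w @a+,@b+", assuming the S bit is clear:
// signed 16x16 multiply, 64-bit accumulate, post-increment both pointers.
inline void mac_w (MacRegs& m, const int16_t*& a, const int16_t*& b)
{
  uint64_t acc = ((uint64_t)m.mach << 32) | m.macl;
  int64_t prod = (int64_t)(*a++) * (int64_t)(*b++);
  acc += (uint64_t)prod;  // unsigned add wraps instead of overflowing
  m.mach = (uint32_t)(acc >> 32);
  m.macl = (uint32_t)acc;
}

// Mirrors the asm above: clrmac, 16 x mac.w, then sts macl / sts mach.
inline int64_t test01_model (const int16_t* a, const int16_t* b)
{
  MacRegs m;
  clrmac (m);
  for (int i = 0; i < 16; ++i)
    mac_w (m, a, b);
  return (int64_t)(((uint64_t)m.mach << 32) | m.macl);
}
```

Under these assumptions test01_model should return the same value as test01 for any input, which is what makes the three-insn loop body a valid replacement for the multiply and add sequence.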
Some notes I took while playing around with this:
- When compiling for big endian the RA mixes up mach and macl when
storing mach:macl to a DImode reg:reg pair.
This could probably be fixed by providing appropriate move insn patterns.
- Move insns/splits for DImode mach:macl <-> memory have to be added.
I've seen an ICE when compiling with -O1:
error: unrecognizable insn:
(insn 122 14 15 2 (set (mem/c:DI (plus:SI (reg/f:SI 15 r15)
(const_int 8 [0x8])) [0 %sfp+-8 S8 A32])
(reg:DI 148 macl)) sh_mac.cpp:38 -1
(nil))
- In some cases the mach:macl reg pair gets swapped to a general reg pair
without any obvious need. Example function:
int64_t test04 (const int16_t* a, const int16_t* b,
                const int16_t* c, const int16_t* d)
{
  int64_t sum0 = 0;
  int64_t sum1 = 0;
  for (int i = 0; i < 16; ++i)
    sum0 += (int64_t)(*a++) * (int64_t)(*b++);
  for (int i = 0; i < 16; ++i)
    sum1 += (int64_t)(*c++) * (int64_t)(*d++);
  return sum0 - sum1;
}
The IRA pass first allocates sum0 and sum1 to mach:macl, but then reload
seems to think that they are conflicting and moves sum0 to a general reg
pair. This results in ...
mov #0,r2
mov #16,r1
mov r2,r3
.L16:
lds r2,macl
lds r3,mach
dt r1
mac.w @r4+,@r5+
sts macl,r2
bf/s .L16
sts mach,r3
mov #16,r1
clrmac
.align 2
.L18:
dt r1
bf/s .L18
mac.w @r6+,@r7+
which would be better as:
mov #16,r1
clrmac
.L16:
dt r1
bf/s .L16
mac.w @r4+,@r5+
sts macl,r2
sts mach,r3
clrmac
mov #16,r1
.L18:
dt r1
bf/s .L18
mac.w @r6+,@r7+
- Loops with multiple running sums like
for (int i = 0; i < 16; ++i)
{
  sum0 += (int64_t)(*a++) * (int64_t)(*b++);
  sum1 += (int64_t)(*c++) * (int64_t)(*d++);
}
result in the mach:macl pair being swapped out to general reg pairs
between subsequent mac.w instructions. Ideally such loops should be split
into multiple loops like in the previous example.
- When loop unrolling is turned on, the auto-inc address refs are
converted to displacement addresses. Because the auto-inc-dec pass
currently fails to detect a lot of auto-inc-dec possibilities, the
mac.w pattern will not match.
The same goes for manually unrolled code like
sum += (int64_t)(*a++) * (int64_t)(*b++);
sum += (int64_t)(*a++) * (int64_t)(*b++);
- Running sum variables should be turned into DImode variables if possible:
int32_t test00 (const int16_t* a, const int16_t* b)
{
  int32_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += (*a++) * (*b++);
  return sum;
}
- The existing multiplication patterns could be adapted to utilize mach:macl
reg pair allocation, especially 32x32 -> 64 bit multiplications.
- Normal multiplications that do not need a full MAC operation but use
memory operands can be done with a clrmac-mac sequence.
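
The loop splitting suggested in the notes above can be sketched at the source level as follows. This is a hand-written illustration of the transformation, not something the patch does; diff_fused and diff_split are hypothetical names. The point is that in the split form only one running sum is live in each loop, so each loop could keep its sum in mach:macl for its whole duration.

```cpp
#include <cstdint>

// Fused form: two running sums are live in the same loop body, so they
// compete for the single mach:macl accumulator.
inline int64_t diff_fused (const int16_t* a, const int16_t* b,
                           const int16_t* c, const int16_t* d)
{
  int64_t sum0 = 0;
  int64_t sum1 = 0;
  for (int i = 0; i < 16; ++i)
    {
      sum0 += (int64_t)(*a++) * (int64_t)(*b++);
      sum1 += (int64_t)(*c++) * (int64_t)(*d++);
    }
  return sum0 - sum1;
}

// Split form: one running sum per loop. The iterations of the two sums
// are independent of each other, so the result is unchanged.
inline int64_t diff_split (const int16_t* a, const int16_t* b,
                           const int16_t* c, const int16_t* d)
{
  int64_t sum0 = 0;
  for (int i = 0; i < 16; ++i)
    sum0 += (int64_t)(*a++) * (int64_t)(*b++);

  int64_t sum1 = 0;
  for (int i = 0; i < 16; ++i)
    sum1 += (int64_t)(*c++) * (int64_t)(*d++);

  return sum0 - sum1;
}
```

The split is only valid here because neither sum depends on the other and the loops have no side effects besides reading memory; a compiler pass doing this automatically would have to prove both.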
Probably there are more subtle issues. Also, I have not tried expanding
the standard name 'maddmn4' pattern; maybe it would make some of the
problems mentioned above automagically disappear.
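
Regarding the note on turning running sums into DImode variables, here is a small sketch of why that should be safe for a function like test00. The names test00_narrow and test00_widened are hypothetical. The sketch assumes the 32-bit sum does not overflow (which would be undefined behavior in the original anyway); under that assumption the low 32 bits of the 64-bit accumulation are the same value, so on SH only macl would have to be read back.

```cpp
#include <cstdint>

// Original form: 32-bit running sum. Each int16_t operand is promoted to
// int, so the product itself always fits in 32 bits.
inline int32_t test00_narrow (const int16_t* a, const int16_t* b)
{
  int32_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += (*a++) * (*b++);
  return sum;
}

// Widened form: accumulate in 64 bits (the shape the mac.w pattern wants)
// and truncate once at the end. Assumes the 32-bit sum above would not
// have overflowed, so the truncation does not change the value.
inline int32_t test00_widened (const int16_t* a, const int16_t* b)
{
  int64_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += (int64_t)(*a++) * (int64_t)(*b++);
  return (int32_t)sum;
}
```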