[Bug rtl-optimization/92892] New: [AARCH64] TBL-based permutations can be implemented more efficiently for 2-element vectors
dpochepk at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Dec 10 16:49:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92892
Bug ID: 92892
Summary: [AARCH64] TBL-based permutations can be implemented
more efficiently for 2-element vectors
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dpochepk at gmail dot com
Target Milestone: ---
The current vector element permutation implementation generates different
instructions depending on the specific form of the permutation. For permutations
like "target[0] = src1[0]; target[1] = src2[1];" the TBL instruction is used and
the following instruction sequence is generated:
mov tmpReg1, src1;
mov tmpReg2, src2;
tbl target, {tmpReg1, tmpReg2}, ...
// tmpReg1 and tmpReg2 must be consecutively numbered registers, as
// required by the tbl instruction
For 2-element vectors this sequence can be reduced to:
mov target[0], src1[0]
mov target[1], src2[1]
It can be reduced further to a single mov when target = src, which is already
implemented in the patch prototype I'm working on.
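
For reference, here is a minimal C sketch of a source-level permutation of the
form described above, written with GCC's vector extensions. The names v2di and
blend_lo_hi are illustrative and do not come from the patch. It is an
assumption that this particular input reaches the TBL path; the code actually
generated depends on the compiler version and options.

```c
/* 2-element vector of 64-bit integers (GCC vector extension).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Hypothetical reproducer: result[0] = src1[0]; result[1] = src2[1].
   Mask indices 0..1 select from src1, indices 2..3 select from src2.  */
v2di
blend_lo_hi (v2di src1, v2di src2)
{
#if defined (__clang__)
  /* Clang has no __builtin_shuffle; use its constant-index equivalent.  */
  return __builtin_shufflevector (src1, src2, 0, 3);
#else
  v2di mask = { 0, 3 };
  return __builtin_shuffle (src1, src2, mask);
#endif
}
```

Compiling this for an AArch64 target with something like "gcc -O2 -S" should
show which instruction sequence is chosen for the permutation.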