[Bug rtl-optimization/92892] New: [AARCH64] TBL-based permutations can be implemented more efficiently for 2-element vectors

dpochepk at gmail dot com gcc-bugzilla@gcc.gnu.org
Tue Dec 10 16:49:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92892

            Bug ID: 92892
           Summary: [AARCH64] TBL-based permutations can be implemented
                    more efficiently for 2-element vectors
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dpochepk at gmail dot com
  Target Milestone: ---

The current vector element permutation implementation generates different
instructions depending on the specific form of the permutation. For permutations
like "target[0] = src1[0]; target[1] = src2[1];" the TBL instruction is used and
the following instruction sequence is generated:

mov tmpReg1, src1;
mov tmpReg2, src2;
tbl target, {tmpReg1, tmpReg2}, ...
// tmpReg1 and tmpReg2 must be consecutively numbered registers, as
// required by the tbl instruction

For 2-element vectors this sequence can be reduced to:

mov target[0], src1[0]
mov target[1], src2[1]


It can be reduced further to a single mov when target = src, which is already
implemented in the patch prototype I'm working on.
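
The report does not include a test case. A minimal sketch of one, assuming the
permutation is written with GCC's generic vector extensions, could look like
the following. The type name v2di and the function name select_lo_hi are
hypothetical and chosen only for illustration; the Clang branch is there only
so the snippet also builds with that compiler.

```c
#include <stdint.h>

/* Hypothetical 2-element vector type: two 64-bit lanes in a 128-bit vector. */
typedef int64_t v2di __attribute__((vector_size(16)));

/* target[0] = src1[0]; target[1] = src2[1];
   In a two-input shuffle, indices 0..1 select from src1 and 2..3 from src2,
   so the selector {0, 3} takes lane 0 of src1 and lane 1 of src2. */
v2di select_lo_hi(v2di src1, v2di src2)
{
#if defined(__clang__)
    return __builtin_shufflevector(src1, src2, 0, 3);
#else
    const v2di mask = {0, 3};
    return __builtin_shuffle(src1, src2, mask);
#endif
}
```

Compiling this for an AArch64 target with optimization enabled and inspecting
the assembly should show which sequence is chosen; the exact output depends on
the compiler version and options. Assigning the result back to src1 would
correspond to the target = src case mentioned above.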

