[Bug rtl-optimization/84753] GCC does not fold xxswapd followed by vperm
noloader at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sat Mar 10 23:35:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84753
--- Comment #7 from Jeffrey Walton <noloader at gmail dot com> ---
(In reply to Bill Schmidt from comment #4)
> ...
>
> The best performance will be achieved by writing this loop entirely using
> inline asm code, with all data loaded/stored using lxvd2x and stxvd2x (no
> swaps), thus in "big-endian element order" (element 0 in the high-order
> position of the register). Because of the big-endian nature of vshasigmaw,
> this is always going to be the best approach.
Thanks Bill.
We are working on your lxvd2x suggestion using inline assembly.
Related, see "GCC vec_xl_be replacement using inline assembly",
https://stackoverflow.com/q/49215090/608639.
-----
I'm not sure if I am doing something wrong, or this is a new issue:
$ cat test.cxx
...
typedef __vector unsigned int uint32x4_p8;
uint32x4_p8 VEC_XL_BE(const uint8_t* data, int offset)
{
#if defined(__xlc__) || defined(__xlC__)
    return (uint32x4_p8)vec_xl_be(offset, (uint8_t*)data);
#else
    uint32x4_p8 res;
    __asm(" lxvd2x %x0, %1, %2 \n\t"
          : "=wa" (res)
          : "g" (data), "g" (offset));
    return res;
#endif
}
When I use VEC_XL_BE in the real SHA-256 code, the assembler rejects the output:
$ g++ -DTEST_MAIN -g3 -O3 -mcpu=power8 sha256-p8.cxx -o sha256-p8.exe
/home/noloader/tmp/ccbDnfFr.s: Assembler messages:
/home/noloader/tmp/ccbDnfFr.s:758: Error: operand out of range (32 is not between 0 and 31)
/home/noloader/tmp/ccbDnfFr.s:983: Error: operand out of range (48 is not between 0 and 31)
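
One possible explanation, offered as a guess rather than a confirmed diagnosis: the "g" constraint allows a general register, a memory operand, or an immediate. After inlining with a constant offset, GCC may therefore substitute the literal 32 or 48 where lxvd2x expects a register number. A minimal sketch under that assumption uses "b" (a base register, never r0) for the pointer and "r" for the offset. The name VEC_XL_BE_sketch, the widening of the offset to long, the "memory" clobber, the self-check helper, and the memcpy fallback for non-POWER builds are all illustrative assumptions and not part of the original test case. The fallback exists only so the sketch compiles elsewhere; it does not reproduce lxvd2x element order.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__powerpc64__) && defined(__VSX__)
typedef __vector unsigned int uint32x4_p8;
#else
// Assumption: stand-in vector type so the sketch builds on non-POWER hosts.
typedef unsigned int uint32x4_p8 __attribute__((vector_size(16)));
#endif

// Hypothetical variant of VEC_XL_BE with register-only constraints.
uint32x4_p8 VEC_XL_BE_sketch(const uint8_t* data, int offset)
{
    uint32x4_p8 res;
#if defined(__powerpc64__) && defined(__VSX__)
    // Widen so the full 64-bit register holds a sign-extended offset.
    long off = offset;
    // "b": base register (RA must not be r0, which lxvd2x reads as zero).
    // "r": any general register (RB). Neither constraint admits an immediate.
    // "memory": conservative way to tell GCC that the asm reads memory.
    __asm__("lxvd2x %x0, %1, %2"
            : "=wa" (res)
            : "b" (data), "r" (off)
            : "memory");
#else
    // Fallback for other targets: native byte order, NOT lxvd2x order.
    memcpy(&res, data + offset, sizeof(res));
#endif
    return res;
}

// Self-check: the 16 loaded bytes must be some permutation of buf[16..31],
// whichever element order the target produces.
void vec_xl_be_self_check(void)
{
    uint8_t buf[32], out[16];
    for (int i = 0; i < 32; ++i) buf[i] = (uint8_t)i;
    uint32x4_p8 v = VEC_XL_BE_sketch(buf, 16);
    memcpy(out, &v, sizeof(out));
    unsigned sum = 0;
    for (int i = 0; i < 16; ++i) sum += out[i];
    assert(sum == 376);  // 16 + 17 + ... + 31
}
```

If the guess is right, the constants 32 and 48 would be materialized into a register before the load instead of appearing as an operand in the generated assembly.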