[Bug target/65456] New: powerpc64le autovectorized copy loop missed optimization
anton at samba dot org
gcc-bugzilla@gcc.gnu.org
Wed Mar 18 03:52:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456
Bug ID: 65456
Summary: powerpc64le autovectorized copy loop missed optimization
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: anton at samba dot org
Created attachment 35049
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35049&action=edit
Testcase pulled from valgrind
The attached copy loop (out of valgrind) produces some pretty bad code:
df8: e4 06 9e 78 rldicr r30,r4,0,59
dfc: e4 26 df 78 rldicr r31,r6,4,59
e00: 10 00 84 38 addi r4,r4,16
e04: 01 00 c6 38 addi r6,r6,1
e08: 99 f6 20 7c lxvd2x vs33,0,r30
e0c: 57 0a 21 f0 xxswapd vs33,vs33
e10: 2b 03 a1 11 vperm v13,v1,v0,v12
e14: 97 0c 01 f0 xxlor vs32,vs33,vs33
e18: 56 6a 0d f0 xxswapd vs0,vs45
e1c: 98 4f 1f 7c stxvd2x vs0,r31,r9
e20: d8 ff 00 42 bdnz df8 <memmove+0x6e8>
Since we are using VSX storage ops (lxvd2x/stxvd2x), which handle unaligned
addresses, we should just align the source and do unaligned stores. That will
remove the permute (vperm), and then the gcc pass that removes redundant swaps
should kick in and remove the xxswapd instructions too.
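For illustration, here is a rough C sketch of the loop shape I have in mind. It
is not the attached testcase; the function name copy_align_src and the use of
16-byte memcpy chunks are assumptions made only to show the idea: peel bytes
until the source is 16-byte aligned, then do aligned loads and possibly
unaligned stores, with no realignment permute in the main loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the valgrind code from attachment 35049. */
static void copy_align_src(unsigned char *dst, const unsigned char *src,
                           size_t n)
{
    /* Peel: copy single bytes until the source is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)src & 15) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Main loop: the load address is 16-byte aligned; the store address
       may be unaligned, which the VSX store instructions can handle. */
    while (n >= 16) {
        unsigned char chunk[16];
        memcpy(chunk, src, 16);   /* aligned 16-byte load */
        memcpy(dst, chunk, 16);   /* possibly unaligned 16-byte store */
        src += 16;
        dst += 16;
        n -= 16;
    }

    /* Tail: copy the remaining bytes one at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Whether the compiler turns the two memcpy calls into a single lxvd2x/stxvd2x
pair depends on the target flags and optimization level, so treat this only as
a description of the desired loop structure.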