[Bug tree-optimization/79262] New: [6/7 Regression] load gap with store gap causing performance regression in 462.libquantum
pinskia at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sat Jan 28 08:33:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79262
Bug ID: 79262
Summary: [6/7 Regression] load gap with store gap causing
performance regression in 462.libquantum
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Blocks: 53947
Target Milestone: ---
Target: aarch64
This was reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438#c9, but
that comment does not mention that this is a regression from GCC 5. I noticed
it again while working on the ThunderX 2 CN99xx performance difference between
-O2 and -Ofast, and between GCC 5.4.0 and the trunk.
Take:
struct node_struct
{
  float _Complex gap;
  unsigned long long state;
};
struct reg_struct
{
  int size;
  struct node_struct *node;
};
void
func(int target, struct reg_struct *reg)
{
  int i;
  for(i=0; i<reg->size; i++)
    reg->node[i].state ^= ((unsigned long long) 1 << target);
}
---- CUT ---
Currently the trunk vectorizes this loop using loads with gaps, but the stores
are still done as scalars. This is much slower, and it also only processes 2
elements at a time.
There are also some cost model issues in the aarch64 backend in how it treats
scalar int vs floating point. I might just go fix those first.
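To make the access pattern concrete, here is a rough hand-written model of the
shape described above. It is only a sketch, not what GCC actually emits:
func_pairwise_model is a hypothetical name, the "vector" is modelled with two
scalar temporaries, and the assert on target is my own addition. It assumes the
usual LP64 layout, where float _Complex takes 8 bytes and each node_struct is
16 bytes, so only two state fields fit in a 128-bit vector.

```c
#include <assert.h>

struct node_struct
{
  float _Complex gap;
  unsigned long long state;
};

struct reg_struct
{
  int size;
  struct node_struct *node;
};

/* The loop from the report.  */
void
func (int target, struct reg_struct *reg)
{
  int i;
  for (i = 0; i < reg->size; i++)
    reg->node[i].state ^= ((unsigned long long) 1 << target);
}

/* Hypothetical model of "load with gaps, scalar stores", two elements
   per iteration.  Illustration only, not compiler output.  */
void
func_pairwise_model (int target, struct reg_struct *reg)
{
  /* Assumption: target is a valid bit index, otherwise the shift is
     undefined.  */
  assert (target >= 0 && target < 64);
  unsigned long long mask = (unsigned long long) 1 << target;
  int i = 0;

  for (; i + 1 < reg->size; i += 2)
    {
      /* Load with gaps: read two state fields, skipping the gap members
	 that sit between them in memory.  */
      unsigned long long v0 = reg->node[i].state;
      unsigned long long v1 = reg->node[i + 1].state;

      /* Stands in for a single 2-lane vector XOR.  */
      v0 ^= mask;
      v1 ^= mask;

      /* Each lane is written back with its own scalar store, so the gap
	 members are never written.  */
      reg->node[i].state = v0;
      reg->node[i + 1].state = v1;
    }

  /* Scalar tail for an odd element count.  */
  for (; i < reg->size; i++)
    reg->node[i].state ^= mask;
}
```

Both functions leave the same values in state; the model only shows why the
per-lane stores and the 2-element width limit what vectorization can gain here.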
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations