Bug 86024 - Missed memcpy loop distribution with elementwise copy
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-06-01 11:27 UTC by Marc Glisse
Modified: 2018-06-08 13:36 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-06-01 00:00:00


Description Marc Glisse 2018-06-01 11:27:01 UTC
typedef struct A { int a, b; } A;
void*f(A*restrict p){
  A*q=__builtin_malloc(1024*sizeof(A));
  for(int i=0;i<1024;++i){
#ifdef HELP
    q[i]=p[i];
#else
    q[i].a=p[i].a;
    q[i].b=p[i].b;
#endif
  }
  return q;
}

At -O3, with HELP defined (-DHELP), we get the expected memcpy. Without it, the loop is only vectorized.
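
For reference, the requested transformation can be written out by hand at the source level. This is only a sketch, not GCC output: the names `copy_fields`, `copy_memcpy` and `same_result` are made up for illustration, and the comparison assumes `struct A` has no padding, so that copying both fields copies every byte of each element.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct A { int a, b; } A;

enum { N = 1024 };

/* Field-by-field copy, as in the testcase without HELP. */
void *copy_fields(A *restrict p) {
  A *q = malloc(N * sizeof(A));
  assert(q != NULL);
  for (int i = 0; i < N; ++i) {
    q[i].a = p[i].a;
    q[i].b = p[i].b;
  }
  return q;
}

/* What the loop could ideally become: one memcpy of the whole array. */
void *copy_memcpy(A *restrict p) {
  A *q = malloc(N * sizeof(A));
  assert(q != NULL);
  memcpy(q, p, N * sizeof(A));
  return q;
}

/* Hypothetical helper: returns 1 if both versions produce identical bytes.
   Assumes struct A has no padding. */
int same_result(void) {
  static A src[N];
  for (int i = 0; i < N; ++i) {
    src[i].a = i;
    src[i].b = -i;
  }
  A *x = copy_fields(src);
  A *y = copy_memcpy(src);
  int equal = memcmp(x, y, N * sizeof(A)) == 0;
  free(x);
  free(y);
  return equal;
}
```

Since the two functions produce the same bytes whenever `struct A` has no padding, the loop in `copy_fields` is a candidate for the same memcpy replacement that the HELP variant already gets.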
Comment 1 Richard Biener 2018-06-01 11:47:12 UTC
Confirmed.  Loop distribution only handles stride 1 accesses and single loads/stores for the pattern recognition.

With my ongoing work on vectorizer refactoring it might be possible to
re-use its DR group analysis and thus work on DR groups here.

Or we may want to teach this pattern to the vectorizer itself (eh...).

Or we may want to un-"SRA" such patterns, generating aggregate copies.
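
To illustrate the stride remark above: taken alone, the store to each field advances by sizeof(A) per iteration while writing only sizeof(int) bytes, so neither store is a stride 1 access by itself. Only the two stores taken as a group cover the array contiguously. The sketch below checks that coverage byte by byte. It is a rough illustration, not how GCC's data reference analysis works, and `group_covers_array` is a hypothetical name. It assumes `struct A` has no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct A { int a, b; } A;

enum { N = 1024 };

/* Returns 1 if the stores to .a and .b, taken together, write every
   byte of an A[N] array. */
int group_covers_array(void) {
  static unsigned char written[N * sizeof(A)];

  /* Assumption: no padding between or after the two int fields. */
  assert(sizeof(A) == 2 * sizeof(int));

  memset(written, 0, sizeof written);
  for (int i = 0; i < N; ++i) {
    /* Each field alone steps by sizeof(A), not by its own size. */
    memset(written + i * sizeof(A) + offsetof(A, a), 1, sizeof(int));
    memset(written + i * sizeof(A) + offsetof(A, b), 1, sizeof(int));
  }
  for (size_t k = 0; k < sizeof written; ++k)
    if (!written[k])
      return 0;
  return 1;
}
```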
Comment 2 Marc Glisse 2018-06-08 13:27:07 UTC
(In reply to Richard Biener from comment #1)
> Or we may want to un-"SRA" such patterns, generating aggregate copies.

I notice that store-merging does not merge these stores; I didn't check why. With -fdisable-tree-vect, SLP can do it for long but not for int (there is no vector of 2 ints).

(anyway that's too late for ldist, the DR / vectorizer approach sounds better, just mentioning this as another possible missed optimization)

The testcase is a simplified version of boost::container::flat_map<int,int>. The most important missing transformation is memmove, but it was easier to report memcpy and I kind of expect that they may all be fixed together.
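
The memmove analogue can be sketched in the same way. The loop below is a hypothetical element shift of the kind an erase from a sorted array of pairs might perform. It is not taken from Boost's implementation, and the names `erase_fields`, `erase_memmove` and `erase_same_result` are invented for this example. Source and destination overlap, so the equivalent library call is memmove rather than memcpy. As before, `struct A` is assumed to have no padding.

```c
#include <assert.h>
#include <string.h>

typedef struct A { int a, b; } A;

enum { N = 16 };

/* Remove element pos by shifting the tail down, one field at a time. */
void erase_fields(A *v, int n, int pos) {
  for (int i = pos; i + 1 < n; ++i) {
    v[i].a = v[i + 1].a;
    v[i].b = v[i + 1].b;
  }
}

/* The same shift expressed as a single overlapping block move. */
void erase_memmove(A *v, int n, int pos) {
  assert(0 <= pos && pos < n);
  memmove(v + pos, v + pos + 1, (size_t)(n - 1 - pos) * sizeof(A));
}

/* Hypothetical helper: returns 1 if both versions leave identical arrays.
   Assumes struct A has no padding. */
int erase_same_result(int pos) {
  A x[N], y[N];
  for (int i = 0; i < N; ++i) {
    x[i].a = y[i].a = i;
    x[i].b = y[i].b = 100 + i;
  }
  erase_fields(x, N, pos);
  erase_memmove(y, N, pos);
  return memcmp(x, y, sizeof x) == 0;
}
```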
Comment 3 bin cheng 2018-06-08 13:36:43 UTC
(In reply to Marc Glisse from comment #2)
> (In reply to Richard Biener from comment #1)
> > Or we may want to un-"SRA" such patterns, generating aggregate copies.
> 
> I notice that store-merging does not merge these stores; I didn't check why.
> With -fdisable-tree-vect, SLP can do it for long but not for int (there is
> no vector of 2 ints).
> 
> (anyway that's too late for ldist, the DR / vectorizer approach sounds
> better, just mentioning this as another possible missed optimization)
Yes, merging and SRA conflict with each other here, and it is difficult to come up with a model that decides when to do which.  With the DR improvements, we could identify and connect two or more builtin partitions in ldist.