Summary: | GCC fails to vectorize code unless dummy loop is added | ||
---|---|---|---|
Product: | gcc | Reporter: | Moritz Kreutzer <moritz.kreutzer> |
Component: | tree-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | normal | CC: | rguenth, webrown.cpp |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 7.2.0 | ||
Target Milestone: | --- | ||
Host: | | Target: | |
Build: | | Known to work: | |
Known to fail: | 7.3.1, 8.0 | Last reconfirmed: | 2018-03-26 00:00:00 |
Bug Depends on: | 65206 | ||
Bug Blocks: | 53947 | ||
Attachments: | Example which GCC fails to vectorize |
Description
Moritz Kreutzer
2018-03-23 21:08:04 UTC
Richard Biener
Comment #1

Attachment is missing.

Moritz Kreutzer
Comment #2

Created attachment 43752
Example which GCC fails to vectorize
Moritz Kreutzer
Comment #3

(In reply to Richard Biener from comment #1)
> Attachment is missing.

Thanks! I could swear that I uploaded the attachment in the first place, but it seems like I forgot to click the button to actually attach it.

Richard Biener
Comment #4

The issue lies in dependence analysis which faces

  _21 = (sizetype) i_24;
  _22 = _21 * 8;
  _2 = &a + _22;
  _13 = MEM[(const Type_t &)&a][i_24].v[0];
  _14 = _13 * 5.0e-1;
  MEM[(double &)_2] = _14;

marks the two refs for a runtime alias test and then when doing that figures they always alias (but doesn't handle the distance == 0 case specially).

This is a dup of another existing bug that dependence analysis doesn't cope very well with a mix of pointer vs. array accesses.

Moritz Kreutzer
Comment #5

(In reply to Richard Biener from comment #4)
> The issue lies in dependence analysis which faces
>
>   _21 = (sizetype) i_24;
>   _22 = _21 * 8;
>   _2 = &a + _22;
>   _13 = MEM[(const Type_t &)&a][i_24].v[0];
>   _14 = _13 * 5.0e-1;
>   MEM[(double &)_2] = _14;
>
> marks the two refs for a runtime alias test and then when doing that
> figures they always alias (but doesn't handle the distance == 0 case
> specially).

But then I still don't understand how adding the dummy loop helps GCC in determining the independence of loop iterations.

>
> This is a dup of another existing bug that dependence analysis doesn't
> cope very well with a mix of pointer vs. array accesses.

Are you talking about 65206? Seems like it's not an easy bug to fix? Anyways, I hope it helps to have proof of another manifestation of this bug...

Richard Biener
Comment #6

(In reply to Moritz Kreutzer from comment #5)
> (In reply to Richard Biener from comment #4)
> > The issue lies in dependence analysis which faces
> >
> >   _21 = (sizetype) i_24;
> >   _22 = _21 * 8;
> >   _2 = &a + _22;
> >   _13 = MEM[(const Type_t &)&a][i_24].v[0];
> >   _14 = _13 * 5.0e-1;
> >   MEM[(double &)_2] = _14;
> >
> > marks the two refs for a runtime alias test and then when doing that
> > figures they always alias (but doesn't handle the distance == 0 case
> > specially).
>
> But then I still don't understand how adding the dummy loop helps GCC in
> determining the independence of loop iterations.

I didn't try to see why but I guess "bad luck" ;) It probably makes the first access a pointer one as well.

OK, so looking closer we have after early optimization:

  <bb 6> [99.00%]:
  _2 = &a[i_5];
  _13 = MEM[(const Type_t &)_2].v[0];
  _14 = _13 * 5.0e-1;
  MEM[(double &)_2] = _14;

but then later forwprop is "lucky" to propagate _2 into just one of the dereferences. Note that propagating into both wouldn't help because the accesses do not have a similar structure -- one accesses a[i].v[0] while the other accesses a[i] as if it were a 'double'. That seems to be

  _2 = &a[i_7];
  D.39137 = 5.0e-1;
  D.39483 = operator*<double, 1, double> (&D.39137, _2); [return slot optimization]

vs.

  _3 = &a[i_7];
  Vector<1, double>::operator=<Pete::Expression<Pete::BinaryNode<Pete::OpMultiply, Pete::Scalar<double>, Pete::Reference<Vector<1, double> > > > > (_3, &D.39483);

so somehow LHS vs. RHS evaluation goes a different path. Not sure if that's avoidable (it's been some time since I worked with PETE).

> >
> > This is a dup of another existing bug that dependence analysis doesn't
> > cope very well with a mix of pointer vs. array accesses.
>
> Are you talking about 65206? Seems like it's not an easy bug to fix?
> Anyways, I hope it helps to have proof of another manifestation of this
> bug...

Yeah, that one looks like the same issue. Whether it's easy or not easy to fix remains to be seen - it's mostly a matter of priority...

Moritz Kreutzer
Comment #7

(In reply to Richard Biener from comment #6)
> I didn't try to see why but I guess "bad luck" ;) It probably makes
> the first access a pointer one as well.
Okay, in that case I'd rather call it "good luck" :)

> OK, so looking closer we have after early optimization:
>
>   <bb 6> [99.00%]:
>   _2 = &a[i_5];
>   _13 = MEM[(const Type_t &)_2].v[0];
>   _14 = _13 * 5.0e-1;
>   MEM[(double &)_2] = _14;
>
> but then later forwprop is "lucky" to propagate _2 into just one of the
> dereferences. Note that propagating into both wouldn't help because
> the accesses do not have a similar structure -- one accesses a[i].v[0]
> while the other accesses a[i] as if it were a 'double'. That seems to be
>
>   _2 = &a[i_7];
>   D.39137 = 5.0e-1;
>   D.39483 = operator*<double, 1, double> (&D.39137, _2); [return slot optimization]
>
> vs.
>
>   _3 = &a[i_7];
>   Vector<1, double>::operator=<Pete::Expression<Pete::BinaryNode<Pete::OpMultiply, Pete::Scalar<double>, Pete::Reference<Vector<1, double> > > > > (_3, &D.39483);
>
> so somehow LHS vs. RHS evaluation goes a different path. Not sure if that's
> avoidable (it's been some time since I worked with PETE).

I'll try to have a look into PETE to see whether this can be avoided. Otherwise, I'll just keep the dummy loop: it helps GCC vectorize the code, and apart from that it should simply be ignored by any compiler, so I guess it should at least do no harm.

> Yeah, that one looks like the same issue. Whether it's easy or not easy
> to fix remains to be seen - it's mostly a matter of priority...

Okay, I'll stay in the loop. Thanks for your prompt reply and for your help!
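Attachment 43752 is not reproduced in this copy of the report, so the following is not the original test case. It is a hypothetical C++ reduction (the names `Vec1`, `a`, `N` and `scale_half` are invented for illustration) that only mirrors the shape of the two references in the GIMPLE from comment #4: a load written as `a[i].v[0]` and a store through a separately computed `double` pointer. Both touch the same element in the same iteration, i.e. they are at dependence distance 0. GCC may well fold this plain version into two identical array references and vectorize it, so treat it as an illustration of the access shape, not as a reproducer.

```cpp
#include <cstddef>

// Hypothetical stand-in for PETE's Vector<1, double>: a single double kept in
// an array member, so one element is addressed as a[i].v[0].
struct Vec1 {
    double v[1];
};

constexpr std::size_t N = 1024;  // hypothetical problem size
Vec1 a[N];

// Multiplies every element by 0.5 in place. The load uses the array/member
// path, the store goes through a plain double pointer to the same element.
// Both references hit the same address in the same iteration (distance 0),
// so the iterations are independent and the loop is safe to vectorize.
void scale_half() {
    for (std::size_t i = 0; i < N; ++i) {
        double* p = &a[i].v[0];      // pointer path, cf. MEM[(double &)_2]
        double x = a[i].v[0] * 0.5;  // array path, cf. a[i_24].v[0]
        *p = x;
    }
}
```

After filling `a[i].v[0]` with `i` and calling `scale_half()`, `a[2].v[0]` is `1.0` and `a[N - 1].v[0]` is `511.5`. To see what a given GCC version decides for such a loop, `-O3 -fopt-info-vec-missed` reports the loops it did not vectorize.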
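Comment #4 says the two references are marked for a runtime alias test, which then finds that they always alias, without treating distance 0 specially. The sketch below is a simplified model of that situation, not GCC's implementation. `AccessPair`, `ranges_disjoint` and `vector_path_ok` are hypothetical names, addresses are plain integers, and the assumed loop shape is `dst[i] = f(src[i])` executed `vf` iterations at a time. A pure overlap test rejects the in-place update because its two ranges coincide. A distance-aware test accepts it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of one load stream and one store stream in a loop
// of the form  for (i = 0; i < n; ++i) dst[i] = f(src[i]);
struct AccessPair {
    std::uintptr_t src;  // start address of the loaded elements
    std::uintptr_t dst;  // start address of the stored elements
    std::size_t elem;    // element size in bytes (8 for double)
    std::size_t n;       // iteration count
};

// Pure overlap test: the vector version is used only if the byte ranges
// [src, src + n*elem) and [dst, dst + n*elem) are disjoint. An in-place
// update (src == dst) always fails it.
bool ranges_disjoint(const AccessPair& p) {
    const std::uintptr_t len = p.elem * p.n;
    return p.src + len <= p.dst || p.dst + len <= p.src;
}

// Distance-aware test: overlapping ranges are still fine when the dependence
// distance d = (dst - src) / elem cannot be violated by running vf iterations
// at once. With the load before the store in each iteration this holds for
// d <= 0 (d == 0 is the in-place update) and for d >= vf.
bool vector_path_ok(const AccessPair& p, std::size_t vf) {
    if (ranges_disjoint(p)) return true;
    const std::intptr_t diff = static_cast<std::intptr_t>(p.dst) -
                               static_cast<std::intptr_t>(p.src);
    const std::intptr_t size = static_cast<std::intptr_t>(p.elem);
    if (diff % size != 0) return false;  // partial element overlap: stay conservative
    const std::intptr_t d = diff / size;
    return d <= 0 || d >= static_cast<std::intptr_t>(vf);
}
```

For example, `AccessPair{0x1000, 0x1000, 8, 100}` fails `ranges_disjoint` but passes `vector_path_ok(..., 4)`, while a store one element ahead of the load (`dst = 0x1008`) is rejected by both. Real dependence analysis reasons about symbolic access functions at compile time rather than run-time integers; the model only shows why an overlap-only check cannot accept the `d == 0` case.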