[Bug target/106022] [12/13 Regression] Enable vectorizer generates extra load
hjl.tools at gmail dot com
Fri Jun 24 21:22:25 GMT 2022
--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #11)
> (In reply to H.J. Lu from comment #9)
> > (In reply to Richard Biener from comment #8)
> > > (In reply to H.J. Lu from comment #6)
> > > > Created attachment 53169 [details]
> > > > A patch
> > > >
> > > > This patch multiplies the vector store cost by the number of scalar elements
> > > > in a word to properly compare scalar store cost against vector store cost.
> > >
> > > But that's not "properly" but "wrong" ...
> > >
> > > Note we already cost the vector load from the constant pool so the vector
> > > side costing is correct.
> > >
> > > What's eventually imprecise is the scalar cost where you could anticipate
> > > store merging, but adjusting the vector cost side is just wrong.
> > I tried to adjust the scalar cost. When the scalar cost of storing a byte
> > is 6, dividing it by 8 (the number of scalar elements in a word) gives 0.
> > Will that work?
> No, I think you would need to pattern match an actual store sequence,
> for example by looking at
>   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
>       && pow2p_hwi (DR_GROUP_STORE_COUNT (stmt_info)))
>     /* cost a possibly merged store only once (but with larger mode?)  */
>     if (DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info)
That information isn't available in add_stmt_cost. I will
count the number of scalar stores and vector stores, then
compare them in finish_cost.
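
A minimal sketch of the counting idea, not the actual patch. The type `store_cost_tracker` and its members are hypothetical stand-ins for whatever state the target's cost hooks would keep; they are not GCC's real interface. It also shows why the count is scaled rather than the per-store cost: integer division of a cost of 6 by 8 truncates to 0.

```cpp
#include <cassert>

// Hypothetical illustration only; none of these names exist in GCC.
enum class store_kind { scalar, vector };

// Dividing the per-store cost loses everything to truncation.
static_assert (6 / 8 == 0, "per-store cost of 6 divided by 8 is 0");

struct store_cost_tracker
{
  unsigned scalar_stores = 0;
  unsigned vector_stores = 0;

  // Called once per costed store, in the role of add_stmt_cost.
  void add_store (store_kind kind, unsigned count)
  {
    if (kind == store_kind::scalar)
      scalar_stores += count;
    else
      vector_stores += count;
  }

  // Called at the end, in the role of finish_cost.
  // Assumption: every full group of elems_per_word scalar stores
  // is merged into one word-sized store; a partial group still
  // costs one store, hence the rounding up.
  unsigned merged_scalar_stores (unsigned elems_per_word) const
  {
    assert (elems_per_word > 0);
    return (scalar_stores + elems_per_word - 1) / elems_per_word;
  }
};
```

With eight byte stores and `elems_per_word` of 8, `merged_scalar_stores` gives 1, which can then be weighed against `vector_stores`. How the two counts are finally compared is left open here, as it is in the comment above.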
> So costing the whole sequence of scalar stores a single time, with
> adjusted mode.
> store-merging also handles non-QImode stores btw.
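
The quoted suggestion (cost a grouped, power-of-two-sized store sequence once, at the group leader, with a wider mode) could look roughly like the sketch below. It uses plain hypothetical structs instead of stmt_vec_info, and it assumes the merged width is simply group size times element size, which the "(but with larger mode?)" remark leaves undecided.

```cpp
#include <vector>

// Hypothetical stand-ins; not GCC data structures.
struct scalar_store
{
  bool grouped;        // part of a grouped access
  bool group_leader;   // first element of its group
  unsigned group_size; // number of stores in the group
  unsigned elem_bytes; // size of one stored element
};

struct costed_store
{
  unsigned bytes; // width of the store that actually gets costed
};

static bool is_pow2 (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Return the list of stores that should be costed. A grouped
// sequence with a power-of-two count contributes one wide store,
// recorded at the group leader; everything else is costed as is.
std::vector<costed_store>
stores_to_cost (const std::vector<scalar_store> &stores)
{
  std::vector<costed_store> out;
  for (const scalar_store &s : stores)
    {
      if (s.grouped && is_pow2 (s.group_size))
        {
          if (s.group_leader)
            out.push_back ({s.group_size * s.elem_bytes});
        }
      else
        out.push_back ({s.elem_bytes});
    }
  return out;
}
```

Because the element size is a parameter, the same logic covers non-QImode stores. A real implementation would also have to cap the merged width at what the target can store in one instruction.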