[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

linkw at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Sep 25 12:52:14 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #19 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #17)
> On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> > 
> > --- Comment #15 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > (In reply to rguenther@suse.de from comment #14)
> > > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> > > > 
> > > > --- Comment #13 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > > > > >   2) on Power, the conversion from unsigned char to unsigned short is a nop
> > > > > > conversion, yet it is counted when we compute the scalar cost, which adds 32
> > > > > > in total to the scalar cost. Meanwhile, the conversion from unsigned short to
> > > > > > signed short should be counted but is not (need to check why further).
> > > > 
> > > > vect_nop_conversion_p returns true for the UH to SH conversion, so it is
> > > > not even put into the cost vector.
> > > > 
> > > > The comment on tree_nop_conversion_p says:
> > > > 
> > > > /* Return true iff conversion from INNER_TYPE to OUTER_TYPE generates
> > > >    no instruction.  */
> > > > 
> > > > I may be missing something here, but the UH to SH conversion does need one
> > > > explicit extend instruction, *extsh*, so the precision/mode equality check
> > > > looks wrong for this conversion.
> > > 
> > > Well, it isn't a RTL predicate and it only needs extension because
> > > there's never a HImode pseudo but always SImode subregs.
> > 
> > Thanks Richi! Should we take care of this case, or treat this kind of
> > extension as "no instruction"? I intended to handle it in target-specific
> > code, but it isn't recorded in the cost vector, and revisiting the bb_info
> > slp_instances in finish_cost seems too heavy.
> 
> I think it's not something we should handle on GIMPLE.

Got it! For 

          else if (vect_nop_conversion_p (stmt_info))
            continue;

Would it be a good idea to change it to call record_stmt_cost like the other cases?
  1) introduce a new vect_cost_for_stmt enum value, e.g. nop_stmt;
  2) have builtin_vectorization_cost return zero for it by default, as before;
  3) let targets adjust the cost according to their needs.
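
To make the three steps concrete, here is a rough standalone sketch of the idea. It is not GCC code: every toy_* name is made up for illustration, the enum only stands in for vect_cost_for_stmt with the proposed nop_stmt kind added, and the cost values (1 for ordinary statements, 1 for a target-charged nop conversion) are assumptions.

```c
#include <assert.h>

/* Toy stand-in for vect_cost_for_stmt.  Only two existing kinds are
   modelled; toy_nop_stmt is the hypothetical new kind from step 1.  */
enum toy_cost_kind
{
  toy_scalar_stmt,
  toy_vector_stmt,
  toy_nop_stmt
};

/* Toy stand-in for the builtin_vectorization_cost hook.  */
typedef int (*toy_cost_hook) (enum toy_cost_kind);

/* Step 2: the default hook keeps nop conversions free, as before.
   The cost of 1 for every other kind is a simplification.  */
static int
toy_default_cost (enum toy_cost_kind kind)
{
  return kind == toy_nop_stmt ? 0 : 1;
}

/* Step 3: a hypothetical target override that charges one instruction
   for a nop conversion (think of an explicit extend such as extsh).  */
static int
toy_target_cost (enum toy_cost_kind kind)
{
  if (kind == toy_nop_stmt)
    return 1;
  return toy_default_cost (kind);
}

/* Simplified analogue of record_stmt_cost: add COUNT statements of
   KIND to *TOTAL using HOOK, and return the cost that was added.  */
static int
toy_record_stmt_cost (int *total, int count, enum toy_cost_kind kind,
                      toy_cost_hook hook)
{
  assert (count > 0);
  int cost = count * hook (kind);
  *total += cost;
  return cost;
}
```

With the default hook, recording nop conversions leaves the total unchanged, so the behaviour matches the current "continue"; with the target hook, the same call adds a non-zero cost without any GIMPLE-level change.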

