[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Sep 25 12:57:26 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #19)
> (In reply to rguenther@suse.de from comment #17)
> > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> > > 
> > > --- Comment #15 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > > (In reply to rguenther@suse.de from comment #14)
> > > > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote:
> > > > 
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> > > > > 
> > > > > --- Comment #13 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > > > > > >   2) on Power, the conversion from unsigned char to unsigned short is a nop
> > > > > > > conversion, but it is counted when we compute the scalar cost, which adds 32
> > > > > > > in total to the scalar cost. Meanwhile, the conversion from unsigned short to
> > > > > > > signed short should be counted but is not (need to check why further).
> > > > > 
> > > > > > vect_nop_conversion_p returns true for the UH to SH conversion, so it's not
> > > > > > even put into the cost vector.
> > > > > 
> > > > > > tree_nop_conversion_p's comment says:
> > > > > 
> > > > > /* Return true iff conversion from INNER_TYPE to OUTER_TYPE generates
> > > > >    no instruction.  */
> > > > > 
> > > > > > I may be missing something here, but the UH to SH conversion does need one
> > > > > > explicit extend instruction *extsh*, so the precision/mode equality check
> > > > > > looks wrong for this conversion.
> > > > 
> > > > Well, it isn't a RTL predicate and it only needs extension because
> > > > there's never a HImode pseudo but always SImode subregs.
> > > 
> > > Thanks Richi! Should we take care of this case, or treat this kind of
> > > extension as "no instruction"? I intended to handle it in target-specific
> > > code, but it isn't recorded into the cost vector, and it seems too heavy to
> > > revisit the bb_info slp_instances in finish_cost.
> > 
> > I think it's not something we should handle on GIMPLE.
> 
> Got it! For 
> 
> 	  else if (vect_nop_conversion_p (stmt_info))
> 	    continue;
> 
> Is it a good idea to change it to call record_stmt_cost like the others?
>   1) introduce a new vect_cost_for_stmt enumerator, e.g. nop_stmt
>   2) builtin_vectorization_cost returns zero for it by default, as before
>   3) targets can adjust the cost according to their needs

I think this early-out was added for the case where the statement has no
cost but the target costed it anyway.  So at least go back and look at which
target that was and see if it can be adjusted.

