This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

From: Jan Hubicka <hubicka at ucw dot cz>
To: Richard Biener <rguenther at suse dot de>
Cc: gcc-patches at gcc dot gnu dot org, Venkataramanan dot Kumar at amd dot com
Date: Wed, 18 Oct 2017 17:07:58 +0200
Subject: Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel
Authentication-results: sourceware.org; auth=none
References: <20171017133415.GC94155@kam.mff.cuni.cz> <alpine.LSU.2.20.1710171536550.5588@zhemvz.fhfr.qr> <20171017172217.GD94155@kam.mff.cuni.cz> <alpine.LSU.2.20.1710180926510.5588@zhemvz.fhfr.qr> <20171018122805.GB55338@kam.mff.cuni.cz> <alpine.LSU.2.20.1710181501450.5588@zhemvz.fhfr.qr>

> > Those instructions seem similarly expensive in Intel's implementation.
> > http://users.atw.hu/instlatx64/GenuineIntel0050654_SkylakeXeon9_InstLatX64.txt
> > lists latencies ranging from 18 to 32 cycles.
> > 
> > Of course, it may also be that the utility measures gathers incorrectly.
> > According to Agner's table, Skylake has optimized gathers: they used to be
> > 12 to 34 uops on Haswell and are now 4 to 5.
> > > 
> > > > > Note that the biggest source of imprecision in the cost model
> > > > > is vec_perm, because we lack information about the
> > > > > permutation mask, which means we can't distinguish between
> > > > > cross-lane and intra-lane permutes.
> > > > 
> > > > Besides that, we lack information about which operation we do (addition
> > > > or division?), which may be useful to pass down, especially because we
> > > > have the relevant information handy in the x86_cost tables.  So I am thinking
> > > > of adding an extra parameter to the hook that tells it the operation.
> > > 
> > > Not sure.  The costs are all supposed to be relative to scalar cost
> > > and I fear we get nearer to a GIGO syndrome when adding more information
> > > here ;)
> > 
> > Yep, however there is a setup cost (like loads/stores) which comes into play
> > as well.  I will see how far I can get by making the x86 costs more "realistic".
> 
> I think it should always count the cost of n scalar loads plus
> an overhead depending on the microarchitecture.  As you say, we're
> not getting rid of any memory latencies (in the worst case).  From
> Agner I read that Skylake optimized gathers down to the actual memory
> access cost; the overhead is basically well hidden.

Where did you find that?  It does not quite match the instruction latency table
above.
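
To make the model under discussion concrete, here is a minimal sketch in C
of the formula proposed above: the cost of a gather is the cost of n scalar
loads plus a microarchitecture-dependent overhead.  The struct, its field
names, and the idea of splitting the overhead into a fixed part and a
per-element part are assumptions made for illustration; none of this is
taken from GCC's x86 cost tables.

```c
#include <assert.h>

/* Hypothetical tuning knobs for one microarchitecture.  All costs are in
   arbitrary units relative to scalar operations.  */
struct gather_tuning
{
  int scalar_load_cost; /* cost of one scalar load */
  int gather_static;    /* fixed overhead per gather instruction */
  int gather_per_elt;   /* additional overhead per gathered element */
};

/* Cost of gathering NELTS elements: NELTS scalar loads plus the
   microarchitecture-dependent overhead.  */
static int
gather_cost (const struct gather_tuning *t, int nelts)
{
  assert (nelts > 0);
  int loads = nelts * t->scalar_load_cost;
  int overhead = t->gather_static + nelts * t->gather_per_elt;
  return loads + overhead;
}
```

With such a split, a core whose gather overhead is well hidden would set
both overhead fields close to zero, so the result approaches the plain
cost of n scalar loads, while a core with a slow gather implementation
would use larger values.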

Honza
> 
> Richard.
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
