RFC reminder: an alternative -fsched-pressure implementation

Mon Apr 23 17:08:00 GMT 2012

On 04/23/2012 11:42 AM, Ulrich Weigand wrote:
> Vladimir Makarov wrote:
>
>> I have a mixed feeling with the patch. I've tried it on SPEC2000 on
>> x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
>> (SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in
>> comparison with the current algorithm. It is slower too. Although
>> the difference is quite insignificant on Corei7, compiler speed
>> slowdown achieves 0.4% on SPECFP2000 on arm. The algorithm also
>> generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP200)
>> and practically the same average code on x86-64 and ARM (I've tried
>> only SPECINT on ARM).
> Was this using NEON on ARM?  The biggest benefits we saw involved
> vector code ...
Sorry, No.
>> On 04/17/2012 04:29 AM, Richard Sandiford wrote:
>>> Anyway, given those results and your mixed feelings, I think it would
>>> be best to drop the patch.  It's a lot of code to carry around, and its
>>> ad-hoc nature would make it hard to change in future.  There must be
>>> better ways of achieving the same thing.
>>>
>> It is up to you, Richard.  I don't mind if you commit it into the trunk.
>>
>> There is something in your approach too.  If it would be possible to get
>> the best of the two methods, we could see a big improvement.
> On s390, the patch is pretty much a straight improvement across the line:
> 0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement
> in GemsFDTD (9.3%), and no regression beyond usual noise levels.
That is a really big improvement.
> Given that, and the impressive improvements we saw on some ARM benchmarks
> involving vector code (up to 70%), it would really be a pity to see the
> patch going nowhere ...
>
> Is there anything we can do to help make the patch more acceptable?
> Any suggestions welcome ...
>
I did not object the patch.  As I wrote Richard, it is up to him to 
decide to commit the patch or not (it still is).  I had mixed results on 
x86-64 -- some better and some worse (but with average the same) with 
pretty big variations.  Those big variations mean that people could use 
this for fine-tuning their programs.

Taking your results for S390 and ARM with Neon into account, I guess it 
should be included and probably made by default for these 2 targets (for 
sure for s390).