This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
- From: Evgeny Stupachenko <evstupac@gmail.com>
- To: Jan Hubicka <hubicka@ucw.cz>
- Cc: Eric Botcazou <ebotcazou@adacore.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Richard Biener <richard.guenther@gmail.com>, Uros Bizjak <ubizjak@gmail.com>
- Date: Fri, 21 Nov 2014 13:46:55 +0300
- Subject: Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
- Authentication-results: sourceware.org; auth=none
- References: <CAOvf_xwBqY++PGRC_+1=rzHO18jvc+TP5mmR=PjYvGgm=63NuA@mail.gmail.com> <CAOvf_xzFR+x+FvUh7dXvvRC3K7=s2Mb3MzjofLnMzoHGNZmKTA@mail.gmail.com> <CAOvf_xz7NJrkmE=sVAPKQgfwytFqzvGG7H4ZRJxk-ByUtqpKrA@mail.gmail.com> <4566569.T6SW4SpEos@polaris> <20141111235116.GA11013@kam.mff.cuni.cz> <CAOvf_xxcKuW+MLuif03x67TmPYgEw2C1ywwG7aAaLZRePgnZ7Q@mail.gmail.com>
PING.
"200" currently looks optimal for x86.
Let's commit the following:
2014-11-21  Evgeny Stupachenko  <evstupac@gmail.com>

	* config/i386/i386.c (ix86_option_override_internal): Increase
	PARAM_MAX_COMPLETELY_PEELED_INSNS.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
 			 opts->x_param_values,
 			 opts_set->x_param_values);
 
+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+			 200,
+			 opts->x_param_values,
+			 opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
On Wed, Nov 12, 2014 at 5:02 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Code size for SPEC2000 is almost unchanged (many benchmarks have the
> same binaries).
> For those that changed, the numbers are (200 vs. 100, both dynamic
> builds with -Ofast -funroll-loops -flto):
> 183.equake +10%
> 164.gzip, 173.applu +3.5%
> 187.facerec, 191.fma3d +2.5%
> 200.sixtrack +2%
> 177.mesa, 178.galgel +1%
>
>
> On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> > 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>>> > 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>>> > A higher value of 300 leaves the performance of the mentioned tests
>>> > unchanged, but adds some regressions on other benchmarks.
>>> >
>>> > So I like 200 as well as 120 and 150, but can confirm performance
>>> > gains only for x86.
>>>
>>> IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because
>>> this gave the performance boost without affecting the code size (on x86-64)
>>> and because this was previously 400, but it's your call.
>>
>> Both 150 and 200 globally work for me if there is not too much code size
>> bloat (I did not see code size mentioned here).
>>
>> What I did before decreasing the bounds was strengthening the loop iteration
>> count bounds and adding logic that predicts constant propagation enabled by
>> unrolling. For this reason 400 became too large, as we did a lot more complete
>> unrolling than before. Also, 400 in older compilers is not really 400 in newer ones.
>>
>> Because I saw performance drop only with values below 50, I went for 100.
>> It would be very interesting to actually analyze what happens for those two
>> benchmarks (that should not be too hard with perf).
>>
>> Honza