This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly


PING.
"200" currently looks optimal for x86.
Let's commit the following:

2014-11-21  Evgeny Stupachenko  <evstupac@gmail.com>
        * config/i386/i386.c (ix86_option_override_internal): Increase
        PARAM_MAX_COMPLETELY_PEELED_INSNS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);

+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        200,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
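
For readers unfamiliar with the call in the hunk above: as far as I understand it, maybe_set_param_value installs a value only when the user has not set the parameter explicitly, so an explicit --param on the command line still wins over the new x86 default. A minimal standalone sketch of that behaviour follows. All toy_* names are hypothetical stand-ins written for illustration and are not GCC's real declarations; the starting value of 100 is the generic default mentioned in this thread.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's parameter tables; these are NOT the
   real declarations from params.h.  */
enum toy_param
{
  TOY_MAX_COMPLETELY_PEELED_INSNS,
  TOY_NUM_PARAMS
};

/* Model an explicit user request such as --param name=VALUE:
   store the value and remember that the user chose it.  */
static void
toy_set_param_value (enum toy_param num, int value,
                     int *params, int *params_set)
{
  assert (num < TOY_NUM_PARAMS);
  params[num] = value;
  params_set[num] = 1;
}

/* Model an implicit default: store VALUE only when the user has not
   set the parameter explicitly.  */
static void
toy_maybe_set_param_value (enum toy_param num, int value,
                           int *params, int *params_set)
{
  assert (num < TOY_NUM_PARAMS);
  if (!params_set[num])
    params[num] = value;
}

/* Return the limit in effect after the target override has run.
   USER_SET is nonzero when the user passed an explicit USER_VALUE.  */
int
toy_effective_limit (int user_set, int user_value)
{
  int params[TOY_NUM_PARAMS] = { 100 };   /* generic default of 100 */
  int params_set[TOY_NUM_PARAMS] = { 0 };

  if (user_set)
    toy_set_param_value (TOY_MAX_COMPLETELY_PEELED_INSNS, user_value,
                         params, params_set);

  /* The target-specific override proposed in the patch, in toy form.  */
  toy_maybe_set_param_value (TOY_MAX_COMPLETELY_PEELED_INSNS, 200,
                             params, params_set);

  return params[TOY_MAX_COMPLETELY_PEELED_INSNS];
}
```

In this sketch toy_effective_limit (0, 0) yields 200 and toy_effective_limit (1, 50) yields 50, which is the reason for using the "maybe" variant in the override rather than an unconditional set.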

On Wed, Nov 12, 2014 at 5:02 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Code size for spec2000 is almost unchanged (many benchmarks have the
> same binaries).
> For those that are changed we have the following numbers (200 vs 100,
> both dynamic build -Ofast -funroll-loops -flto):
> 183.equake +10%
> 164.gzip, 173.applu +3.5%
> 187.facerec, 191.fma3d +2.5%
> 200.sixtrack +2%
> 177.mesa, 178.galgel +1%
>
>
> On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> > 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>>> > 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>>> > A higher value of 300 leaves the performance of the mentioned tests
>>> > unchanged, but adds some regressions on other benchmarks.
>>> >
>>> > So I like 200 as well as 120 and 150, but can confirm performance
>>> > gains only for x86.
>>>
>>> IMO it's either 150 or 200.  We chose 200 for our 4.9-based compiler because
>>> this gave the performance boost without affecting the code size (on x86-64)
>>> and because this was previously 400, but it's your call.
>>
>> Both 150 and 200 globally work for me if there is not too much code size
>> bloat (I did not see code size mentioned here).
>>
>> What I did before decreasing the bounds was strengthening the loop iteration
>> count bounds and adding logic that predicts constant propagation enabled by
>> unrolling. For this reason 400 became too large, as we did a lot more complete
>> unrolling than before. Also, 400 in older compilers is not really 400 in newer ones.
>>
>> Because I saw performance drop only with values below 50, I went for 100.
>> It would be very interesting to actually analyze what happens for those two
>> benchmarks (that should not be too hard with perf).
>>
>> Honza
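
To make the "constant propagation enabled by unrolling" point concrete, here is a small illustrative case; it is not taken from any of the benchmarks above. The loop has a compile-time trip count of 4 and indexes a constant table, so once it is completely peeled every weights[i] folds to a literal. The second function is what the peeled and folded form amounts to, written out by hand. Whether GCC actually peels such a loop depends on the compiler version, the optimization flags, and its size estimate against max-completely-peeled-insns, so treat this as a sketch of the idea only.

```c
/* Constant filter taps: known at compile time.  */
static const int weights[4] = { 1, 3, 3, 1 };

/* Loop form: fixed trip count of 4, a candidate for complete peeling.  */
int
dot4 (const int *x)
{
  int sum = 0;
  for (int i = 0; i < 4; i++)
    sum += weights[i] * x[i];
  return sum;
}

/* Hand-written equivalent after complete peeling and constant
   propagation: no loop counter, no branch, no table loads.  */
int
dot4_peeled (const int *x)
{
  return x[0] + 3 * x[1] + 3 * x[2] + x[3];
}
```

Comparing the assembly for dot4 built with --param max-completely-peeled-insns=100 and with =200 is one simple way to see where the limit starts to matter for a given loop body.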

