[google gcc-4_8] Tree Loop Unrolling - Relax code size increase with -O2

Sriraman Tallam tmsriram@google.com
Tue Jan 28 01:03:00 GMT 2014


Hi David,

   I had to fix a couple of tests; the attached patch includes the
fixes, which are simple. The tests fail for two reasons:

1) Tests like bmi2-pext32-1a.c fail because the vectorized loop is
unrolled, so the directive { scan-assembler-times "bmi2_pext_si3" 1 }
fails: bmi2_pext_si3 now occurs more than once. This is fixed by
changing the directive to scan-assembler.

2) Tests like bmi2-bzhi64-1a.c fail because the unrolled loop no
longer needs the bzhi instruction: it gets folded into a constant,
since the value is now known for each iteration. For this test to
remain meaningful, I disabled unrolling at -O2 by setting the code
size growth limit to zero via the option "--param
max-default-completely-peeled-insns=0".
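
As a rough illustration of pattern 1, the difference between the two
directives comes down to an exact-count check versus an at-least-once
check. The sketch below models that in plain C with a substring search.
The helper names (count_matches, passes_scan_assembler_times,
passes_scan_assembler) and the sample assembly strings are hypothetical,
and the real DejaGnu directives match regular expressions rather than
fixed strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in haystack. */
int count_matches(const char *haystack, const char *needle)
{
    int count = 0;
    size_t len = strlen(needle);
    if (len == 0)
        return 0;
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + len, needle))
        count++;
    return count;
}

/* Models scan-assembler-times: passes only on an exact match count. */
int passes_scan_assembler_times(const char *asm_text, const char *pat, int n)
{
    return count_matches(asm_text, pat) == n;
}

/* Models scan-assembler: passes if the pattern appears at least once. */
int passes_scan_assembler(const char *asm_text, const char *pat)
{
    return count_matches(asm_text, pat) >= 1;
}

/* Made-up stand-ins for -dp output before and after unrolling. */
const char *rolled_asm = "pext ... # bmi2_pext_si3\n";
const char *unrolled_asm =
    "pext ... # bmi2_pext_si3\n"
    "pext ... # bmi2_pext_si3\n";

/* The exact-count check breaks once the pattern is duplicated;
   the at-least-once check keeps passing. */
void check_directives(void)
{
    assert(passes_scan_assembler_times(rolled_asm, "bmi2_pext_si3", 1));
    assert(!passes_scan_assembler_times(unrolled_asm, "bmi2_pext_si3", 1));
    assert(passes_scan_assembler(unrolled_asm, "bmi2_pext_si3"));
}
```

The trade-off is that the relaxed check no longer notices if the
instruction is emitted more often than intended.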

All the fixes fell into one of the above two patterns, with one
exception: pr53265.c. Loop unrolling exposed the out-of-bounds array
access, which is now caught; the patch prunes that warning from the
test output.
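
To see why pattern 2 arises, consider a loop with a small constant trip
count whose operands depend only on the induction variable. The
following is a minimal sketch in C, not the body of bmi2-bzhi64-1.c:
zero_high_bits is a hypothetical software stand-in for bzhi, and the
loop and its hand-unrolled forms are made up for illustration. Once the
loop is completely peeled, every call has constant arguments, so the
compiler is free to fold each one and emit no bzhi at all.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for bzhi: clear the bits of src from position n up. */
uint64_t zero_high_bits(uint64_t src, unsigned n)
{
    return n >= 64 ? src : src & ((UINT64_C(1) << n) - 1);
}

/* Rolled form: the operands depend only on the induction variable i. */
uint64_t sum_rolled(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < 4; i++)
        sum += zero_high_bits(UINT64_C(0xFF) << i, 4 + i);
    return sum;
}

/* What complete peeling conceptually produces: four calls, each with
   constant arguments. */
uint64_t sum_unrolled(void)
{
    return zero_high_bits(UINT64_C(0x0FF), 4)
         + zero_high_bits(UINT64_C(0x1FE), 5)
         + zero_high_bits(UINT64_C(0x3FC), 6)
         + zero_high_bits(UINT64_C(0x7F8), 7);
}

/* What constant folding can reduce those calls to. */
uint64_t sum_folded(void)
{
    return 15 + 30 + 60 + 120;
}
```

All three functions compute the same value, which is why a test that
scans for the instruction needs unrolling disabled to stay meaningful.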

Ok to commit?

Thanks
Sri


On Tue, Jan 21, 2014 at 4:51 PM, Xinliang David Li <davidxl@google.com> wrote:
> ok.
>
> David
>
> On Tue, Jan 21, 2014 at 4:46 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Tue, Jan 21, 2014 at 2:49 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> I think it might be better to introduce a new parameter for max peel
>>> insns at O2 (e.g., call it MAX_O2_COMPLETELY_PEEL_INSN or
>>> MAX_DEFAULT_...), and use the same logic in your patch to override the
>>> MAX_COMPLETELY_PEELED_INSN parameter at O2.
>>>
>>> By so doing, we don't need to have a hard coded factor of 2.
>>
>> Patch attached with that change.
>>
>> Sri
>>
>>>
>>> In the longer run, we really need better cost/benefit analysis, but
>>> that is independent.
>>>
>>> David
>>>
>>> On Tue, Jan 21, 2014 at 1:49 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>      Currently, the tree unrolling pass (cunroll) does not allow any
>>>> code size growth at -O2.  Code size growth is permitted only if -O3
>>>> or -funroll-loops/-fpeel-loops is used. I have created a patch to
>>>> allow limited code size increase at -O2. With -funroll-loops the
>>>> maximum allowed code growth is 400 unrolled insns. I have set it to
>>>> 200 unrolled insns at -O2.  This patch improves an image processing
>>>> benchmark by 20%. It improves most benchmarks by 1-2%. The code size
>>>> increase is <1% for all the benchmarks except the image processing
>>>> benchmark, which grows by 6% (perf improves by 20%).
>>>>
>>>>      I am working on getting this patch reviewed for trunk. Here is
>>>> the discussion on this:
>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02643.html  I have
>>>> incorporated the comments on making the patch simpler. I will
>>>> follow up on that trunk patch by also getting data on limiting
>>>> complete peeling at -O2.
>>>>
>>>> Is this ok for the google branch?
>>>>
>>>> Thanks
>>>> Sri
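
The limit selection discussed in this thread can be sketched as
standalone C. This is a simplified model, not GCC code: struct
unroll_opts, effective_peel_limit and may_increase_size are hypothetical
names, and the "explicitly set by the user" check reflects my
understanding that maybe_set_param_value leaves user-provided --param
values alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the relevant option state (not gcc_options). */
struct unroll_opts {
    int optimize;                 /* the N in -ON */
    bool unroll_loops;            /* -funroll-loops */
    bool peel_loops;              /* -fpeel-loops */
    bool unroll_all_loops;        /* -funroll-all-loops */
    bool max_peeled_set_by_user;  /* --param max-completely-peeled-insns given */
    int max_completely_peeled_insns;          /* 400 by default */
    int max_default_completely_peeled_insns;  /* 200 by default */
};

/* Effective complete-peeling size limit, in insns. */
int effective_peel_limit(const struct unroll_opts *o)
{
    bool plain_o2 = o->optimize == 2 && !o->unroll_loops
                    && !o->peel_loops && !o->unroll_all_loops;

    /* At plain -O2 the smaller default applies, unless the user chose
       a value for the main parameter explicitly (assumption, see above). */
    if (plain_o2 && !o->max_peeled_set_by_user)
        return o->max_default_completely_peeled_insns;
    return o->max_completely_peeled_insns;
}

/* Whether complete unrolling may grow code size at all. */
bool may_increase_size(const struct unroll_opts *o)
{
    return o->unroll_loops || o->peel_loops || o->optimize >= 2;
}
```

Under this model, plain -O2 yields a limit of 200, -O3 or -O2 with
-funroll-loops yields 400, and setting
max-default-completely-peeled-insns to 0 at -O2 removes the growth
budget entirely.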
-------------- next part --------------
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 207155)
+++ gcc/params.def	(working copy)
@@ -339,6 +339,11 @@ DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
 	"max-completely-peeled-insns",
 	"The maximum number of insns of a completely peeled loop",
 	400, 0, 0)
+/* The default maximum number of insns of a peeled loop, with -O2.  */
+DEFPARAM(PARAM_MAX_DEFAULT_COMPLETELY_PEELED_INSNS,
+	"max-default-completely-peeled-insns",
+	"The maximum number of insns of a completely peeled loop",
+	200, 0, 0)
 /* The maximum number of peelings of a single loop that is peeled completely.  */
 DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
 	"max-completely-peel-times",
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 207155)
+++ gcc/opts.c	(working copy)
@@ -855,6 +855,18 @@ finish_options (struct gcc_options *opts, struct g
             0, opts->x_param_values, opts_set->x_param_values);
     }
 
+  /* Set PARAM_MAX_COMPLETELY_PEELED_INSNS to the default original value during
+     -O2 when -funroll-loops and -fpeel-loops are not set.   */
+  if (optimize == 2 && !opts->x_flag_unroll_loops && !opts->x_flag_peel_loops
+      && !opts->x_flag_unroll_all_loops)
+
+    {
+      maybe_set_param_value
+       (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+        PARAM_VALUE (PARAM_MAX_DEFAULT_COMPLETELY_PEELED_INSNS),
+	opts->x_param_values, opts_set->x_param_values);
+    }
+
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
      is disabled.  */
   if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	(revision 207155)
+++ gcc/tree-ssa-loop.c	(working copy)
@@ -467,7 +467,7 @@ tree_complete_unroll (void)
 
   return tree_unroll_loops_completely (flag_unroll_loops
 				       || flag_peel_loops
-				       || optimize >= 3, true);
+				       || optimize >= 2, true);
 }
 
 static bool
Index: gcc/testsuite/gcc.target/i386/bmi2-bzhi64-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi2-bzhi64-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi2-bzhi64-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-mbmi2 -O2 -dp" } */
+/* { dg-options "-mbmi2 -O2 -dp --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi2-bzhi64-1.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi2-pext32-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi2-pext32-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi2-pext32-1a.c	(working copy)
@@ -3,4 +3,4 @@
 
 #include "bmi2-pext32-1.c"
 
-/* { dg-final { scan-assembler-times "bmi2_pext_si3" 1 } } */
+/* { dg-final { scan-assembler "bmi2_pext_si3" } } */
Index: gcc/testsuite/gcc.target/i386/avx2-vpaddb-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpaddb-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpaddb-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpaddb\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpaddb\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/bmi-blsr-2a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-blsr-2a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-blsr-2a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp  --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-blsr-2.c"
 
Index: gcc/testsuite/gcc.target/i386/avx2-vpsubw-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpsubw-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpsubw-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpsubw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpsubw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/avx2-vpsrlw-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpsrlw-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpsrlw-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpsrlw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpsrlw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/bmi-tzcnt-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-tzcnt-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-tzcnt-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mbmi -fno-inline" } */
+/* { dg-options "-O2 -mbmi -fno-inline --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-tzcnt-1.c"
 
Index: gcc/testsuite/gcc.target/i386/avx2-vpaddw-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpaddw-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpaddw-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpaddw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpaddw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/avx2-vpsraw-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpsraw-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpsraw-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpsraw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpsraw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/bmi-blsi-2a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-blsi-2a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-blsi-2a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-blsi-2.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi-blsr-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-blsr-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-blsr-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp  --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-blsr-1.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi-bextr-2a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-bextr-2a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-bextr-2a.c	(working copy)
@@ -3,4 +3,4 @@
 
 #include "bmi-bextr-2.c"
 
-/* { dg-final { scan-assembler-times "bmi_bextr_si" 1 } } */
+/* { dg-final { scan-assembler "bmi_bextr_si" } } */
Index: gcc/testsuite/gcc.target/i386/bmi-blsi-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-blsi-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-blsi-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-blsi-1.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi2-pext64-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi2-pext64-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi2-pext64-1a.c	(working copy)
@@ -3,4 +3,4 @@
 
 #include "bmi2-pext64-1.c"
 
-/* { dg-final { scan-assembler-times "bmi2_pext_di3" 1 } } */
+/* { dg-final { scan-assembler "bmi2_pext_di3" } } */
Index: gcc/testsuite/gcc.target/i386/avx2-vpmullw-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpmullw-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpmullw-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpmullw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpmullw\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/bmi2-pdep32-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi2-pdep32-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi2-pdep32-1a.c	(working copy)
@@ -3,4 +3,4 @@
 
 #include "bmi2-pdep32-1.c"
 
-/* { dg-final { scan-assembler-times "bmi2_pdep_si3" 1 } } */
+/* { dg-final { scan-assembler "bmi2_pdep_si3" } } */
Index: gcc/testsuite/gcc.target/i386/bmi-andn-2a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-andn-2a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-andn-2a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp  --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-andn-2.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi2-bzhi32-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi2-bzhi32-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi2-bzhi32-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mbmi2 -O2 -dp" } */
+/* { dg-options "-mbmi2 -O2 -dp --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi2-bzhi32-1.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi-blsmsk-2a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-blsmsk-2a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-blsmsk-2a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp  --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-blsmsk-2.c"
 
Index: gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c
===================================================================
--- gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c	(working copy)
@@ -31,4 +31,4 @@ void t3(void)
    r[i] = sqrtf (a[i]);
 }
 
-/* { dg-final { scan-assembler-times "vrsqrtps\[ \\t\]+\[^\n\]*%ymm" 3 } } */
+/* { dg-final { scan-assembler "vrsqrtps\[ \\t\]+\[^\n\]*%ymm" } } */
Index: gcc/testsuite/gcc.target/i386/bmi-bextr-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-bextr-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-bextr-1a.c	(working copy)
@@ -3,4 +3,4 @@
 
 #include "bmi-bextr-1.c"
 
-/* { dg-final { scan-assembler-times "bmi_bextr_di" 1 } } */
+/* { dg-final { scan-assembler "bmi_bextr_di" } } */
Index: gcc/testsuite/gcc.target/i386/bmi-andn-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-andn-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-andn-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-andn-1.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi-blsmsk-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-blsmsk-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-blsmsk-1a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp  --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-blsmsk-1.c"
 
Index: gcc/testsuite/gcc.target/i386/bmi-tzcnt-2a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi-tzcnt-2a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi-tzcnt-2a.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi -fno-inline" } */
+/* { dg-options "-O2 -mbmi -fno-inline --param max-default-completely-peeled-insns=0" } */
 
 #include "bmi-tzcnt-2.c"
 
Index: gcc/testsuite/gcc.target/i386/avx2-vpsubb-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpsubb-3.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/avx2-vpsubb-3.c	(working copy)
@@ -8,5 +8,5 @@
 
 #include "avx2-vpop-check.h"
 
-/* { dg-final { scan-assembler-times "vpsubb\[ \\t\]+\[^\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler "vpsubb\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
 /* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/i386/bmi2-pdep64-1a.c
===================================================================
--- gcc/testsuite/gcc.target/i386/bmi2-pdep64-1a.c	(revision 207155)
+++ gcc/testsuite/gcc.target/i386/bmi2-pdep64-1a.c	(working copy)
@@ -3,4 +3,4 @@
 
 #include "bmi2-pdep64-1.c"
 
-/* { dg-final { scan-assembler-times "bmi2_pdep_di3" 1 } } */
+/* { dg-final { scan-assembler "bmi2_pdep_di3" } } */
Index: gcc/testsuite/gcc.dg/pr53265.c
===================================================================
--- gcc/testsuite/gcc.dg/pr53265.c	(revision 207155)
+++ gcc/testsuite/gcc.dg/pr53265.c	(working copy)
@@ -154,3 +154,5 @@ fn12 (void)
   fn11 (1);
   fn11 (1);
 }
+
+/* { dg-prune-output "array subscript is above array bounds" } */

