This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: GIMPLE-level if-converter with scratchpads --- Results from SPEC2006 FP analysis done at Richard`s request {late July / early August} --- results from running all the SPEC2006 CPU FP tests again after adding "-march=native"
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Abe <abe_skolnik at yahoo dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Alan Lawrence <alan dot lawrence at arm dot com>, Sebastian Pop <sebpop at gmail dot com>
- Date: Thu, 27 Aug 2015 10:40:57 +0200
- Subject: Re: GIMPLE-level if-converter with scratchpads --- Results from SPEC2006 FP analysis done at Richard`s request {late July / early August} --- results from running all the SPEC2006 CPU FP tests again after adding "-march=native"
- References: <55CBF39A dot 6020109 at yahoo dot com> <CAFiYyc1_VquBaSTQsHvzu3p0LbBXRhCG6d7uqnXD+G3zhS2w2g at mail dot gmail dot com> <55DCDCA5 dot 4040103 at yahoo dot com>
On Tue, Aug 25, 2015 at 11:22 PM, Abe <abe_skolnik@yahoo.com> wrote:
> Dear all,
>
> I have redone the SPEC2006 CPU FP tests again after adding "-march=native".
> Unfortunately, the results are not
> very good for the new if-converter. I believe this is the case because the
> CPU in question [details below] "only"
> has first-generation AVX, and, from what I`ve been told, at least AVX2 is
> needed for scatter/gather and/or
> masked loads/stores, and possibly even AVX512 [the 3rd generation].
Masked loads/stores are already available with the original AVX. As said
repeatedly, scatter/gather is completely irrelevant and will not help
vectorizing if-conversion using scratch-pads. And if you have masked
loads/stores available, you don't need scratch-pads.
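
To make the distinction concrete, here is a minimal C sketch of the three
shapes a conditional store can take. It is only an illustration under stated
assumptions: all function names and the LANES width are hypothetical and are
not taken from tree-if-conv.c or from the patch, and the "masked" variant is
a scalar model of what an AVX masked store (VMASKMOVPS, exposed as
_mm256_maskstore_ps) does per lane, not real vector code. One way to read
the scratch-pad form is that the store becomes unconditional only by making
its address depend on the condition, so the stores no longer walk
contiguously through a[].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LANES 4  /* hypothetical vector width, for illustration only */

/* Original loop: the store to a[i] happens only when c[i] is nonzero. */
void cond_store_branchy(float *a, const float *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i])
            a[i] = b[i];
}

/* Scratch-pad form: the store is unconditional, but its destination is
   selected per iteration between a[i] and a dummy location. */
void cond_store_scratchpad(float *a, const float *b, const int *c, size_t n)
{
    float scratch;  /* receives the value when the condition is false */
    for (size_t i = 0; i < n; i++) {
        float *dst = c[i] ? &a[i] : &scratch;
        *dst = b[i];
    }
}

/* Scalar model of a masked store: lanes whose mask is zero are left
   untouched in memory; the address stays contiguous. */
static void masked_store(float *dst, const int *mask, const float *src,
                         size_t lanes)
{
    assert(lanes <= LANES);
    for (size_t l = 0; l < lanes; l++)
        if (mask[l])
            dst[l] = src[l];
}

/* Masked form: process the arrays in blocks of up to LANES elements. */
void cond_store_masked(float *a, const float *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i += LANES) {
        size_t lanes = (n - i < LANES) ? n - i : LANES;
        masked_store(&a[i], &c[i], &b[i], lanes);
    }
}

/* Returns 1 if all three variants produce the same array contents. */
int variants_agree(void)
{
    const float b[6] = {1, 2, 3, 4, 5, 6};
    const int   c[6] = {1, 0, 0, 1, 1, 0};
    float r1[6] = {0}, r2[6] = {0}, r3[6] = {0};

    cond_store_branchy(r1, b, c, 6);
    cond_store_scratchpad(r2, b, c, 6);
    cond_store_masked(r3, b, c, 6);
    return memcmp(r1, r2, sizeof r1) == 0 && memcmp(r1, r3, sizeof r1) == 0;
}
```

All three variants compute the same result; they differ only in how the
store is expressed, which is the part that matters to the vectorizer.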
> As I
> have written before, in my opinion
> the new converter would be better than the old one if enough time and effort
> were to be spent on it,
> especially the time and effort to make it not add unneeded indirections.
I don't see how the new converter can be better for vectorization. As soon
as you need to introduce a scratch-pad you are lost.
> First, I will give the totals. Then, I`ll give the CPU details for better
> understanding what "-march=native"
> did [or at least should have done]. Then, I`ll give the per-subtest numbers
> that Richard requested.
It's interesting to see that only very few benchmarks care about store
if-conversion and if-conversion in general (because I believe the new
if-converter ends up disabling vectorization for all if-converted cases).
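
For reference, the per-benchmark counts quoted below can be diffed
mechanically. The following is a small C sketch; the array and function
names are made up for illustration, and the numbers are copied from the
quoted tables (the three new-converter configurations report identical
counts, so they share one array).

```c
#include <stdio.h>
#include <stddef.h>

#define NBENCH 16

static const char *const bench[NBENCH] = {
    "410.bwaves", "416.gamess", "433.milc", "434.zeusmp",
    "435.gromacs", "436.cactusADM", "437.leslie3d", "444.namd",
    "450.soplex", "454.calculix", "459.GemsFDTD", "465.tonto",
    "470.lbm", "481.wrf", "482.sphinx3", "998.specrand"
};

/* Vectorized-loop counts, old converter, no if-conversion flags
   (identical with only -ftree-loop-if-convert). */
static const int old_default[NBENCH] = {
    13, 3837, 7, 138, 172, 261, 92, 0, 1, 436, 275, 943, 0, 2141, 58, 0
};

/* Old converter with -ftree-loop-if-convert-stores added. */
static const int old_stores[NBENCH] = {
    13, 3850, 7, 138, 173, 261, 92, 0, 1, 436, 275, 943, 0, 2141, 58, 0
};

/* New converter; the same for all three flag combinations. */
static const int new_any[NBENCH] = {
    13, 3804, 7, 136, 173, 261, 92, 0, 1, 436, 275, 943, 0, 2079, 55, 0
};

/* Sum of the vectorized-loop counts over all benchmarks. */
int total(const int *counts)
{
    int sum = 0;
    for (size_t i = 0; i < NBENCH; i++)
        sum += counts[i];
    return sum;
}

/* Number of benchmarks whose count differs between two configurations. */
int count_changed(const int *base, const int *other)
{
    int changed = 0;
    for (size_t i = 0; i < NBENCH; i++)
        if (base[i] != other[i])
            changed++;
    return changed;
}

/* Print only the benchmarks whose count differs, with a signed delta. */
void print_deltas(const char *label, const int *base, const int *other)
{
    printf("%s\n", label);
    for (size_t i = 0; i < NBENCH; i++)
        if (base[i] != other[i])
            printf("  %-14s %+d\n", bench[i], other[i] - base[i]);
}
```

Calling print_deltas("new vs. old", old_default, new_any) lists the
benchmarks that differ between the two converters, and total() reproduces
the totals given in the quoted summary.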
Richard.
> For concision, I will use "Richard`s check-in" to refer to the GCC I built
> from Richard`s check-in dated July 10 2015
> with Git SHA "cb791e75379bc0c8b10bd13bcb24305c36fd504b" and "git-svn-id:
> svn+ssh://gcc.gnu.org/svn/gcc/trunk@225652".
> [my reason for rebasing the relevant Git check-out to that point: quoting
> Richard`s check-in message:
> "PR tree-optimization/66823
> * tree-if-conv.c (memrefs_read_or_written_unconditionally): Fix
> inverted predicate."]
>
> All the compilations were done with "-Ofast". The results, all integers,
> are the number of loops that were vectorized.
>
> Regards,
>
> Abe
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> no if-conversion-specific flags
> -------------------------------
> 8374
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 8374
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 8388
>
>
> ----
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> no if-conversion-specific flags
> -------------------------------------
> 8275
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 8275
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 8275
>
>
> CPU [from "/proc/cpuinfo"]
> --------------------------
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
> stepping : 7
> microcode : 0x710
> cpu MHz : 2499.902
> cache size : 15360 KB
> physical id : 0
> siblings : 12
> core id : 0
> cpu cores : 6
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
> tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dtherm
> tpr_shadow vnmi flexpriority ept vpid
> bogomips : 4999.80
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> [similarly for the cores numbered 1...23]
>
> kernel: 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64
> x86_64 x86_64 GNU/Linux
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> no if-conversion-specific flags
> -------------------------------
> 410.bwaves: 13
> 416.gamess: 3837
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 172
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3837
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 172
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3850
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> ----
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> no if-conversion-specific flags
> -------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
>