[Bug target/57952] AVX/AVX2 no ymm registers used in a trivial reduction

Wed May 17 09:17:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57952

--- Comment #7 from mmokrejs at gmail dot com ---
(In reply to Jakub Jelinek from comment #6)
> > $ gcc -O3 -march=native stream.c  ; objdump -d a.out | grep ymm | wc -l
> > 63
> > $
> 
> Of course, vectorization is only enabled by default for -O3/-Ofast, not at
> -O2, for vectorization at -O2 you need to use -O2 -ftree-vectorize.

$ gcc -O2 -march=native -ftree-vectorize stream.c  ; objdump -d a.out | grep ymm | wc -l
60
$


Ah, thanks. Please update the manpage. It says nothing about -march=native,
-mavx or -mavx2 only producing vectorized code when combined with -O3 or
-Ofast (or with an explicit -ftree-vectorize at -O2).
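
For readers without stream.c at hand, the same check can be tried on a much
smaller file. The following is only a sketch: the file name reduce.c and the
function sum_ints are made up for illustration and are not taken from stream.c
or from the original report. It uses an integer reduction because a
floating-point reduction normally also needs -ffast-math (or
-fassociative-math) before GCC will reorder the additions.

```c
#include <stddef.h>

/* Hypothetical example, not part of stream.c: a trivial reduction
 * that sums the first n elements of an int array. */
int sum_ints(const int *a, size_t n)
{
    int sum = 0;

    /* A simple counted loop with a single accumulator; this is the
     * kind of loop the loop vectorizer looks for. */
    for (size_t i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

It can be compiled to an object file and inspected the same way as above:

$ gcc -O2 -march=native -c reduce.c ; objdump -d reduce.o | grep ymm | wc -l
$ gcc -O2 -march=native -ftree-vectorize -c reduce.c ; objdump -d reduce.o | grep ymm | wc -l

The counts depend on the GCC version and on the CPU. In particular, 256-bit
integer operations need AVX2, so on an AVX-only machine this integer loop may
be vectorized with xmm registers only.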

<quote>
       -march=cpu-type
           Generate instructions for the machine type cpu-type.  In contrast to
           -mtune=cpu-type, which merely tunes the generated code for the
           specified cpu-type, -march=cpu-type allows GCC to generate code that
           may not run at all on processors other than the one indicated.
           Specifying -march=cpu-type implies -mtune=cpu-type.

           The choices for cpu-type are:

           native
               This selects the CPU to generate code for at compilation time by
               determining the processor type of the compiling machine.  Using
               -march=native enables all instruction subsets supported by the
               local machine (hence the result might not run on different
               machines).  Using -mtune=native produces code optimized for the
               local machine under the constraints of the selected instruction
               set.
</quote>

<quote>
           sandybridge
               Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2,
               SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL
               instruction set support.

           ivybridge
               Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2,
               SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE,
               RDRND and F16C instruction set support.

           haswell
               Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
               SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL,
               FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C instruction set
               support.

           broadwell
               Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE,
               SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES,
               PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and
               PREFETCHW instruction set support.
</quote>


These entries have no description text at all:
<quote>
       -mavx
       -mno-avx
       -mavx2
       -mno-avx2
       -mavx512f
       -mno-avx512f
       -mavx512pf
       -mno-avx512pf
       -mavx512er
       -mno-avx512er
       -mavx512cd
       -mno-avx512cd
</quote>
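
Until those entries are documented, one way to see what the options change is
through the predefined macros: GCC defines __AVX__ when AVX code generation is
enabled and __AVX2__ when AVX2 is. The sketch below only reports those macros;
the function names are made up for illustration. Note that it shows which
instruction sets are enabled, not whether any loop was actually vectorized.

```c
/* Hypothetical helpers: report which AVX levels were enabled for this
 * translation unit, based on GCC's predefined macros. */

/* Returns 1 if compiled with AVX enabled (e.g. -mavx or a -march that
 * implies it), 0 otherwise. */
int compiled_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if compiled with AVX2 enabled (e.g. -mavx2), 0 otherwise. */
int compiled_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}
```

The same macros can be listed without compiling anything:

$ gcc -march=native -dM -E - < /dev/null | grep AVX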

There are hardly any links from within the manpage (notably from the Intel
x86/amd64 section) to -ftree-vectorize.
<quote>
       -ftree-vectorize
           Perform vectorization on trees. This flag enables
           -ftree-loop-vectorize and -ftree-slp-vectorize if not explicitly
           specified.

       -ftree-loop-vectorize
           Perform loop vectorization on trees. This flag is enabled by default
           at -O3 and when -ftree-vectorize is enabled.

       -ftree-slp-vectorize
           Perform basic block vectorization on trees. This flag is enabled by
           default at -O3 and when -ftree-vectorize is enabled.
</quote>