[Bug target/57952] AVX/AVX2 no ymm registers used in a trivial reduction
mmokrejs at gmail dot com
gcc-bugzilla@gcc.gnu.org
Wed May 17 09:17:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57952
--- Comment #7 from mmokrejs at gmail dot com ---
(In reply to Jakub Jelinek from comment #6)
> > $ gcc -O3 -march=native stream.c ; objdump -d a.out | grep ymm | wc -l
> > 63
> > $
>
> Of course, vectorization is only enabled by default for -O3/-Ofast, not at
> -O2, for vectorization at -O2 you need to use -O2 -ftree-vectorize.
$ gcc -O2 -march=native -ftree-vectorize stream.c ; objdump -d a.out | grep ymm | wc -l
60
$
Ah, thanks. Please update the manpage. It says nothing about -march=native,
-mavx or -mavx2 needing -O3 or -Ofast (or -O2 -ftree-vectorize) before
vectorized code is actually generated.
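For anyone wanting to reproduce this without stream.c, here is a minimal sketch of the kind of trivial reduction the bug title refers to. The file name reduce.c and the function sum_ints are made up for illustration and are not taken from the bug report. An integer sum is used on the assumption that it avoids the extra floating-point reassociation requirement. No instruction counts are claimed, since they depend on the GCC version and the CPU.

```c
/* reduce.c: hypothetical stand-in for the kind of loop discussed here;
 * it is not the stream.c used in the commands above.
 *
 * Compile to an object file and count ymm references, e.g.:
 *   gcc -O2 -march=native -c reduce.c
 *   gcc -O2 -ftree-vectorize -march=native -c reduce.c
 *   gcc -O3 -march=native -c reduce.c
 *   objdump -d reduce.o | grep ymm | wc -l
 *
 * Per comment #6, the first command is not expected to vectorize the loop
 * with the GCC releases discussed in this thread; the other two are.
 */
#include <stddef.h>

/* Sum n ints. An integer reduction is used because a float sum would
 * generally not be vectorized without something like -ffast-math, which
 * permits reordering the additions. */
int sum_ints(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Because there is no main(), the file is compiled with -c and objdump is run on reduce.o rather than a.out. Whether ymm registers show up for an integer loop also depends on the local CPU supporting AVX2 rather than only AVX.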
<quote>
-march=cpu-type
Generate instructions for the machine type cpu-type. In contrast to
-mtune=cpu-type, which merely tunes the generated code for the specified
cpu-type, -march=cpu-type allows GCC to generate code that may not run at
all on processors other than the one indicated. Specifying -march=cpu-type
implies -mtune=cpu-type.
The choices for cpu-type are:
native
This selects the CPU to generate code for at compilation time by
determining the processor type of the compiling machine. Using -march=native
enables all instruction subsets supported by the local machine (hence the
result might not run on different machines). Using -mtune=native produces
code optimized for the local machine under the constraints of the selected
instruction set.
</quote>
<quote>
sandybridge
Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support.
ivybridge
Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C
instruction set support.
haswell
Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
BMI, BMI2 and F16C instruction set support.
broadwell
Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.
</quote>
These entries have no description text at all:
<quote>
-mavx
-mno-avx
-mavx2
-mno-avx2
-mavx512f
-mno-avx512f
-mavx512pf
-mno-avx512pf
-mavx512er
-mno-avx512er
-mavx512cd
-mno-avx512cd
</quote>
There are hardly any cross-references from within the manpage (notably from
the Intel x86/AMD64 section) to -ftree-vectorize.
<quote>
-ftree-vectorize
Perform vectorization on trees. This flag enables
-ftree-loop-vectorize and -ftree-slp-vectorize if not explicitly specified.
-ftree-loop-vectorize
Perform loop vectorization on trees. This flag is enabled by default
at -O3 and when -ftree-vectorize is enabled.
-ftree-slp-vectorize
Perform basic block vectorization on trees. This flag is enabled by
default at -O3 and when -ftree-vectorize is enabled.
</quote>