This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Patch #1 to add SSE5 support to the x86 GCC compiler
> Ok, I have made the minimal changes to the current source base in terms of
> round, rint, etc. just to use TARGET_ROUND instead of TARGET_SSE4_1. We can
> deal with enhancements, etc. of the round, etc. functions in another patch.
>
> Once again, this patch passes the bootstrap/make check on x86_64 (for both -m64
> and -m32) and I'm running make check on my 32-bit system.
>
> Any other comments from the x86 maintainers?
>
> Coming up is patch #2 that adds the rest of the instructions and intrinsics.
>
> --
> Michael Meissner, AMD
> 90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
> michael.meissner@amd.com
> +
> + ;; SSE5 parallel XMM conditional moves
> + (define_insn "sse5_pcmov_<mode>"
> + [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x,x,x,x")
> + (if_then_else:SSEMODE
> + (match_operand:SSEMODE 3 "register_operand" "0,0,xm,xm,0,0")
This should be nonimmediate_operand here, since the constraints allow memory (xm).
> +
> + @item -mfused-madd
> + @itemx -mno-fused-madd
> + @opindex mfused-madd
> + Enable automatic generation of fused floating point multiply-add instructions
> + if the ISA supports such instructions. The -mfused-madd option is on by
> + default.
What is the primary motivation for this? I would expect fused-madd to
be either a win or a loss performance-wise, with codegen depending on
the -mcpu setting if it is a loss on some...
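
Apart from performance, one general property that makes such a switch observable is that a fused multiply-add rounds once, while a separate multiply and add round twice, so the two forms can give different results. A minimal sketch in plain C, assuming IEEE single precision with round-to-nearest; the function names are hypothetical, and the "fused" variant only emulates a single rounding by going through double rather than using any SSE5 instruction:

```c
/* Unfused: the product is rounded to float, then the sum is rounded again.
   The volatile store forces the intermediate rounding and prevents the
   compiler from contracting the expression into a fused operation. */
static float madd_unfused(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* Emulated fused: a float*float product is exact in double (24 + 24
   significand bits fit in 53), so only the final conversion to float
   rounds for the inputs discussed below. This is an illustration, not a
   bit-exact fused multiply-add for every possible input. */
static float madd_fused_emulated(float a, float b, float c)
{
    double prod = (double)a * (double)b;
    return (float)(prod + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term, so the unfused form returns 0, while the single-rounding form returns 2^-24.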
> +
> + @item -msse5-strict-memory
> + @opindex -msse5-strict-memory
> + Limit SSE5 instructions to a single memory operand internally before register
> + allocation. This more closely matches the format of the hardware instructions
> + but it prevents some combinations from being discovered. It is anticipated
> + that the need for this switch may disappear in the future as the compiler is
> + tuned.
Are you sure this can't be solved by a combiner splitter pattern?
When the combiner takes more than 3 instructions, you can add a define_split
for the variant with two memory operands and offload one into a register as
needed. It should still produce better code out of regalloc and save us from
an option that a normal user can at most try flipping.
Honza
> @end table
>
> These @samp{-m} switches are supported in addition to the above