Patch #1 to add SSE5 support to the x86 GCC compiler

Jan Hubicka jh@suse.cz
Thu Sep 6 18:33:00 GMT 2007


> Ok, I have made the minimal changes to the current source base in terms of
> round, rint, etc. just to use TARGET_ROUND instead of TARGET_SSE4_1.  We can
> deal with enhancements, etc. of the round, etc. functions in another patch.
> 
> Once again, this patch passes the bootstrap/make check on x86_64 (for both -m64
> and -m32) and I'm running make check on my 32-bit system.
> 
> Any other comments from the x86 maintainers?
> 
> Coming up is patch #2 that adds the rest of the instructions and intrinsics.
> 
> -- 
> Michael Meissner, AMD
> 90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
> michael.meissner@amd.com

> + 
> + ;; SSE5 parallel XMM conditional moves
> + (define_insn "sse5_pcmov_<mode>"
> +   [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x,x,x,x")
> + 	(if_then_else:SSEMODE 
> + 	  (match_operand:SSEMODE 3 "register_operand" "0,0,xm,xm,0,0")

The predicate should be nonimmediate_operand here, since the constraints allow memory ("xm").
> + 
> + @item -mfused-madd
> + @itemx -mno-fused-madd
> + @opindex mfused-madd
> + Enable automatic generation of fused floating point multiply-add instructions
> + if the ISA supports such instructions.  The -mfused-madd option is on by
> + default.

What is the primary motivation for this?  I would expect fused-madd to
be either a win or a loss performance-wise, with codegen depending on
the -mcpu setting if it is a loss on some...
> + 
> + @item -msse5-strict-memory
> + @opindex -msse5-strict-memory
> + Limit SSE5 instructions to a single memory operand internally before register
> + allocation.  This more closely matches the format of the hardware instructions
> + but it prevents some combinations from being discovered.  It is anticipated
> + that the need for this switch may disappear in the future as the compiler is
> + tuned.

Are you sure this can't be solved by a combiner splitter pattern?
When the combiner takes more than 3 instructions, you can add a define_split
for the variant with two memories and offload into a register as needed.
It should still produce better code out of regalloc and save us from
such an option, which a normal user can at most try to flip.

Honza
>   @end table
>   
>   These @samp{-m} switches are supported in addition to the above


