SSE5 patches round 3

Uros Bizjak
Mon Sep 10 18:10:00 GMT 2007

Michael Meissner wrote:

> This patch is the latest combined patch.  If I don't hear any strong
> objections, I will check this in a few hours, and we can fix any problems in
> subsequent patches.  I did a full boostrap build based on changes as they were
> in the tree on Saturday.  I just rev'ed up to today's changes, fixed the merge
> conflicts, and did a quick build, and I will do a full boostrap build/test
> after sending this mail.

I would like some remarks (outlined below) to be addressed before 
committing this large patch (they should be mostly mechanical 
changes/minor problems), but I'd like just to look over them to avoid 
making large changes after your patch is committed. If necessary, we can 
wait with other changes to i386 directory for a day or two to ease your 
merging. Also, as your changes are strictly to files in i386 
directories, you can get a stage3 deadline extension to finish your work.

> Thanks for doing the review.

Thanks for your changes!

Please note that your patch changes smmintrin.h, but there is no mention 
of this change in ChangeLog, as well as no mention of mmintrin-common.h. 
I think that since smmintrin.h header was defined by Intel, H.J. should 
also approve this change. I have added his mail into CC:

I have one objection to this patch. As discussed on 6th of September, 
you should merge


into UNSPEC_SSE5_INTRINSIC to describe both, vector and scalar version 
of insn with single unspec constant:

The insn with unspec is OK for vector modes (to prevent automatic 
combining of vec mul and add into vector fmadd insn as outlined in the 
comment above insn), but for scalar instruction, a vec_merge form should 
be used. instead. Basically, you wrap body of parallel instruction with:

 [(set (match_operand:V4SF 0 "register_operand" "=x")
     (...parallel pattern body...)
     (match_dup 1)
     (const_int 1)))]


It is a de-facto convention to name this pattern "sseX_vm....", where 
'vm' stands for vec_merge. This way, you will need only 
UNSPEC_SSE5_INTRINSIC in both, parallel and scalar (actually vm) 

You can look into i.e. *subv4sf3 and sse_vmsubv4sf3 insn patterns as an 
example of this approach.

> 	* config/i386/driver-i386.c (host_detect_local_cpu): Add basic

The description was truncated.

> *** gcc/testsuite/	2007-09-10 11:32:14.774764000 -0400
> --- gcc/testsuite/	2007-09-09 22:53:47.157284000 -0400
> ***************
> *** 1,8 ****
> ! /* { dg-do compile } */
> ! /* { dg-options "-O2 -msse4.1 -msse4a" } */
>   /* Test that the intrinsics compile with optimization.  All of them are
> !    defined as inline functions in {,x,e,p,t,s,a}mmintrin.h that reference
>      the proper builtin functions.  Defining away "static" and "__inline"
>      results in all of them being compiled as proper functions.  */
> --- 1,8 ----
> ! /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> ! /* { dg-options "-O2 -msse4.1 -msse4a -msse5 " } */
You should substitute -msse4a with -msse5, as -msse5 should implicitly 
define -msse4a.

>   #include <ammintrin.h>
> + #include <bmmintrin.h>
>   #include <smmintrin.h>

Also, you should remove ammintrin.h include, as bmmintrin.h should 
include ammintrin.h.

Please also include changes to sse-12.c, sse-13.c and sse-14.c in 

Thanks for your cooperation,

More information about the Gcc-patches mailing list