This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[jh@suse.cz: Improve -msse documentation (Was: Sorting out the list of P4 issues)]


The docs patch is inside, so I should have CCed it to patches as well,
but I forgot :(
----- Forwarded message from Jan Hubicka <jh@suse.cz> -----

From: Jan Hubicka <jh@suse.cz>
To: Tom Womack <tom@womack.net>
Cc: Jan Hubicka <jh@suse.cz>, gcc@gcc.gnu.org, rth@cygnus.com
Subject: Improve -msse documentation (Was: Sorting out the list of P4 issues)

> >Let's sort out the issues, since time is getting short.  We do have problems with
> 
> >1) performance issues of -march=pentium4
> >   I will prepare the register ordering patch today and send you
> >   for experimentation

This should now be partly addressed in the mainline.  It is quite possible that
-march=pentium4 will still cause slowdowns relative to -march=pentiumpro.  I
would be interested in seeing the testcases.

The problem with the Pentium 4 is that it has extremely fast simple operations
and expensive shifts, multiplications and divides.  Modelling this properly makes
gcc produce large code, which causes a performance drop due to the limited
instruction cache.
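
As a rough illustration of that trade-off (a sketch only, not what the gcc
expander actually emits), here is a multiplication by 10 written once as a
single multiply and once as a chain of additions.  The function names are
hypothetical.  The second form avoids both the multiply and the shifts, but it
takes four operations instead of one, which is where the code growth comes
from.

```c
/* Hypothetical helper: one multiply instruction (expensive on the P4). */
unsigned mul10_direct(unsigned x)
{
    return x * 10u;
}

/* Hypothetical helper: the same result using only cheap additions. */
unsigned mul10_adds(unsigned x)
{
    unsigned x2 = x + x;     /* 2x  */
    unsigned x4 = x2 + x2;   /* 4x  */
    unsigned x5 = x4 + x;    /* 5x  */
    return x5 + x5;          /* 10x */
}
```

Both functions return the same value for every unsigned input, since unsigned
arithmetic wraps identically in either form.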

The proper solution is to use profile data to find the innermost loops and do the
expansion on the critical paths only, but this is out of reach for 3.1 (possible
for 3.2, as the profile framework is already in place and this is a relatively
trivial change).

As a temporary solution, I will lower the latencies in the gcc model.  That should
result in better code for larger testcases and slower code for trivial loops.
> 
> >2) lack of good documentation for various SSE options
> 
> I've just submitted that to gnatsweb.

I am attaching a patch to clean it up.
> 
> >3) lack of support for SSE2 intrinsics
> 
> This is the critical problem; not conceptually difficult to fix, but I
> estimate it would be four hours of tedious work, emacs in one hand and the
> P4 manual in the other, for someone who's already at ease with define_insn
> in i386.md and def_builtin in i386.c.

There may be some showstoppers on the way.  I hope someone has already tried
that.

Can you test whether the SSE1 builtins actually work well for you?
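
A minimal smoke test for the SSE1 intrinsics could look like the sketch below.
It assumes the compiler ships <xmmintrin.h> and that SSE is enabled (for
example with -msse on 32-bit x86); the function name add4 is hypothetical.
When __SSE__ is not defined it falls back to scalar code, so the file still
builds on other targets.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE1 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for four floats. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* packed add, unaligned store */
#else
    /* Scalar fallback so the file still builds without SSE. */
    size_t i;
    for (i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

The unaligned load and store variants are used on purpose, so the test does not
depend on how the caller's arrays happen to be aligned.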
> 
> I'll work on this if you want, but I rather hope there's someone working on
> it already: I'd have to do a lot of reading to figure out how the
> 
> >4) ICEs when using the existing intrinsics?
> 
> No, my ICE is with inline assembler using SSE constraints; I've submitted it
> to gnatsweb with Jan in the cc: field, though I haven't got a PR number yet.
> I've not tried using the existing SSE1 intrinsics.

This has been tracked down by Richard.
> 
> >5) lack of stack alignment code.
> 
> I haven't run into this problem at all, though would be happy to test
> things.

It should not be a big problem in practice.  As long as everything is compiled
by 3.1.x, it should work.  Only code compiled by older compilers misaligns the
stack frame.

The stack alignment patch is available, but it is not ready for 3.1.x.
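
For anyone who wants to check this on their own setup, the sketch below tests
whether a local variable that asks for 16-byte alignment actually gets it.  It
uses the GCC-specific aligned attribute, and both function names are
hypothetical.  It only reports what one call observed; it is not a complete
test of the stack alignment behaviour discussed above.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of 16, else 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Returns 1 if a local that requests 16-byte alignment received it. */
int local_is_aligned(void)
{
    float v[4] __attribute__((aligned(16))) = {0.0f, 0.0f, 0.0f, 0.0f};
    return is_aligned16(v);
}
```

Calling local_is_aligned() from code built by an older compiler would be one
way to look for the misalignment case, under the assumption that the callee
relies on the incoming stack alignment rather than realigning it.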


Thu Feb  7 12:15:32 CET 2002  Jan Hubicka  <jh@suse.cz>
	* invoke.texi (-msse-math): Remove
	(-msse): Document more closely the behaviour of option to avoid
	confusion.
Index: invoke.texi
===================================================================
RCS file: /cvs/gcc/egcs/gcc/doc/invoke.texi,v
retrieving revision 1.112
diff -c -3 -p -r1.112 invoke.texi
*** invoke.texi	2002/02/06 05:13:10	1.112
--- invoke.texi	2002/02/07 11:15:11
*************** in the following sections.
*** 481,487 ****
  -mno-fp-ret-in-387  -msoft-float  -msvr3-shlib @gol
  -mno-wide-multiply  -mrtd  -malign-double @gol
  -mpreferred-stack-boundary=@var{num} @gol
! -mmmx  -msse -msse2 -msse-math -m3dnow @gol
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -momit-leaf-frame-pointer @gol
--- 481,487 ----
  -mno-fp-ret-in-387  -msoft-float  -msvr3-shlib @gol
  -mno-wide-multiply  -mrtd  -malign-double @gol
  -mpreferred-stack-boundary=@var{num} @gol
! -mmmx  -msse -msse2 -m3dnow @gol
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -momit-leaf-frame-pointer @gol
*************** preferred alignment to @option{-mpreferr
*** 7763,7773 ****
  @opindex mno-sse
  @opindex m3dnow
  @opindex mno-3dnow
! These switches enable or disable the use of built-in functions that allow
! direct access to the MMX, SSE and 3Dnow extensions of the instruction set.
  
! @xref{X86 Built-in Functions}, for details of the functions enabled
! and disabled by these switches.
  
  @item -mpush-args
  @itemx -mno-push-args
--- 7763,7783 ----
  @opindex mno-sse
  @opindex m3dnow
  @opindex mno-3dnow
! These switches enable or disable the use of MMX, SSE and 3Dnow extensions.
! This includes availability of new builtins as well as a limited amount of
! automatic SSE code generation.  @xref{X86 Built-in Functions}, for
! details of the functions enabled and disabled by these switches.
! 
! To enable automatic usage of SSE extensions for single precision scalar
! arithmetic and SSE2 extensions for double precision scalar arithmetic,
! use @option{-mfpmath}.
! 
! The option is implied by appropriate @option{-march} settings.
! @option{-mfpmath} always needs to be controlled manually as it changes ABI
! behaviour.  For the i386 compiler, x87 arithmetic is used by default, while
! x86-64 uses SSE arithmetic.
  
! Automatic vectorization is not supported yet.
  
  @item -mpush-args
  @itemx -mno-push-args

----- End forwarded message -----
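
One way to see from C which floating-point unit the compiler picked for scalar
arithmetic is the FLT_EVAL_METHOD macro from <float.h>.  This is a sketch
under the assumption that the compiler defines that macro; the function name
is hypothetical.  With GCC on x86 the value is typically 2 when x87 arithmetic
is in use and 0 when SSE2 arithmetic is in use, but other values are possible,
so treat the result as a hint.

```c
#include <float.h>

/* Hypothetical helper: reports the evaluation method chosen at compile time.
   0 means operations are evaluated in the type's own precision,
   2 means they are evaluated in long double (typical for x87),
   -1 means indeterminate. */
int fp_eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}
```

Building the same file once with -mfpmath=387 and once with -msse2 -mfpmath=sse
on a 32-bit x86 target should show the difference.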

