This is the mail archive of the mailing list for the GCC project.
Sorting out the list of P4 issues
- From: "Tom Womack" <tom at womack dot net>
- To: "Jan Hubicka" <jh at suse dot cz>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Wed, 6 Feb 2002 14:39:00 -0000
- Subject: Sorting out the list of P4 issues
>Let's sort out the issues, since time is getting short. We do have problems:
>1) performance issues of -march=pentium4
> I will prepare the register ordering patch today and send it to you
> for experimentation.
That would be wonderful, though I suspect I won't be able to play with it
very much before Saturday; I'm busy this evening.
I have one clear example of performance degradation: test 5 of the flops.c
microbenchmark from http://gcc.gnu.org/ml/gcc/2001-07/msg00177.html runs
at 116 MFLOPS with -O2 but only 100 MFLOPS with -O2 -march=pentium4.
I fear I may have discovered an OS issue: getrusage() measures _current
process_ resources, so (since the flops.c loops hardly touch memory) I'd
expect to get the same MFLOPS figures whether I run one or six copies of
flops at a time, just over a longer stretch of real time. In practice, when
I have something else running (say, recompiling gcc), I get _much_ lower
MFLOPS figures reported, which is why I first mentioned severe performance
degradations.
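To make the CPU-time versus elapsed-time distinction concrete, here is a minimal sketch, assuming a POSIX system with getrusage(RUSAGE_SELF) and gettimeofday(). The helpers cpu_seconds, wall_seconds and mflops, and the loop in kernel, are hypothetical illustrations, not the actual code of test 5 in flops.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* CPU time (user + system) consumed so far by the calling process,
   in seconds, as reported by getrusage(RUSAGE_SELF). */
static double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Elapsed wall-clock time in seconds, for comparison. */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + tv.tv_usec / 1e6;
}

/* Hypothetical stand-in for one flops.c test: n iterations of one
   multiply and one add on a single variable. */
static double kernel(long n)
{
    volatile double x = 0.0; /* volatile stops the loop being optimised away */
    long i;
    for (i = 0; i < n; i++)
        x = x * 0.999999 + 1.0;
    return x;
}

/* Millions of floating-point operations per second. */
static double mflops(long flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds / 1e6;
}
```

A driver would read both clocks before and after kernel(n) and report mflops(2 * n, elapsed) for each. On an idle machine the two figures should roughly agree; with another job running, the wall-clock figure should fall while the CPU-time figure should stay roughly constant. If the CPU-time figure drops as well, that is worth investigating separately from the code-generation question.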
The 20020204 snapshot actually *miscompiles* the MRI code that showed the
large performance degradation with -march=pentium4. I only discovered this
at 1am today and haven't had time to investigate; I'll try to get a
test case to you on Friday.
>2) lack of good documentation for various SSE options
I've just submitted that to gnatsweb.
I should probably put together a "using SIMD instructions with gcc-3.1 on
Intel machines" Web page, with the information about xmmintrin.h
and -mfpmath and suchlike: I don't see where that kind of tutorial-style
page would fit into the current info files. I'll try to make a first draft
at the weekend.
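As a taste of what such a page might cover, here is a minimal sketch using the SSE1 intrinsics from xmmintrin.h (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) to add float arrays four elements at a time. The function name add_floats and the requirement that n be a multiple of 4 are illustrative assumptions. The scalar branch exists only so that the sketch still compiles where __SSE__ is not defined.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* dst[i] = a[i] + b[i]; n must be a multiple of 4 in this sketch. */
static void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i;
    assert(n % 4 == 0);
    for (i = 0; i < n; i += 4) {
#if defined(__SSE__)
        /* Unaligned loads/stores, so the arrays need not be 16-byte aligned. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
#else
        /* Scalar fallback for builds without SSE. */
        dst[i]     = a[i]     + b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
#endif
    }
}
```

On 32-bit x86 this would be built with something like `gcc -O2 -msse -mfpmath=sse`. -msse enables the intrinsics (and defines __SSE__). -mfpmath=sse separately tells GCC to use SSE registers instead of the x87 stack for ordinary scalar floating-point arithmetic (single precision only, unless SSE2 is also enabled).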
>3) lack of support for SSE2 intrinsics
This is the critical problem. It's not conceptually difficult to fix, but I
estimate it would take four hours of tedious work, emacs in one hand and the
P4 manual in the other, for someone who's already at ease with define_insn
in i386.md and def_builtin in i386.c.
I'll work on this if you want, but I rather hope someone is working on
it already: I'd have to do a lot of reading to figure out how define_insn
and def_builtin fit together and how to define the various additional
types needed. Also, I don't have a copyright assignment on file, and I'm
not sure how the University would react if I presented them with one.
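For concreteness, SSE2 intrinsics give C-level access to the packed double-precision instructions. Later GCC releases expose them through emmintrin.h. The sketch below uses that interface, which the snapshots discussed here did not yet provide, to compute dst[i] = a[i] * s + b[i] two doubles at a time. The name scale_add_pd is hypothetical, and the scalar branch only keeps the sketch compiling where __SSE2__ is not defined.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* dst[i] = a[i] * s + b[i]; n must be even in this sketch. */
static void scale_add_pd(double *dst, const double *a, const double *b,
                         double s, size_t n)
{
    size_t i;
    assert(n % 2 == 0);
#if defined(__SSE2__)
    {
        __m128d vs = _mm_set1_pd(s); /* broadcast s into both lanes */
        for (i = 0; i < n; i += 2) {
            __m128d va = _mm_loadu_pd(a + i);
            __m128d vb = _mm_loadu_pd(b + i);
            _mm_storeu_pd(dst + i, _mm_add_pd(_mm_mul_pd(va, vs), vb));
        }
    }
#else
    /* Scalar fallback for builds without SSE2. */
    for (i = 0; i < n; i++)
        dst[i] = a[i] * s + b[i];
#endif
}
```

On 32-bit x86 this needs -msse2, which -march=pentium4 implies. On x86-64, SSE2 is enabled by default.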
>4) ICEs when using the existing intrinsics?
No, my ICE is with inline assembler using SSE constraints; I've submitted it
to gnatsweb with Jan in the cc: field, though I haven't got a PR number yet.
I've not tried using the existing SSE1 intrinsics.
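The sketch below shows the general kind of code involved: inline assembler that uses the "x" constraint to ask GCC for SSE (%xmm) registers. It is an illustrative example, not the test case from the report. The function name add4 and the HAVE_SSE_ASM macro are hypothetical, and the non-x86 branch only keeps the sketch portable.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) && (defined(__i386__) || defined(__x86_64__))
#include <xmmintrin.h>
#define HAVE_SSE_ASM 1
#endif

/* dst[i] = a[i] + b[i] for four floats. */
static void add4(float *dst, const float *a, const float *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
#ifdef HAVE_SSE_ASM
    {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        /* "+x": va lives in an %xmm register and is both read and written.
           "x": vb is an input in an %xmm register. AT&T operand order is
           src, dst, so this computes va = va + vb with a single ADDPS. */
        __asm__("addps %1, %0" : "+x"(va) : "x"(vb));
        _mm_storeu_ps(dst, va);
    }
#else
    {
        int i;
        for (i = 0; i < 4; i++)
            dst[i] = a[i] + b[i];
    }
#endif
}
```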
>5) lack of stack alignment code.
I haven't run into this problem at all, though I would be happy to test.
I'm sorry if I've been coming across as complaining wildly and in a
disorganised fashion. I'm learning as I go how to work with gcc-3.1
snapshots: I've been testing in the evenings and writing one email per
problem as I hit it, when I should probably have saved some of the problems
for submission via gnatsweb in the mornings.