This is the mail archive of the mailing list for the GCC project.

Sorting out the list of P4 issues

>Let's sort out the issues, since time is getting short.  We do have problems

>1) performance issues of -march=pentium4
>   I will prepare the register ordering patch today and send you
>   for experimentation

That would be wonderful, though I suspect I won't be able to play with it
very much before Saturday; I'm busy this evening.

I have one clear example of performance degradation: test 5 of the flops.c
microbenchmark runs at 116 MFLOPS with -O2 but only 100 MFLOPS with
-O2 -march=pentium4.

I fear I may have discovered an OS issue: getrusage() measures _current
process_ resources, so (since the flops.c loops use no memory) I'd expect to
get the same results whether running one or six copies of flops at a time,
though over longer periods of real-time. In fact, when I've something else
running (say, recompiling gcc), I get _much_ lower MFLOPS figures
reported -- this is why I first mentioned strong performance degradations.
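
To make the measurement concrete, here is a minimal sketch of timing a
floating-point loop with getrusage(). It is not taken from flops.c: the
helper names cpu_seconds and mflops_of_loop, the loop body, and the choice
to count user plus system time are all assumptions for illustration.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* CPU time (user + system) charged to the calling process so far, in
   seconds.  Returns a negative value if getrusage() fails. */
static double cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Hypothetical stand-in for a benchmark loop: runs `iterations` steps of
   one multiply and one add each, and reports MFLOPS based on process CPU
   time rather than wall-clock time.  Returns 0 if the loop was too short
   for the timer to register. */
static double mflops_of_loop(long iterations)
{
    volatile double acc = 0.0;   /* volatile so the loop is not optimised away */
    double start = cpu_seconds();
    double elapsed;
    long i;

    for (i = 0; i < iterations; i++)
        acc = acc * 0.999999 + 1.0;

    elapsed = cpu_seconds() - start;
    if (elapsed <= 0.0)
        return 0.0;
    return 2.0 * (double)iterations / elapsed / 1e6;
}
```

Since RUSAGE_SELF reports only the CPU time charged to the calling process,
a figure computed this way should in principle not depend on what else the
machine is doing, which is why the lower numbers under load look suspicious.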

The 20020204 snapshot actually *miscompiles* the MRI code which had the
large performance degradation with -march=pentium4, though I discovered this
at 1am today and haven't had time to investigate:  I'll try to get a
test-case to you on Friday.

>2) lack of good documentation for various SSE options

I've just submitted that to gnatsweb.

I should probably put together a "using SIMD instructions with gcc-3.1 on
Intel machines" Web page, with the information about xmmintrin.h
and -mfpmath and suchlike: I don't see where that kind of tutorial-style
page would fit into the current info files. I'll try to make a first draft
at the weekend.
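
As a rough idea of what such a page might open with, here is a small sketch
using the SSE1 intrinsics from xmmintrin.h. The function name add_floats is
hypothetical, and the scalar fallback is there only so the file still
compiles when SSE is not enabled (for example without -msse on a 32-bit x86
target).

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n).  Processes four floats at a time
   with SSE when available, then finishes any remainder one by one. */
static void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned loads: no alignment */
        __m128 vb = _mm_loadu_ps(b + i);   /* requirement on the arrays     */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The unaligned load and store forms are used deliberately so that the sketch
does not depend on stack or array alignment.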

>3) lack of support for SSE2 intrinsics

This is the critical problem; it is not conceptually difficult to fix, but I
estimate it would be four hours of tedious work, emacs in one hand and the
P4 manual in the other, for someone who's already at ease with define_insn
in the machine description and def_builtin in i386.c.

I'll work on this if you want, but I rather hope there's someone working on
it already: I'd have to do a lot of reading to figure out how the
define_insn and def_builtin fit together and how to define the various
additional types needed, and I don't have a copyright assignment filed, and
am not sure how the University would react if I presented them with one.

>4) ICEs when using the existing intrinsics?

No, my ICE is with inline assembler using SSE constraints; I've submitted it
to gnatsweb with Jan in the cc: field, though I haven't got a PR number yet.
I've not tried using the existing SSE1 intrinsics.
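
For readers unfamiliar with the term, "SSE constraints" means operand
constraints such as "x", which ask GCC to place an inline-asm operand in an
SSE register. The following is only an illustrative sketch of that kind of
code, not the test case that triggers the ICE; the function name
sse_add_scalar is hypothetical, and the plain C fallback is an assumption
added so the file compiles on targets without SSE.

```c
/* Adds two floats with the SSE addss instruction, using the "x"
   constraint to keep both operands in SSE registers. */
static float sse_add_scalar(float a, float b)
{
#if defined(__GNUC__) && defined(__SSE__) && \
    (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: addss src, dst computes dst = dst + src. */
    __asm__ ("addss %1, %0" : "+x" (a) : "x" (b));
#else
    a += b;   /* fallback when SSE inline asm is not available */
#endif
    return a;
}
```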

>5) lack of stack alignment code.

I haven't run into this problem at all, though I would be happy to test.

I'm sorry if I've been coming across as complaining wildly and in a
disorganised fashion: I'm learning as I go along how to work with gcc-3.1
snapshots, I've been testing in the evenings and writing one email per
problem encountered, and I should probably have saved some of the problems
for submission via gnatsweb in the mornings.

