This is the mail archive of the mailing list for the GCC project.

Re: P4 support question (-> FAQ?)

I will assume you are running on Windows, since the Intel 4.x compiler was
for that OS.  The Windows implementations of gcc with which I am familiar
don't enforce even 8-byte alignment; to do that and get reasonable
performance on 64-bit data you must rebuild binutils with a larger
SECTION_ALIGNMENT specified in bfd/coff-i386.c.  Unfortunately, this breaks
the C++ libraries of the released g++.  If you wish to use SSE
instructions, or take full advantage of the .p2align jump alignments, you
will need some way to persuade binutils to set 16-byte alignment; the
better Linux implementations do 16- or 32-byte alignment.  P4, AFAIK, drops
the code alignment requirements of P-II and P-III, but has even larger
penalties for cache line splits caused by data which straddle cache line
boundaries (normally avoided by following the usual alignment rules).  If
you have tried the gcc options, you are probably aware that -Os is often
faster than -O2 on P-II/III; but -O2 is more often beneficial on P4.  It
will be interesting to see how the SuSE work on SSE for gcc turns out.
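
To make the cache line split point concrete, here is a minimal C sketch.
It is only an illustration under stated assumptions: the 64-byte line size
and the helper names (straddles_line, is_aligned, sse_buffer,
alignment_demo) are hypothetical and not taken from any compiler or from
binutils.  It shows that a naturally aligned 8-byte double cannot cross a
line boundary while a misaligned one can, and it requests 16-byte alignment
with the gcc-specific aligned attribute.  Whether that request is honored
for static data still depends on the section alignment the assembler and
linker provide, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; a hypothetical value for illustration. */
#define CACHE_LINE_BYTES 64u

/* Returns 1 if an object of `size` bytes starting at `addr` crosses a
   cache-line boundary, 0 otherwise. */
static int straddles_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;
    return (addr / CACHE_LINE_BYTES) != ((addr + size - 1) / CACHE_LINE_BYTES);
}

/* Returns 1 if `addr` is a multiple of `align` (align must be a power of two). */
static int is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* gcc-specific request for 16-byte alignment, as SSE loads and stores want. */
static double sse_buffer[4] __attribute__((aligned(16)));

/* Small self-check, intended to be called from main(). */
static void alignment_demo(void)
{
    /* A double at offset 56 ends at byte 63: it stays inside one line. */
    assert(!straddles_line(56, sizeof(double)));
    /* A double at offset 60 covers bytes 60..67: it splits across two lines. */
    assert(straddles_line(60, sizeof(double)));
    /* Fails if the toolchain did not honor the 16-byte request. */
    assert(is_aligned((uintptr_t)sse_buffer, 16));
}
```

Calling alignment_demo() on a given toolchain is a quick way to see whether
the 16-byte request actually survived assembly and linking.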

I don't expect to be at liberty to publish much of this work any time soon.
The primary compilers which I am evaluating are Intel, MSVC, gcc/g77, and
CVF.  The latter is a master at making P4 look bad in some cases and
excellent in others, and clearly shows that code can be generated which
runs OK on P-III and K7 but not on P4.
----- Original Message -----
From: "Will Menninger" <>
To: <>
Cc: <>
Sent: Tuesday, May 29, 2001 6:52 AM
Subject: Re: P4 support question (-> FAQ?)

> Tim,
> Can you reveal what compilers you are testing and when/where you might
> be publishing the results?  My experience is that on Pentium-III's,
> Intel's C compiler (v4.x) beats gcc (2.7, 2.8, 2.95) by 10%-40% margins
> (largely double precision floating point codes).  But I also have a
> finite element field-solving C++ code where gcc beats Intel and MS VC by
> more than a factor of two.  I have noticed that the Intel compiler seems
> to do a better job of aligning local variables on the stack to optimize
> performance (the P-III apparently likes 16-byte alignments(?)).
> -Will
> "Tim Prince" writes:
> >>[...beginning of thread deleted for brevity...]
> >
> >I don't expect any changes in scheduling to make major differences in
> >performance on P4; gcc already does quite well when P-II optimizations
> >are selected.  I'm in the middle of a comparison of efficiency of code
> >sequences generated by various compilers; while SSE code is required in
> >many situations to make the best of P4, there are situations where gcc
> >generates P-II compatible code which is faster than that available from
> >commercial compilers.
