This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: [4.4] Strange performance regression?


Hi Tim

thanks for the reply.

On Wed, Oct 14, 2009 at 12:27 AM, Tim Prince <n8tm@aol.com> wrote:
> Ian Lance Taylor wrote:
>>
>> In my experience, a performance drop in a tight loop when you remove a
>> line of code means that your loop is extremely sensitive to cache line
>> boundaries.  It can be difficult to find the optimal code other than
>> by testing various command line options.  Options to particularly test
>> are -falign-loops, -falign-labels, and -falign-jumps.
>
> That seems useful advice.  The align- options could help the hot loops fit
> Loop Stream Detector criteria.  If you set -funroll-loops, you may exceed
> the loop size which fits LSD on older CPUs, but you would often make the LSD
> unnecessary.

Blast it! -funroll-loops did the trick: the speed is now back within
5% of the optimal performance. Just for the record, the flags I'm
using right now are:

-O2 -march=core2 -funroll-loops -fomit-frame-pointer

\o/
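
A sweep over the options mentioned in this thread could be scripted
roughly as below. This is only a sketch of the "test various command
line options" idea, not the procedure actually used here: the alignment
values (16, 32), the file names kernel.c and kernel, and the helper
names are all assumptions made for illustration. It only prints the
gcc command lines; building and timing each binary is left out.

```python
import itertools
import shlex

# Flags reported in this message, minus -funroll-loops (toggled below).
BASE_FLAGS = ["-O2", "-march=core2", "-fomit-frame-pointer"]

# Alignment options named in the thread; the values tried are assumptions.
ALIGN_OPTIONS = ["-falign-loops", "-falign-labels", "-falign-jumps"]
ALIGN_VALUES = [None, 16, 32]  # None means "do not pass the option"


def candidate_flag_sets():
    """Yield one list of gcc flags per combination to try."""
    for values in itertools.product(ALIGN_VALUES, repeat=len(ALIGN_OPTIONS)):
        for unroll in (False, True):
            flags = list(BASE_FLAGS)
            for opt, val in zip(ALIGN_OPTIONS, values):
                if val is not None:
                    flags.append(f"{opt}={val}")
            if unroll:
                flags.append("-funroll-loops")
            yield flags


def build_command(flags, source="kernel.c", output="kernel"):
    """Return the gcc command line for one flag set (file names are hypothetical)."""
    return ["gcc", *flags, source, "-o", output]


if __name__ == "__main__":
    for flags in candidate_flag_sets():
        print(shlex.join(build_command(flags)))
```

Each printed line can then be run and the resulting binary timed, ideally
several times per flag set, since the differences being chased are small.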

>>
>> Also, be sure that you are using a -mtune option appropriate for the
>> processor on which you are running.  E.g., you mention Core2, so you
>> should be using -mtune=core2.
>
> For the 64-bit compiler, the default may be better than core2, but for
> 32-bit you should be using at least -march=pentium-m.  If you are using
> vectorizer, -mtune=barcelona could make a difference either way.
> How are you controlling which threads run on which cache, in case there are
> cache sharing considerations?

I've played a bit with the options, and -mtune=barcelona does seem
to make a small difference. At the moment the code is single-threaded.
I've been trying various approaches to parallelize it, but with the
algorithm so constrained by memory bandwidth, I've yet to find a
solution that gives a reasonable speedup while keeping the overhead low.
Are there portable ways of controlling which threads run on which
cache?
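
For reference, the non-portable route on Linux is CPU affinity: pinning
a thread to a given CPU indirectly fixes which cache it uses. The sketch
below uses Python's os.sched_setaffinity, which is Linux-only, so it
does not answer the portability question. The helper name pin_to_cpu is
hypothetical, and which CPU numbers share a cache is machine-specific
and not handled here.

```python
import os


def pin_to_cpu(cpu=None):
    """Restrict the caller to a single CPU and return that CPU's number.

    If cpu is None, the lowest-numbered CPU currently allowed is used.
    Linux-only: os.sched_getaffinity/os.sched_setaffinity do not exist
    on every platform.
    """
    allowed = os.sched_getaffinity(0)  # 0 means "the caller"
    if cpu is None:
        cpu = min(allowed)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__":
    chosen = pin_to_cpu()
    print("pinned to CPU", chosen, "affinity now", os.sched_getaffinity(0))
```

In a C program the analogous Linux calls would be sched_setaffinity or
pthread_setaffinity_np, which are likewise not portable.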

Thanks again very much!

  Francesco.

