This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: [4.4] Strange performance regression?


Ian Lance Taylor wrote:
francesco biscani <bluescarni@gmail.com> writes:

I'm experiencing a strange behaviour with GCC 4.4.1. Basically I have
some C++ mathematical code which gets a ~x2 performance drop if I
*remove* the following debug line from the code:


This message is not appropriate for the mailing list gcc@gcc.gnu.org.
It is appropriate for gcc-help@gcc.gnu.org.  Please take any followups
to gcc-help.  Thanks.


In my experience, a performance drop in a tight loop when you remove a line of code means that your loop is extremely sensitive to cache line boundaries. It can be difficult to find the optimal code other than by testing various command line options. Options to particularly test are -falign-loops, -falign-labels, and -falign-jumps.

That seems like useful advice. The -falign- options could help the hot loops meet the Loop Stream Detector (LSD) criteria. If you set -funroll-loops, you may exceed the loop size that fits the LSD on older CPUs, but unrolling would often make the LSD unnecessary.
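To compare alignment settings, one approach is to time the same loop in several builds that differ only in the -falign-* values. The sketch below is only an illustration: dot() is a hypothetical stand-in for the real hot loop (the original code was not posted), best_time_ms() is a made-up helper name, and the flag values in the comments are examples rather than recommendations.

```cpp
// Timing harness sketch. Build the same source several times, e.g.
//   g++ -O2 -falign-loops=16 bench.cpp -o bench_a16
//   g++ -O2 -falign-loops=32 -falign-jumps=32 -falign-labels=32 bench.cpp -o bench_a32
// and compare the reported times. The values 16 and 32 are illustrative only.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tight mathematical loop being investigated.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// Runs the kernel `repeats` times and returns the best wall-clock time in
// milliseconds. Taking the minimum reduces noise from other processes.
// The kernel's value is stored in `result` so the work is not optimized away.
double best_time_ms(const std::vector<double>& a,
                    const std::vector<double>& b,
                    int repeats,
                    double& result) {
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        result = dot(a, b);
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

A small main() that fills two vectors and prints best_time_ms() completes the benchmark. std::chrono needs a C++11 compiler, so with a compiler as old as 4.4 you would have to substitute another timer such as clock_gettime().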

Also, be sure that you are using a -mtune option appropriate for the processor on which you are running. E.g., you mention Core2, so you should be using -mtune=core2.
For the 64-bit compiler, the default may be better than core2, but for 32-bit you should be using at least -march=pentium-m. If you are using the vectorizer, -mtune=barcelona could make a difference either way.
How are you controlling which threads run on which cache, in case there are cache sharing considerations?
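
If thread placement is not being controlled at all, one way to do it on Linux is to pin each thread to a chosen CPU. The following is a minimal sketch assuming Linux with glibc (g++ defines _GNU_SOURCE by default, which the CPU_* macros need). The helper names first_allowed_cpu() and pin_current_thread() are made up for this example. Which CPU numbers share a cache depends on the machine and is not determined here.

```cpp
// Linux/glibc-specific sketch: restrict the calling thread to one CPU.
#include <sched.h>  // cpu_set_t, CPU_* macros, sched_getaffinity, sched_setaffinity

// Returns the lowest-numbered CPU the calling thread is currently allowed
// to run on, or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

// Pins the calling thread (pid 0 means "this thread") to the given CPU.
// Returns true on success, false on an invalid CPU number or kernel refusal.
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Each worker thread would call pin_current_thread() once at startup, with CPU numbers chosen so that threads which share data land on cores that share a cache. The taskset utility can do the same for a whole process without code changes.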



