This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Some optimization thoughts (and thanks!)


Here are some optimization suggestions:

* Inlining of all functions with a single caller: This would presumably
 be done after short-function inlining (to allow wrapper functions to
 be inlined into their callers, such wrappers are likely have condition
 checks and such which can be simplified if inlined into the more basic
 caller; but if the function that a wrapper calls is inlined first, the
 resulting function may be too large to inline for multiple basic
 callers).  This removes the performance penalty of breaking code into
 graspable portions.  Unlike regular inlining, this reduces code size
 (by removing call/return overhead).  This should not be that difficult
 to implement, nor add much to compile time, particularly if the
 call-tree for a program is retained between compilings and partially
 updated after source modification.

* Consider moving static condition check outside of simple loop.
 Although branch prediction hardware would correctly predict the
 direction of the branch (possibly after one or two warm-up cycles),
 for a simple loop, the extra instruction could be an excessive extra
 cost.  This would bloat code somewhat, but might be a notable win
 for simple enough, long loops.  (E.g., "for (x = 0; x < limit; x++)
 { if (direction==forward) { A[x] = A[x]<<1; } else
 { A[x] = A[x]>>1;} }" --OR-- "for (x = 0; x < limit; x++) { if
 ( (test1==true) && (A[x] > min_A) ) { A[x]--; } }")

* Provide vector-like operations using standard registers, if conditions
 allow such.  (E.g., summing two arrays of 16-bit unsigned integers can
 be done with in pairs with 32-bit registers if overflow is known not
 to occur.)  64-bit machines would seem to offer even more benefit in
 this, allowing a pair of 16-bit multiply and divide by a constant at
 the cost of some shifting and logical operations.  If gcc will include
 support for detecting when vector-like operations can be done, it
 might be useful to provide this optimization for chips that do not
 have special vector instructions.

* When branches are expensive, it might be reasonable to translate if
 statements with multiple conditions into merged condition and a single
 branch rather than multiple branches.  E.g., "if ((A==N) && (B==M))"
 might be translated into "XOR Temp1 A N; XOR Temp2 B M; OR Temp1 Temp2
 Temp1; BNEZ Temp1 L3; #branch over conditional code" rather than "XOR
 Temp1 A N; BNEZ Temp1 L3; #branch over second condition test and
 conditional code\\ XOR Temp1 B M; BNEZ Temp1 L3;"  Removing a branch
 might increase modality of the branch and decrease branch history
 table aliasing, improving branch prediction.  Such prediction
 improvement might compensate for the overhead of performing both tests
 in all cases.  (This is what happened in a test case using a pair of
 comparisons of a random number and a constant, at least performance
 improved on an x86-based system.)

(I am disappointed that gcc does not surpass all other compilers in
optimization--gcc should be the best in everything (portability,
optimization, compile speed, standards conformance, etc.).  It is
particularly disappointing that gcc seems to miss some common
optimizations (e.g., loop arrangement to optimize memory access
patterns).  I am, of course, very happy that gcc is Free--a compiler is
a system component even if some vendors do not think so.)

Thank you for providing gcc!

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]