Mon Apr 13 11:14:00 GMT 1998
egcs-1.0.2 compiled code for Fortran complex program slower than
The hotspot of your program is:

      do m = m1,m2
         z1 = z1 + x(m,l)*y(m,k)
      end do

where z1, x and y are DOUBLE COMPLEX entities.
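For comparison, here is a rough C analogue of that loop. It is only a sketch, not the original program: the function name `zdot_accum`, the 0-based indexing, and passing the columns x(:,l) and y(:,k) as plain pointers are all assumptions. Because it uses C99 `double complex`, the product is left to gcc's own complex support, roughly the situation g77 is in with -fno-emulate-complex.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical C analogue of the Fortran hotspot
 *     do m = m1,m2
 *        z1 = z1 + x(m,l)*y(m,k)
 *     end do
 * xcol and ycol point at the columns x(:,l) and y(:,k); n = m2 - m1 + 1.
 * The complex product is left to the compiler (native complex arithmetic). */
double complex zdot_accum(double complex z1,
                          const double complex *xcol,
                          const double complex *ycol,
                          int n)
{
    for (int m = 0; m < n; m++)
        z1 += xcol[m] * ycol[m];
    return z1;
}
```

Called as `zdot_accum(0.0, x, y, 2)` with x = {1+2i, 3-i} and y = {2-i, 0.5+4i}, it accumulates (4+3i) + (5.5+11.5i).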
As I pointed out before, the generated code contains lots of
temporary variables due to the expansion of the complex product.
However, with the -fno-emulate-complex flag on the compile line,
the generated code looks much better (this is for my ...): it has
one fmoved an@,fpm (double precision load) for each of the four
parts of the two double complex array elements involved, and
nothing more. Everything else is just address updating for the
arrays and, of course, the floating-point computations themselves.
These are the results as far as timing and correctness are concerned:
So that's at least twice as fast, and the answer is correct.
Why does g77 default to -femulate-complex (and what is it)?
When g77 was first released to the general public (19950217), it
handled (double) complex variables and arrays by simply telling the
gcc back end (loosely: the code generator): these are (double)
complex entities (in gcc-speak: SCmode and DCmode); generate code
for them.
Unfortunately, over the two years following the initial release, it
turned out that the back end didn't handle the complex type very
well. At first we hoped this would only affect complex int, complex
short, etc. (which are of no concern to Fortran code), but it turned
out that the treatment of complex float and complex double was also
broken in enough cases to warrant another approach.
So Craig Burley converted every place in the Fortran front end that
handed complex arithmetic over to the back end so that it explicitly
spelled out the resulting real arithmetic. Instead of saying "here
are two complex numbers (a,b) and (c,d); multiply them", it said
"here are two pairs of real numbers, to be multiplied as follows:
(ac - bd, ad + bc)".
The latter form has been the default since g77-0.5.20 (19970301).
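The idea behind the emulated form can be sketched in C as below. This illustrates the technique only and is not g77's actual front-end code; the struct `cplx` and the function names are hypothetical. Each complex value is carried as two doubles and the product is written out as (ac - bd, ad + bc), so the back end only ever sees real multiplications and additions.

```c
#include <assert.h>

/* Hypothetical representation: a complex number carried as two reals. */
typedef struct {
    double re;
    double im;
} cplx;

/* (a,b) * (c,d) spelled out as real arithmetic: (ac - bd, ad + bc). */
cplx cplx_mul(cplx p, cplx q)
{
    cplx r;
    r.re = p.re * q.re - p.im * q.im;  /* ac - bd */
    r.im = p.re * q.im + p.im * q.re;  /* ad + bc */
    return r;
}

/* The hotspot z1 = z1 + x(m,l)*y(m,k) in "emulated" form:
 * only real multiplications and additions appear. */
cplx zdot_accum_emulated(cplx z1, const cplx *xcol,
                         const cplx *ycol, int n)
{
    for (int m = 0; m < n; m++) {
        cplx t = cplx_mul(xcol[m], ycol[m]);
        z1.re += t.re;
        z1.im += t.im;
    }
    return z1;
}
```

For example, cplx_mul of (1,2) and (2,-1) gives (2 + 2, -1 + 4) = (4,3).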
Apparently, when the back end does not make an error in dealing with
complex entities directly, it is able to generate more efficient
code; it is not clear to me why this is so.
So you might try the flag -fno-emulate-complex, BUT ONLY IF YOU CAN
CHECK THE RESULTS OF YOUR COMPUTATIONS!!!
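One way to check is to build the program twice, once with the default -femulate-complex and once with -fno-emulate-complex, and compare the complex results within a tolerance, since different instruction sequences need not agree bit for bit. The C sketch below shows such a comparison; the function name `complex_close` and the choice of a relative tolerance are assumptions, not anything g77 provides.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* Squared modulus |z|^2, computed with plain arithmetic (no libm call). */
static double abs2(double complex z)
{
    double re = creal(z), im = cimag(z);
    return re * re + im * im;
}

/* Hypothetical check: do two results agree to within relative tolerance rtol?
 * Tests |a - b| <= rtol * max(|a|, |b|), compared in squared form.
 * Any Inf or NaN component is treated as a mismatch.  Squaring can overflow
 * for magnitudes above roughly 1e154; this is a sketch, not a robust routine. */
int complex_close(double complex a, double complex b, double rtol)
{
    if (!isfinite(creal(a)) || !isfinite(cimag(a)) ||
        !isfinite(creal(b)) || !isfinite(cimag(b)))
        return 0;
    double la = abs2(a), lb = abs2(b);
    double scale = la > lb ? la : lb;
    return abs2(a - b) <= rtol * rtol * scale;
}
```

A relative rather than absolute tolerance is used so that the same check works whatever the magnitude of the accumulated sums.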