This is the mail archive of the
mailing list for the GCC project.
Re: G++ could optimize ASM code more
Hello and thanks for your quick reply!
On 09.05.2012 15:59, Ian Lance Taylor wrote:
Note that the current GCC release is 4.7.0.
The problem with Debian Squeeze is always that I have to use "medieval"
software... ;-) Maybe I should develop the server software on a local
box using "unstable" software. On the other hand, if I develop directly
on the production machine, I can optimize the program for that
machine itself and not for my local box/CPU.
This cast changes the meaning of the code, so it's not surprising
you see different assembler instructions. The first case above will do
the multiplication in the type "unsigned long long". In the second case
the "unsigned char" values are zero-extended to int, and the
multiplication is done in the type "int". Then the "int" result is
sign-extended to "unsigned long long" for the addition.
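The promotion rules described above can be illustrated with a small sketch. The original source code is not part of this message, so the function names `mul_wide` and `mul_int` and the variables `a`, `b` and `c` are hypothetical stand-ins for the two variants under discussion:

```cpp
#include <cassert>
#include <type_traits>

// Variant 1 (assumed): the cast converts a to unsigned long long first,
// so b is converted as well and the multiplication is done in 64 bits.
unsigned long long mul_wide(unsigned long long c, unsigned char a, unsigned char b) {
    return c + static_cast<unsigned long long>(a) * b;
}

// Variant 2 (assumed): no cast. Both operands are promoted to int, the
// product has type int, and only then is it converted for the addition.
unsigned long long mul_int(unsigned long long c, unsigned char a, unsigned char b) {
    return c + a * b;
}

// The product of two unsigned char values has type int, not unsigned.
static_assert(
    std::is_same<decltype(static_cast<unsigned char>(1) *
                          static_cast<unsigned char>(1)), int>::value,
    "unsigned char * unsigned char is computed as int");

// Small self-check: both variants agree for the largest possible inputs.
void check_variants() {
    assert(mul_wide(10, 255, 255) == mul_int(10, 255, 255));
}
```

Compiling both functions with `g++ -O2 -S` should make the difference in the generated multiply and extension instructions visible, although the exact output depends on the compiler version.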
In this case it's true that the compiler could convert the code as you
suggest, based on the knowledge that the int values are always in the
range 0 to 255.
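The range argument can be checked exhaustively: the largest possible product is 255 * 255 = 65025, which fits in an int and is never negative, so sign-extending the int result gives the same value as multiplying in "unsigned long long". The following is only a sketch; the helper name `count_mismatches` is hypothetical and not taken from the thread:

```cpp
#include <cassert>

// Counts the (a, b) pairs for which the 64-bit multiplication and the
// int multiplication followed by conversion produce different values.
int count_mismatches() {
    int mismatches = 0;
    for (int i = 0; i <= 255; ++i) {
        for (int j = 0; j <= 255; ++j) {
            unsigned char a = static_cast<unsigned char>(i);
            unsigned char b = static_cast<unsigned char>(j);

            // Multiplication done in unsigned long long.
            unsigned long long wide = static_cast<unsigned long long>(a) * b;

            // Multiplication done in int after integer promotion.
            int narrow = a * b;
            assert(narrow >= 0);  // at most 65025, so never negative

            // Conversion of the int result to unsigned long long.
            unsigned long long widened = narrow;

            if (wide != widened) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Because no pair disagrees, either instruction sequence would be a valid translation here; which one is faster is a separate question.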
I did understand that the compiler used "signed" multiplication instead
of an unsigned one because char*char needs to be extended.
Maybe I am wrong, but couldn't the compiler "know" that the result will
be at least unsigned because unsigned * unsigned = unsigned ?
So it could have extended the multiplication to the unsigned long-long
datatype of c or at least just "unsigned int" instead of "signed int"?
However, it's not clear to me that using imulq would be
better. My copy of the Intel optimization manual suggests that imull
has slightly lower latency than imulq, so I think that in many cases
imull would be preferred.
Hm... good point. I do not know much about assembly, so I just thought
the shorter the code the better. If imull is faster than imulq, then the
question is whether imull+movslq is still faster than a single imulq. Do you
know where I can find this information for my CPU (Intel Xeon X3440)?
I was searching for a table which shows how many CPU ticks imull,
imulq and movslq need, but I have not found one yet.
My Linux is 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64
And the CPU is "Intel(R) Xeon(R) CPU X3440 @ 2.53GHz". (I hope the
"amd64" version of Debian is the correct one, or should our admin have
installed the "ia64" variant since it is an Intel CPU?)