This is the mail archive of the mailing list for the GCC project.

Re: G++ could optimize ASM code more

Hello and thanks for your quick reply!

On 09.05.2012 at 15:59, Ian Lance Taylor wrote:

Note that the current GCC release is 4.7.0.

The problem with Debian Squeeze is that I always have to use "medieval" software... ;-) Maybe I should develop the server software on a local box running "unstable". On the other hand, if I develop directly on the production machine, I can optimize the program for that machine itself rather than for my local box/CPU.

This cast changes the meaning of the code, so it's not surprising that
you see different assembler instructions. The first case above will do
the multiplication in the type "unsigned long long". In the second case
the "unsigned char" values are zero-extended to int, and the
multiplication is done in the type "int". Then the "int" result is
sign-extended to "unsigned long long" for the addition.

In this case it's true that the compiler could convert the code as you
suggest, based on the knowledge that the int values are always in the
range 0 to 255.
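The equivalence described above can be checked with a small sketch. The expressions from earlier in the thread are not reproduced in this message, so the two forms below are only an assumption about what they looked like, and the names `mul_wide`, `mul_int`, `variants_agree` and `check_variants` are made up for illustration. The sketch also assumes an 8-bit `unsigned char` and a 32-bit `int`, as on x86-64.

```cpp
#include <cassert>

// Hypothetical variant 1: cast first, so the multiplication is done in
// unsigned long long.
unsigned long long mul_wide(unsigned long long c, unsigned char a, unsigned char b) {
    return c + static_cast<unsigned long long>(a) * b;
}

// Hypothetical variant 2: no cast, so both operands are promoted to int,
// multiplied as int, and the int result is converted for the addition.
unsigned long long mul_int(unsigned long long c, unsigned char a, unsigned char b) {
    return c + a * b;
}

// Returns true if both variants agree for every pair of unsigned char values.
// With 8-bit chars the largest product is 255 * 255 = 65025, which fits in a
// 32-bit int, so the int product is never negative.
bool variants_agree(unsigned long long c) {
    for (int a = 0; a <= 255; ++a) {
        for (int b = 0; b <= 255; ++b) {
            unsigned char ua = static_cast<unsigned char>(a);
            unsigned char ub = static_cast<unsigned char>(b);
            if (mul_wide(c, ua, ub) != mul_int(c, ua, ub)) {
                return false;
            }
        }
    }
    return true;
}

// Small self-check for a few starting values of c.
void check_variants() {
    assert(variants_agree(0ULL));
    assert(variants_agree(123456789ULL));
}
```

Compiling such a file with `g++ -O2 -S` should make it possible to compare the instructions generated for the two variants; which instructions appear will depend on the compiler version.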

I understood that the compiler used a "signed" multiplication instead of an unsigned one because the operands of char * char need to be extended.

Maybe I am wrong, but couldn't the compiler "know" that the result can never be negative, because unsigned * unsigned = unsigned?

So couldn't it have widened the multiplication to the unsigned long long type of c, or at least used "unsigned int" instead of "signed int"?
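As far as I understand the language rules, the signed type comes from the integral promotions: operands narrower than int are promoted to int before the multiplication, whereas unsigned int operands stay unsigned. A minimal sketch to check this, assuming a compiler with `decltype` and `static_assert` (`-std=c++0x` or later) and a platform where int can hold every unsigned char value; the function names and variables are hypothetical:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sample operands, only used to inspect the expression types.
unsigned char ca = 200, cb = 200;
unsigned int ia = 200, ib = 200;

// unsigned char * unsigned char: both operands are promoted to int first,
// so the product has type int, not unsigned char or unsigned int.
static_assert(std::is_same<decltype(ca * cb), int>::value,
              "unsigned char operands are promoted to int");

// unsigned int * unsigned int: no promotion to a signed type happens.
static_assert(std::is_same<decltype(ia * ib), unsigned int>::value,
              "unsigned int operands stay unsigned int");

// The int product of two 8-bit values is at most 65025, so it is never
// negative even though its type is signed.
int product_of_chars(unsigned char x, unsigned char y) {
    return x * y;
}

// Small self-check of the values discussed above.
void check_promotion() {
    assert(product_of_chars(200, 200) == 40000);
    assert(product_of_chars(255, 255) == 65025);
}
```

So the "signed" multiplication is what the language prescribes for this expression; whether the optimizer may then pick a different instruction is a separate question.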

However, it's not clear to me that using imulq would be
better.  My copy of the Intel optimization manual suggests that imull
has slightly lower latency than imulq, so I think that in many cases
imull would be preferred.

Hmm... good point. I do not know much about assembly, so I just assumed that shorter code is better. If imull is faster than imulq, then the question is whether imull + movslq is still faster than a single imulq. Do you know where I can find this information for my CPU (Intel Xeon X3440)? I was looking for a table that lists how many clock cycles imull, imulq and movslq take, but I have not found one yet.

My Linux kernel is 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux.

And the CPU is "Intel(R) Xeon(R) CPU X3440 @ 2.53GHz". (I hope the "amd64" version of Debian is the correct one, or should our admin have installed the "ia64" variant since it is an Intel CPU?)

Best regards
Daniel Marschall
