This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
2) use typedef float myvec __attribute__ ((vector_size (16)));
In the example above, it's not only register allocation, but also scheduling. The data needs to be loaded from memory, and how that happens can affect performance quite a bit.
And yeah, I can't understand how icc 8.1 could get decent performance without instruction scheduling... but maybe I'm stuck in my own little RISC processing world (the toy compilers I have written have been for SPARC and MIPS), and I just don't understand enough about how the Pentium works.
Brian
On Fri, 25 Feb 2005 14:24:27 -0500, Daniel Berlin <dberlin@dberlin.org> wrote:
> On Fri, 2005-02-25 at 12:18 +0100, Brian Budge wrote:
> > Hmmm, I doubt that. It seems very important that your data be in registers when you want to do arithmetic on it.
>
> That's register allocation, not scheduling :)
>
> > I can see that if your data was already in registers, maybe a "randomized" instruction ordering would perform okay, but loading the data properly is time consuming. At least these are the things I've observed.
>
> stevenb was the source of this information for me, so maybe he can confirm it (Steven, i mentioned to brian that icc 8.1 doesn't do scheduling for the pentium4 anymore, and he doubts it :P)
-- Richard Beare, CSIRO Mathematical & Information Sciences Locked Bag 17, North Ryde, NSW 1670, Australia Phone: +61-2-93253221 (GMT+~10hrs) Fax: +61-2-93253200
Attachment:
relative_speed.pdf
Description: Adobe PDF document