This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

An unusual Performance approach using Synthetic registers, and a request for guidance.


I am working on a somewhat different approach for gcc for the 'x86 machines.  
My assembler experience is from application development on IBM Mainframes - 
the 360/370/390 stuff.  

I find the thought of only 8, or really, 6 general purpose registers being 
available for the compiler to be just rediculous.  On the s/390 equipment, we 
had 16 GP registers, and that was half what we really needed.  

I am modifying gcc to add Synthetic registers.  If I am mistaken, this is a 
big waste of time.  If I am correct, (meaning "really really Lucky") then 
this may provide a significant increase in speed for executables.

The philosophy is straightforward.  I believe that gcc could handle 
intermediate results more efficiently.  Register based values are now handled 
very efficiently.  Memory based values are handled more-or-less efficiently.  
Because there is no specific or formal approach for intermediate results, 
they are not handled efficiently.  I believe that for long functions, more 
than a hundred lines of code, gcc compiled code spends much of its time 
recreating intermediate results again and again.  This is because there is no 
place to store intermediate results, other than in the registers, and so the 
results are discarded and recreated.  

When I was a boy, a register had a physical presence.  It was as big as an 
office desk, and, with power supply, weighed as much as a car.  Now, it is a 
"state" of the machine, and one would be hard pressed to point to the exact 
spot where the register exists, no matter how good the microscope.  

A Synthetic register is a Word of memory, defined to the compiler as a 
register.  If my readings in machine architecture are correct, then for 
modern machines, L1 cache access is as fast, or nearly as fast, as register 
access.  Because the Synthetic registers will be frequently accessed, they 
will remain in L1 cache, and give register-like performance.  The compiler is 
"told" that the Synthetic registers must be accessed with special 
instructions.  In reality, those special instructions are mostly-specified 
"modrm" instructions.  I am using "modrm" instructions to minimize 
instruction length and to decrease complexity. 

My first attempt is to add 32 general purpose Synthetic registers.  The 
contiguous block of Sythetic registers is aligned on whatever block boundary 
is used by the L1 cache.  There must be real memory somewhere from which the 
L1 cache will be drawn.  I am implementing it as an extra, aligned, block of 
128 bytes in the frame.  This means that every function will have its own 
block of Synthetic registers and there will be no need to save or restore 
Synthetic registers.  If your call stack is 10 calls deep, you get 320 
Synthetic registers.  Also, a function compiled with Synthetic registers can 
be linked to, and called from, a function compiled without Synthetic 
registers, with no additional complexity.

Once this part is working, it may be worthwhile to look into varying the 
number of Synthetic registers, depending upon the needs of the compiler for a 
particular function.  I am also quite curious about the value of Synthetic 
floating-point registers. and have done some partial coding in anticipation 
of their use.

There is an additional price for Synthetic registers.  I could not figure out 
a simple way to use the frame pointer, so I dragooned the "C" register, 
register number 2, to be used as the base register for the modrm 
instructions.  Time will tell if this is Dumb, or Dumber.  I am absolutely 
convinced there is a better way.  I just do not know yet what it is. 

I have set up the allocation order so that the Synthetic registers are 
allocated before the general purpose registers.  This way, general purpose 
registers are reserved for situations where Synthetic registers are not 
appropriate.

Synthetic registers are an imperfect answer.  They are limited to binary-type 
instructions.  They cannot be used as base or index registers.  They cannot 
be directly copied to memory.  Describing the instructions to the compiler as 
register instructions when in fact data is written into the L1 cache may be 
too cute.  I fear that the scheduler will not see the difference, and the 
result will be frequent instruction stalls, something that I hoped this 
approach would reduce.

So much for the Big Talk.  Obviously, if this were all working now, I would 
not be asking for help, instead, I would be announcing where to get a copy of 
the modified code.  

I have many questions, mostly of a trivial nature, and would greatly 
appreciate any help.

		Andrew Walker


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]