This is the mail archive of the
mailing list for the GCC project.
An unusual Performance approach using Synthetic registers, and a request for guidance.
- From: Andy Walker <ja_walker at earthlink dot net>
- To: "GCC Developer's List" <gcc at gcc dot gnu dot org>
- Date: Fri, 27 Dec 2002 02:27:41 -0600
- Subject: An unusual Performance approach using Synthetic registers, and a request for guidance.
I am working on a somewhat different approach for gcc for the 'x86 machines.
My assembler experience is from application development on IBM Mainframes -
the 360/370/390 stuff.
I find the thought of only 8, or really, 6 general purpose registers being
available for the compiler to be just rediculous. On the s/390 equipment, we
had 16 GP registers, and that was half what we really needed.
I am modifying gcc to add Synthetic registers. If I am mistaken, this is a
big waste of time. If I am correct, (meaning "really really Lucky") then
this may provide a significant increase in speed for executables.
The philosophy is straightforward. I believe that gcc could handle
intermediate results more efficiently. Register based values are now handled
very efficiently. Memory based values are handled more-or-less efficiently.
Because there is no specific or formal approach for intermediate results,
they are not handled efficiently. I believe that for long functions, more
than a hundred lines of code, gcc compiled code spends much of its time
recreating intermediate results again and again. This is because there is no
place to store intermediate results, other than in the registers, and so the
results are discarded and recreated.
When I was a boy, a register had a physical presence. It was as big as an
office desk, and, with power supply, weighed as much as a car. Now, it is a
"state" of the machine, and one would be hard pressed to point to the exact
spot where the register exists, no matter how good the microscope.
A Synthetic register is a Word of memory, defined to the compiler as a
register. If my readings in machine architecture are correct, then for
modern machines, L1 cache access is as fast, or nearly as fast, as register
access. Because the Synthetic registers will be frequently accessed, they
will remain in L1 cache, and give register-like performance. The compiler is
"told" that the Synthetic registers must be accessed with special
instructions. In reality, those special instructions are mostly-specified
"modrm" instructions. I am using "modrm" instructions to minimize
instruction length and to decrease complexity.
My first attempt is to add 32 general purpose Synthetic registers. The
contiguous block of Synthetic registers is aligned on whatever block boundary
is used by the L1 cache. There must be real memory somewhere from which the
L1 cache will be drawn. I am implementing it as an extra, aligned, block of
128 bytes in the frame. This means that every function will have its own
block of Synthetic registers and there will be no need to save or restore
Synthetic registers. If your call stack is 10 calls deep, you get 320
Synthetic registers. Also, a function compiled with Synthetic registers can
be linked to, and called from, a function compiled without Synthetic
registers, with no additional complexity.
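The arithmetic behind the per-frame block can be sketched in C. This is only an illustration, not the actual gcc change: the names synth_block, synth_reg_offset, align_up and synth_regs_in_flight are hypothetical, and the 32-byte cache line is an assumed value, since the real line size depends on the processor.

```c
#include <stddef.h>
#include <stdint.h>

/* From the text: 32 Synthetic registers, one 32-bit word each. */
#define SYNTH_REG_COUNT 32
/* Assumption: a 32-byte L1 cache line. The real value is processor-specific. */
#define CACHE_LINE_BYTES 32

/* Hypothetical layout of the extra block placed in each function's frame. */
typedef struct {
    uint32_t reg[SYNTH_REG_COUNT];
} synth_block;

/* Size of the block in bytes: 32 words * 4 bytes = 128. */
size_t synth_block_bytes(void)
{
    return sizeof(synth_block);
}

/* Byte offset of Synthetic register n from the start of the block. */
size_t synth_reg_offset(unsigned n)
{
    return offsetof(synth_block, reg) + n * sizeof(uint32_t);
}

/* Round an address up to the next multiple of align (a power of two),
   as frame setup would need to do to place the block on a line boundary. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Synthetic registers live across a call chain: one block per active frame. */
unsigned synth_regs_in_flight(unsigned call_depth)
{
    return call_depth * SYNTH_REG_COUNT;
}
```

With these definitions, synth_block_bytes() gives the 128 bytes mentioned above, and synth_regs_in_flight(10) gives the 320 registers for a call stack 10 deep. Nothing is saved or restored on a call because each frame owns its block.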
Once this part is working, it may be worthwhile to look into varying the
number of Synthetic registers, depending upon the needs of the compiler for a
particular function. I am also quite curious about the value of Synthetic
floating-point registers, and have done some partial coding in anticipation
of their use.
There is an additional price for Synthetic registers. I could not figure out
a simple way to use the frame pointer, so I dragooned the "C" register,
register number 2, to be used as the base register for the modrm
instructions. Time will tell if this is Dumb, or Dumber. I am absolutely
convinced there is a better way. I just do not know yet what it is.
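To show why a base register plus a short displacement keeps the instructions small, here is a sketch of how one such "modrm" instruction could be encoded. It assumes 32-bit mode, that ECX points at the start of the block, and that Synthetic register n lives at byte offset 4*n. The function names are hypothetical and this is not the code in the modified gcc. Note that gcc numbers the C register 2 internally, while the hardware encoding of ECX in the ModRM byte is 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hardware register encodings used in the ModRM byte (not gcc's numbering). */
enum { HW_EAX = 0, HW_ECX = 1, HW_EDX = 2, HW_EBX = 3 };

/* Build a ModRM byte from its three fields: mod (2 bits), reg (3), rm (3). */
uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* Encode "add dst, dword ptr [ecx + 4*n]" into out and return its length.
   Opcode 0x03 is ADD r32, r/m32; mod = 01 selects an 8-bit displacement.
   Assumption: n is in 0..31, so the displacement (0..124) fits in the
   signed 8-bit field and the whole instruction is 3 bytes long. */
size_t encode_add_from_synth(unsigned dst, unsigned n, uint8_t out[3])
{
    out[0] = 0x03;                   /* ADD r32, r/m32        */
    out[1] = modrm(1, dst, HW_ECX);  /* [ecx + disp8] operand */
    out[2] = (uint8_t)(n * 4u);      /* byte offset of reg n  */
    return 3;
}
```

For example, adding Synthetic register 2 into EAX encodes as the three bytes 03 41 08, that is, add eax, [ecx+8]. All 32 registers stay reachable with a one-byte displacement, which is where the short instruction length comes from.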
I have set up the allocation order so that the Synthetic registers are
allocated before the general purpose registers. This way, general purpose
registers are reserved for situations where Synthetic registers are not
suitable.
Synthetic registers are an imperfect answer. They are limited to binary-type
instructions. They cannot be used as base or index registers. They cannot
be directly copied to memory. Describing the instructions to the compiler as
register instructions when in fact data is written into the L1 cache may be
too cute. I fear that the scheduler will not see the difference, and the
result will be frequent instruction stalls, something that I hoped this
approach would reduce.
So much for the Big Talk. Obviously, if this were all working now, I would
not be asking for help; instead, I would be announcing where to get a copy of
the modified code.
I have many questions, mostly of a trivial nature, and would greatly
appreciate any help.