This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: modification to inline asm + 128 bit cop2 register support


(replying to Mike Stump from my home account)
(this mail is really really way too long! sorry!)

I'll reply in three parts.  The first part argues the case for my patch with 
some undeniable logic (IMHO).  The second and third describe two problems 
with the current compiler and the very latest chips that are becoming 
available.  The third situation would really be improved if someone would 
help me with those darned machine description files.

>Mike Stump wrote:
>This goes in the wrong direction I feel.  Instead, add support to
>expose more of the assembly to the compiler, and have the compiler
>generate the `more optimal' code.  I think you'll win better that way.
>You can then have the full power of the optimizer to optimize.

1. my case for "%A" to determine which_alternative in asm statements:

Currently, the gcc inline "asm" statement supports "alternative" 
constraints for the operands supplied, which is a great way to help the 
compiler optimize around instructions it can't even see.

Unfortunately, this rather useful feature is fundamentally flawed in its 
current implementation.  This part of asm's *functionality* is 
"machine-dependent", in fact, it is worse than that!  It is assembler 
mnemonic dependent!  So much so, that this very nice feature is *completely* 
unusable for MIPS processors as far as I can see.

It relies too much on the assembler's mnemonic format being able to take 
arguments of different types (a register or a memory operand for the same 
mnemonic, say).  This is way too arbitrary for a generic and otherwise 
multi-platform compiler such as gcc. (IMHO)
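
A minimal sketch of the dependence described above, assuming an x86 target 
and AT&T syntax (neither is part of the original discussion).  The helper 
name add_with_alternatives is hypothetical.  On x86 a single "addl" template 
is valid whether the operands end up in registers or in memory, so two 
constraint alternatives can share one asm string.  On a load/store 
architecture such as MIPS a memory operand needs a different instruction, 
so the same template cannot serve both alternatives.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b. */
static int32_t add_with_alternatives(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Alternative 1: a in a register, b in register/memory/immediate.
       Alternative 2: a in memory,     b in register/immediate.
       The same "addl" template assembles correctly for both. */
    __asm__ ("addl %1, %0"
             : "+r,m" (a)
             : "rmi,ri" (b)
             : "cc");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    a += b;
#endif
    return a;
}
```

Calling add_with_alternatives(2, 3) yields 5 on either path; the compiler 
is free to pick whichever alternative is cheapest at each call site.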

I think we need to give inline assembler programmers (who have a hard enough 
time as it is) the ability to use the fastest "alternative" the compiler has 
calculated it can provide, regardless of what processor they are working 
with.

>
>If you want a concrete example, show me what you wanted to do (and
>annotate it some so I can grasp what you want and why it is better.

2. problem #1:

Here's my situation:

I am programming the Toshiba R5900, which is a 128 bit dual-issue processor.  
Cygnus have supplied a fairly good machine description that provides a 
simple 128-bit TI type that can only be loaded/stored or copied.

The 128 bit registers are generally used in 64 bit mode for regular 
math/operations, which allows instructions to be dual-issued to get twice 
the throughput.  Therefore, for regular use the compiler runs in 64 bit mode.

However, the processor has a whole ton of "extra" instructions that operate 
on the 128 bit registers in numerous irregular ways.  Ways that, as far as I 
can see, would take several years to get the compiler to use and optimize 
properly (if possible at all).  For example, swapping bits 32-63 with bits 
64-95, or adding each 8 bit field of one register to the corresponding 8 bit 
field of another register, etc.  (For more info, the spec for the R5900 is 
available from Toshiba.)
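
To make the two example operations above concrete, here is a portable C 
model of what they compute.  It is only a sketch of the semantics: the type 
u128 and all function names are hypothetical, the byte-wise add is assumed 
to wrap within each 8 bit field (the text does not say whether it wraps or 
saturates), and none of this uses the actual R5900 instructions.

```c
#include <stdint.h>

/* Hypothetical model of a 128 bit register as two 64 bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Swap bits 32-63 (upper word of lo) with bits 64-95 (lower word of hi). */
static u128 swap_middle_words(u128 v)
{
    uint64_t lo_upper = v.lo >> 32;
    uint64_t hi_lower = v.hi & UINT64_C(0xFFFFFFFF);
    u128 r;
    r.lo = (v.lo & UINT64_C(0xFFFFFFFF)) | (hi_lower << 32);
    r.hi = (v.hi & UINT64_C(0xFFFFFFFF00000000)) | lo_upper;
    return r;
}

/* Add eight 8 bit fields at once, with no carry between fields. */
static uint64_t add_bytes64(uint64_t a, uint64_t b)
{
    const uint64_t high = UINT64_C(0x8080808080808080);
    /* Add the low 7 bits of every byte, then fix up each top bit. */
    return ((a & ~high) + (b & ~high)) ^ ((a ^ b) & high);
}

/* Add sixteen 8 bit fields of a to the matching fields of b. */
static u128 add_bytes128(u128 a, u128 b)
{
    u128 r;
    r.lo = add_bytes64(a.lo, b.lo);
    r.hi = add_bytes64(a.hi, b.hi);
    return r;
}
```

Each of these takes several C operations per call, where the hardware does 
the whole thing in one instruction.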

AFAICS, there is no basic type in C/C++ to allow the use of these extra 
instructions in any way whatsoever.  We *have* to use inline asm.

I don't see any solution to the above problem.  However, I have an 
additional problem which the compiler could at least help me with:

3. problem #2: co-processor 2 registers

There is a co-processor with 32 128-bit registers.  For reasons similar to 
the above problem, the actual operations on these registers are impossible 
for the compiler to generate (without making a major modification to the 
ISO C++ standard!).  However, the compiler could at least help me with 
register allocation:

Currently, I have to pass 128 bit values through the main core's 128 bit 
registers, execute the COP2 instruction and then pass the values back even 
if the value is being used again by the COP2 in the very next instruction.

eg.

extern inline void PerformCop2Insn( TItype value )
{
  /* placeholder mnemonics; "\n\t" puts each on its own assembler line */
  asm
  (
    "move to cop2\n\t"
    "execute cop2 insn %0\n\t"
    "move to core"
    : "+r" (value)
  );
}

int main(void)
{
  TItype value;
  PerformCop2Insn( value );
  PerformCop2Insn( value );
}

This is incredibly suboptimal, as you can probably see from the example: the 
value makes a round trip through the core registers between the two calls.  
(It is even worse when I don't have a %A code to determine whether the value 
needs to be written to memory or to a register.)

If the compiler could allocate the COP2 registers for me, and even supply the 
relevant load/store/move instruction for the input and output, I could write 
"PerformCop2Insn" simply as:

extern inline void PerformCop2Insn( TFtype value )
{
  asm
  (
    "execute cop2 insn %0"
    : "+v" (value)   // v is 128 bit co-processor2 register class (read/write)
  );
}

This would produce very efficient code if I had the know-how to get it 
working.

I would also need inter-assignability between 128 bit core registers and 
cop2 registers.

As an additional complication: (not totally necessary)
Because the co-processor *also* has a "micro" operational mode (where it 
executes programs internal to itself and hence can use all/any of its 
registers), I need to be able to switch the registers available to the 
compiler for allocation on-the-fly between functions. (maybe with a function 
attribute?)

Apologies for the rather long message,

Best Regards

Dylan Cuthbert
(All views and opinions are mine and mine only, etc etc)

