This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: modification to inline asm + 128 bit cop2 register support
- To: dylan_cuthbert at hotmail dot com, gcc at gcc dot gnu dot org
- Subject: Re: modification to inline asm + 128 bit cop2 register support
- From: Mike Stump <mrs at windriver dot com>
- Date: Mon, 12 Jun 2000 14:29:48 -0700 (PDT)
> From: "Dylan Cuthbert" <dylan_cuthbert@hotmail.com>
> Date: Sat, 10 Jun 2000 22:00:24 JST
> I'll reply in three parts - the first part argues the case for my patch with
> some undeniable logic (IMHO)
Disagree.
> 1. my case for "%A" to determine which_alternative in asm statements:
> Currently, the gcc inline "asm" instruction supports "alternative"
> constraints for the operands supplied which is a great idea to help the
> compiler optimize for commands it can't even see.
It is great, except better ways exist.
> Unfortunately, this rather useful feature is fundamentally flawed in
> its current implementation.
Sounds like you're arguing my point. While I could count this as
fodder for my side, I don't think my side is weak enough to need it.
> I think we need to give inline assembler programmers (who have a
> hard enough time as it is) the ability to use the fastest
> "alternative" the compiler has calculated it can provide, regardless
> of what processor they are working with.
Agreed, though I think better ways exist.
> 2. problem #1:
> Here's my situation:
> I am programming the Toshiba R5900 which is a 128 bit dual-issue
> processor. Cygnus have supplied a fairly good machine description
> that supplies a simple 128-bit TI type that can only be
> loaded/stored or copied.
It would be great if Toshiba contributed improvements to gcc that let
gcc take more advantage of their processor. If they don't,
performance on Toshiba processors will suffer. This is, in part,
between you and your vendor (Toshiba/Cygnus).
> The 128 bit registers are generally used in 64 bit mode for regular
> math/operations which allows them to be dual-issued to get twice the
> throughput. Therefore for regular use the compiler is running in 64
> bit mode.
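One way to handle this is with a new builtin type and builtin functions:

int main (void) {
  __simd128_t a, b, c;           /* proposed builtin 128-bit type */
  __simd_2_64bit_mul (a, b, c);  /* proposed builtin: 2 x 64-bit multiply */
}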
You then let the compiler's register allocator allocate the registers,
and have the __ builtin expand to a pattern in the md file.
This is relatively simple and not too hard to get working, and later
we can extend the compiler in natural ways to auto-vectorize. In the
shorter time frame, users can use these builtins to _get at_ the
features of the machine. Porting is even easier, as one can redefine
these builtins to forward to plain C code that emulates the
instructions. Also, as time goes on and enough machines start doing
this, we can unify and merge like things together, and make them even
more portable.
> However, the processor has a whole ton of "extra" instructions that
> operate on the 128 bit registers in numerous irregular ways.
Each one becomes a separate builtin.
> Ways that, as far as I can see, would take several years to get the
> compiler to use and optimize properly (if possible at all).
Yes, that is true in the general sense, but not in the shorter term,
which is what I'm talking about: there, one maps directly from what the
user wrote into a builtin, from the builtin to a line in the md file,
and from that line to the specific asm instruction.
So, while the general scheme might take a long (too long) time, my
scheme is much less aggressive and far easier to implement. Also, I am
describing not just my opinion, but how the compiler has been
extended before in practice, though that code isn't in the main gcc
tree yet. In the longer run, it would be nice to have:
main() {
long long a1, b1, c1;
long long a2, b2, c2;
a1 = b1 * c1;
a2 = b2 * c2;
}
map directly into the builtin version above, but this is _much_ harder.
> AFAICS, there is no basic type in C/C++ to allow the use of these extra
> instructions in any way whatsoever. We *have* to use inline asm.
That doesn't follow. It assumes that a port can add neither new
register classes nor new builtins. I'd suggest that assumption is false.
> 3. co-processor 2 registers
> There is a co-processor with 32 128-bit registers. For reasons similar to
> the above problem, the actual operations on these registers are impossible
> for the compiler to generate, (without making a major modification to the
> C++ iso standard!).
Again, wrong. Compiler builtins don't have to be added to the C++
language standard to be added to g++.
> However, the compiler could at least help me with register
> allocation:
Agreed.
> This would produce very efficient code if I had the know-how to get it
> working.
You can always learn, or pay someone else to learn/do it.