RFA: MIPS paired single support

James E Wilson wilson@specifixinc.com
Tue Aug 24 10:54:00 GMT 2004

On Fri, 2004-08-20 at 01:44, Richard Sandiford wrote:
> I know that you're somewhat limited by gcc's infrastructure here, but
> there's so much cut-&-paste in the patch, both in the .md patterns and
> the C code itself.  Please try to reduce it if you can.  I've made some
> specific suggestions below, but please look for other ways to cut it
> down too.

Most of the duplication is in the conditional patterns, where we have a
separate UNSPEC code for every conditional operation.  We could divide
the number of patterns by 8 or so if we can put the comparison code
inside the UNSPEC instead of encoding it into the UNSPEC number.  I will
try taking a look at this, but this will probably be the last thing I
try to work on.

> So do your vec_merge patterns accurately describe the operation for
> both big and little endian?

I believe it is currently impossible to write a vec_merge pattern that
is correct for both big and little endian.  I am willing to work on
this, as I have already worked on the other vector operators, but I
think forcing me to fix this before it goes in is an unacceptable
burden.  For the moment, the mips port is as correct as any other port,
as all ports (except maybe the SH) have broken vector support when it
comes to endianness.

> We're running out of uppercase EXTRA_CONSTRAINTS.  Perhaps it would
> be a good idea to define a two-letter constraint instead?  E.g. "YG".
> Then we can reuse the "Y" prefix for other constants (vector or
> otherwise) that might be needed in future.

Or maybe something like GV, to indicate that this is a FP zero vector
constant?  I will work on this.

> A lot of the standard V2SF patterns are obviously the same as the
> SFmode/DFmode ones, but with mechanical changes.  Perhaps it would
> be better to keep them together?  If the macro patch ever does go in,
> I'd imagine that they would be combined into a single template.

OK.  I can re-sort the simple arith/move patterns into the mips.md
file.  I noticed in your mips.md <mode> patches that you were not
requiring me to add <mode> support to the vector patch, so I am not
planning on trying at the moment.  One thing that occurs to me is that
you asked me to drop the V2SF mode attribute after version 1 of the
patch.  However, since the mode attribute now comes from <mode> that
means we either need to re-add the V2SF mode attribute, or else add yet
another define_mode_macro which will map V2SFmode to "V2SF" add all of
the other modes to themselves.

> It'd still be nice to have the vector-specific patterns in a separate
> file though.


> Minor nit: please format single-line "{ ... }" blocks like ordinary
> asm templates, as elsewhere in mips.md.

OK.  That means adding two spaces before them.  Done.

> More importantly, the type of the fourth alternative is wrong for m<-Y,
> which is a GPR store, not an FPR store.

This isn't a bug in the vector patch.  This was copied from the
movsf/movdf patterns in the mips.md file which have the exact same bug. 
I see 3 patterns in the mips.md file that are broken, 

This is however my bug.  I added it when I split store into store and
fpstore.  Maybe I can fix this with a separate patch?  I don't want to
make the vector patch any bigger than it already is.

> + (define_insn "*ldxc1v2sf"
> +   [(set_attr "type"	"fpidxload")
> +    (set_attr "mode"	"SF")
> +    (set_attr "length"   "4")])
> Strange formatting of set_attrs.  I'd like to move away from having
> tabulated attributes, so I'd be happy with a single space between
> the name and value.

Again, this was just copied from the mips.md file.  If you look at the
existing ldxc1 patterns before your <mode> patch, you will see that they
have the exact same strange formatting of set attrs.  It just wasn't
obvious until I made a diff.  So I wouldn't really call this a bug with
the vector mode patch.

It is easy enough to fix all of the attributes in the mips-ps-3d.md file
to follow the new scheme though.  Done.

> If you're going to give descriptive names to the the patterns (which I
> agree is a good thing), please do it for both the Pmode == SImode and
> Pmode == DImode versions, not just the former.

This is a style issue.  If a pattern doesn't have a name, then the gen*
programs will create one by using the name of the previous pattern with
a name, and add an offset.  Thus the SImode ldxc1 pattern is "ldxc1v2sf"
and the DImode one is "ldxc1v2sf+1".  My goal here wasn't to add a name
to every pattern, but just to add enough names so that every pattern
would end up with a sensible looking name.  This makes the rtl output
more readable, and also makes the ChangeLog entries more readable.  Thus
there is no need for a name on the DImode ldxc1v2sf pattern, as
ldxc1c2sf+1 is a perfectly fine name for it.

If you want separate names, then I can certainly do that.  Done.

> + 	 (match_operator 4 "equality_operator"
> No mode on the match_operator.  Similarly elsewhere.


> + (define_insn "mips_pul_ps"
> +   [(set_attr "type"	"fadd")
> Why fmadd?  This seems more like an fmove insn to me.  Same with
> the vecinit patterns, among others.

I used fmove in my patch.  MIPS used fadd in their patch.  I don't know
why, but it didn't seem important.  At the moment, the only target for
which I have looked at timing info is SB-1, and for that target, it
doesn't make a difference whether we use fmove or fadd.  The timing is
the same.  So at this point, I think either choice is arbitrary, and can
be fixed later if necessary when we have more info.

For the moment, I am willing to use the arbitrary choice that mips
used.  If you want a different arbitrary choice, let me know.

> +   if (INTVAL (operands[2]) != 0 && INTVAL (operands[2]) != 1)
> +     FAIL;
> The overhead of defining predicates is much less after Zack's patch.
> I'd prefer a "const_0_or_1_operand", or whatever, rather than repeated
> checks against INTVAL.


Part of the reason why I used explicit INTVAL checks is that this avoids
the need to create yet another constraint letter.  We have a constraint
letter for zero, but not for 1.  I see the uses of const_1_operand in
the mips.md file use empty constraint strings.  I will try that.

> + (define_expand "vec_initv2sf"
> + (define_insn "vecinitv2sf"
> I don't like the choice of names here (just removing a "_" for the
> internal pattern).  I was a bit confused when I got to the cvt_ps_s
> expander and saw a "vec init" pattern being passed three arguments.
> Please use _internal or something.


> + ;; ??? This is only generated if we perform a vector operation that has to be
> + ;; emulated.  There is no other way to get a vector mode bitfield extract
> + ;; currently.
> Don't understand the comment.  Could you provide a few more details?

Jan Hubicka added vec_init/vec_set/vec_extract patterns in January. 
While a good idea, he didn't actually add good testcases for them, or
much support for generating them.  vec_init is the easy one.  If you
  a = (v2sf) { a, b};
it expands to a vec_init call.  vec_extract and vec_set are a bit more
compliated.  Jan added code to extract_bit_field and store_bit_field to
call vec_extract/vec_set as appropriate if we have a vector mode
bitfield operation.  Unfortunately, we have no C language syntax for
performing an extract/insert on a vector.  The only way to access a part
of a vector other than a builtin is to put it in a union along side an
array.  Once you do that, you no longer have a vector mode, you have
BLKmode.  And of course, if you do use a builtin, then there is no point
in calling extract_bit_field/store_bit_field.  vec_extract and vec_set
seemed hopelessly useless for a while, then I discovered that the
compiler would call vec_extract if it had to emulate a vector
operation.  For example, on a non-SB-1 mips, there is no V2SF divide, so
we emulate it by doing two extracts, two divides, and then constructing
a new vector.  The two extracts are performed via vec_extract if such a
pattern exists.  Constructing a new vector is performed via vec_init if
you have it, otherwise, we do two insertions, which will be performed
via vec_set.  Since any reasonable target will have a vec_init pattern,
the vec_set named pattern appears to be useless currently, but we can
verify that it works by temporarily disabling the vec_init pattern. 
Someday it may be useful, so we want to define it anyways.

> Similarly here.  Given that we _don't_ disable the vec_init pattern,
> does the comment mean that we never actually use vec_setv2sf?  If so,
> please don't add it ;)

But someday someone will improve gcc to the point where vec_set does get
used, and then the mips port will be broken and we will never know it. 
It is better to define it now, while we know it is needed, rather than
to wait until we have an invisible bug.  Defining it does no harm, and
will ensure that we benefit from future gcc improvements.

> Again, the choice of "fadd" seems a bit odd.  Why not fcvt?

Again, as above, I used fcvt in my patch, MIPS used fadd in their patch,
and I didn't care enough to change it.  Either choice seems completely
arbitrary based on available info, as I only have timing info for SB-1,
and fmove/fcvt/fadd all take exactly the same amount of time, so it
doesn't matter which one gets used.

> Wouldn't it be better to use a define_insn_and_split that creates two
> mips_c_f_ps patterns after reload?  It should allow better scheduling
> and (much more minor) means you don't need to add the '%V' or '%v'
> operand formats.

FYI this is MIPS code not my code.

I looked at this, and didn't see a convincing reason to split them. 
This pattern is added for an obscure reason.  The MIPS-3D ASE has a
"branch on 4 FP condition codes" instruction, but there is no
instruction that sets 4 FP condition codes.  So we cheat and add 4-wide
compare patterns that expand to 2 ps instructions.  I was concerned
about creating additional patterns for something that seemed obscure.  I
was also concerned about getting the register allocation right.  The
split after reload trick didn't occur to me at the time, but I didn't
really spend much time looking at this.

Maybe I can look at fixing this after the patch gets accepted?

> Please use the same UNSPEC numbers for both the 2- and 4-CC versions,
> both here and in the cabs patterns.  This should make it easier to
> combine everything later.

OK.  This is on my list.

> + ;----------------------------------------------------------------------------
> + ; Branch on Upper of Two Floating Point Condition Codes True
> + ;----------------------------------------------------------------------------
> I don't mind seeing this style of "----" comment for categories,
> but please don't use it for individual patterns.

FYI this is MIPS code not mine.

I removed the --- comments from mips.md since I thought you would
complain about them on style reasons, but I thought it would be OK to
leave them in mips-ps-3d.md as there was no pre-existing style there. 
Apparently I was wrong, but you are only complaining about a few of them
near the end of the file, so this look easy to fix.  Hopefully this will
be OK in the next patch version.

> + (define_insn "mips_recip1_s"
> Wouldn't the new frdiv be more appropriate than frsqrt?

These are actually very different instructions from the reciprocal
divide instructions.  These are reciprocal divide step.  I would argue
that frdiv is just as wrong as frsqrt, but either way, it is a pretty
arbitrary choice at this point.  For SB-1, recip divide is 12 cycles,
but recip divide step 1 is 4 cycles and recip divide step 2 is 8 cycles,
so we actually need two new type attributes to get this right.  I didn't
think this extra complication needed to be part of part of the initial
vector patch, which is already too big and too complicated.

Similarly, uses of frsqrt is wrong for the reciprocal square root step

But since this is all arbitrary at the moment, I don't really care.  I
change the recip divide step ones to use frdiv.

> + /* MIPS builtin functions */
> No need for these comments.

OK.  Gone.

> + #define TARGET_INIT_BUILTINS mips_init_builtins
> + 
> No need for the gap.

OK.  Gone.

> ! 	info->fpr_p = (named && (type == 0 || FLOAT_TYPE_P (type)
> ! 				 || TYPE_MODE (type) == V2SFmode));
> are exactly what I'd like to avoid.  Please make the o32/o64/n32/n64
> cases depend on TYPE where given.

I find this ambiguous.  Exactly what are you trying to avoid here?  This
is depending on the TYPE, specifically, it is depending on TYPE_MODE
(type).  If you are objecting to the use of a mode check, then I don't
understand that either, because the new code also has mode checks.  In
any case, this gets rewritten not only because of the mips ABI patch but
also because of my tree.h patch to fix folding of vector floats. 
FLOAT_TYPE_P now includes vector types, so the check above is now
unnecessary.  Also, we have a new VECTOR_FLOAT_TYPE_P that I can use
instead of the TYPE_MODE check.  That should address your concerns.

> Watch the "! "s.  Looks good otherwise.

I was watching them, but old habits die hard.  These two were fixed.

> Don't like this formulation much.  Why not just:
> 	  if (mode == CCV2mode)
> 	    temp = (ISA_HAS_8CC
>                     && ST_REG_P (regno)
>                     && (regno - ST_REG_FIRST) % 2 == 0);

OK.  Fixed.  Likewise for CCV4mode.

> +   else if (letter == 'Y')
> +     {
> +       register int regnum;
> +     }
> For the love of God, please don't add new instances of "register" ;)
> This is one of the cut-&-paste bits I was complaining about above.

Well, the "register" was there in the original code...

I could pull out these 6 sections of code into a small subroutine. 
Would that be OK.  This will make the code a little slower because of
extra overhead, but will eliminate the duplication.

> !   /* The only modes allowed in FP status regs are condition code modes, and
> !      CCmode is always considered to be 4 bytes wide.  */
> Please reconcile this new comment with the existing one.
> Don't you need to change mips_class_max_nregs as well?

Fixed the comment.  I was thinking I didn't need a mips_class_max_nregs
change because I gave the CCmodes sizes, but I see I got that wrong. 

> Comments above the fields please.  And there's no need to repeat "MIPS"
> all the time. ;)


> These comments also aren't that helpful.  What's the "lowest class
> target flag"?  (Yes, it's obvious when you read the rest of the code,
> but it's hardly obvious here.)

I just deleted "lowest class".  I think the comment makes sense without
that phrase.

> +   { CODE_FOR_mips_pll_ps, "__builtin_mips_pll_ps", MIPS_BUILTIN_PLL_PS, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE },
> Please break up the long lines.


> Do we really need the MIPS_BUILTIN_ enum?  Why not just use the mips_bdesc[]
> index as the builtin number, use that to look up the CODE_FOR_*, and key off
> the CODE_FOR_* instead?
> If that isn't possible for some reason, how about putting the definitions
> in a *.def file that can be read both here and in mips.h?  Then there will
> be less chance of them getting out of sync.

This is more code from MIPS, not from me.  It looks like the point of
keeping them in the same order is to avoid the search loops that other
ports use.  See for instance the rs6000_expand_builtin function which
does a linear search of the bdesc table to find the right entry.  The
mips port just does a table lookup.  If you really want to avoid the
ordering problem, them I will add in the loops to do a linear search.  I
think the .def file idea is unnecessary busy work.

> + rtx
> + mips_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
> + 		     enum machine_mode mode ATTRIBUTE_UNUSED,
> + 		     int ignore ATTRIBUTE_UNUSED)
> Say specifically which target macro you're implementing.  More "! "s here.

I assume you mean you want TARGET_EXPAND_BUILTIN mentioned in the
comment.  I fixed more "! ".

> Please use a loop to process this part:
> and reuse it for the one- and three-operand functions too.

Added to my list.

> Rather than have such a humungous switch statement, I'd prefer to see
> the expansion depend on ftype, even if that means splitting ftype up
> a bit.

Added to my list.  This probably requires adding lots of linear search
loops such as is done in the rs6000 file.

> + /* Expand conditional move builtin functions.  */
> Comment the arguments and return type please.  Please also consider
> reusing the suggested loop in expand_builtins and passing the processed
> data to this subroutine.  Same for the other subroutines.

Added to my list.

>         if (TARGET_SINGLE_FLOAT)					\
> ! 	builtin_define ("__mips_single_float");			\
> !       else if (TARGET_PAIRED_SINGLE_FLOAT)			\
> ! 	builtin_define ("__mips_paired_single_float");		\
> No need for the "else".

I added the else because it makes the code marginally faster.  It isn't
possible for both to be true, so there is no point in testing
But since you don't like it, it is gone.

> but there's no way for paired single operations to be turned off.

An oversight.  I added -mno-paired-single -mno-mips3d options and
updated the docs.

> +   MIPS_V2SF_FTYPE_V2SF,                 /* v2sf func (v2sf);  */
> Redundant comments.


> + @itemx -mpaired-single
> + @opindex mpaired-single
> + Enable support for the paired single instructions.
> Not very helpful as documentation goes ;)  Please at least say what
> the user can expect wrt the vector extensions.  There'll also need
> to be some documentation of the builtin functions themselves.

There is a separate section of the gcc manual that documents the vector
support.  This is in the extend.texi file.

MIPS already has documentation on the builtin functions.  I think it is
a reasonable convention to rely on the processor vendor to specify these
things.  This is what we do with x86 builtin functions for instance.

> + @itemx -mips3d
> + @opindex mips3d
> + Enalble support for the MIPS-3D ASE.  This implies @option{-mpaired-single}.
> Typo.

Fixed.  Actually, since I added the -mno option, and I noticed that the
mips16 option was worded differently, I copied the wording from the
mips16 option.
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com

More information about the Gcc-patches mailing list