[PATCH 2/3] MIPS: add builtime option for -mcompact-branches

YunQiang Su syq@debian.org
Thu Mar 4 03:24:08 GMT 2021


Maciej W. Rozycki <macro@orcam.me.uk> wrote on Thu, Mar 4, 2021 at 7:50 AM:
>
> On Fri, 19 Feb 2021, YunQiang Su wrote:
>
> > >  My understanding therefore is that the original assumption that `optimal'
> > > will serve people best is no longer true.
> > >
> >
> > I guess that `optimal' can still produce the best performance, but
> > the delay slot makes MIPS quite different from other architectures.
> > And the hardware engineers seem to hate it as well.
>
>  Right, but what does it have to do with compiler defaults?  Given what we
> have available in hardware we want the best results possible, except for
> research or special use cases (such as GAS's `-minsn32' option with
> microMIPS code).  I would like to understand what the use case is here.
>

I want to give hardware designers the possibility of removing the
delay-slot branch instructions in the future (perhaps in the long term).

> > And we expect that MIPS can have as small a delta as possible from
> > the other major architectures, so that it can make use of all the
> > new frameworks from the community.
>
>  Well, machine code is inherently architecture-specific, so you can't
> have a single template that fits all.  The difference between processor
> architectures is more than just the bit patterns for individual opcodes
> and operand encodings (and the corresponding mnemonics and syntax for
> the assembly language).
>

Delay slots cause some trouble for newly written binary tools.
Many people are not used to delay slots, so they may not
consider MIPS in new work.
We cannot expect much support from them, so we have to reduce the delta.

>  For example one of the major architectures is ARM, which has conditions
> encoded with all the instructions.  And you can't mimic it with other ISAs.
> Similarly Power has 8 sets of condition codes and dedicated instructions
> to make ALU operations between these codes.  You can't do those elsewhere
> either.  Well, the MIPS or RISC-V ISAs do not have condition codes at all.
> And x86 is not a load-store architecture at all, so you'll see operations
> made directly on memory, as a destination even (let's ignore the even more
> arcane original 32-bit instruction set).
>
>  These are all considered major architectures nowadays.

Yes, "nowadays".
What about the future?
Since MIPS is the only one of these architectures that has delay slots,
will it become a "non-major" one?

>
>  So we have got the MIPS ISA and its delay slots.  Some subsets/variations
> of the ISA have already either greatly reduced their use or eliminated
> them completely, but we went into great lengths with GCC to produce good
> code making use of these delay slots, so I think it would be a shame to

Yes. It will not be a problem for GCC itself, but there are more and more
other tools, open source or commercial.
Since MIPS is not the de facto majority, it is not so easy to ask them
to support MIPS.

> get this effort wasted on one hand, and MIPS code put at a disadvantage
> due to cycles wasted for pipeline stalls that could be avoided if delay
> slots were scheduled -- on the other.
>

I have run some tests on the I6500. The performance of delay-slot branches
and compact branches is almost the same, so it will not be a performance
regression.

> > >  First, I think it would be good if we knew why.  I find proliferating
> > > variants of defaults, especially for the less obvious cases, will cause
> > > user confusion as one won't know what code model to expect, especially as
> > > (please correct me if I am wrong) we don't actually provide a way to dump
> > > the compiler's overridable configuration defaults.
> > >
> >
> > So, should we provide a predefined macro for it?
>
>  I've been thinking more along `gcc -v --version' dumping the invocation
> of `configure' used, but I have to correct myself here in that it already
> happens, so nothing to do.  I'm not sure why I forgot it and/or could not
> have figured it out previously.  Sorry about the confusion.
>

No problem.

> > >  Second, I wonder if it makes sense to just keep things simple, and rather
> > > than adding `prefer' (to stand for "`always' if available"), and possibly
> > > `avoid' (to stand for "`never' if available"), whether we shouldn't just
> > > relax the checks for `always' and `never', and let them through whether
> > > the architecture selected provides for the option chosen or not.
> > >
> >
> > Relaxing `always' is what I would like to do first.
> > But I was afraid of breaking compatibility.
>
>  Hmm, honestly I don't think there could be any compatibility to care of
> here given that the compiler currently refuses to run with such an option
> combination.  Nobody may have relied on that then and the extra protection
> given is in my opinion a bit excessive.  Garbage in, garbage out: you get
> what you have requested.  Our usual policy with irrelevant options has
> been to silently ignore them, which helps users override Makefile defaults
> by just having CFLAGS, etc. appended to whatever the defaults are.
>

Again, it is not so easy to ask ALL packages to carry MIPS-specific dirty
patches just to set CFLAGS.

> > >  Please note that in the discussion quoted Catherine has already raised a
> > > concern I agree with of adding a complication here, and now we would
> > > complicate it even further by not only adding a fourth choice, but another
> > > overridable configuration default as well.
> >
> > I am still concerned about whether we should just set `always' as the default.
> > My short-term plan is to make it the default for the Debian r6 port.
> > So, at least, I wish we could provide a build-time option for others who need it.
>
>  You're free with what you do with your distribution, although it seems to
> me like it's going to be a performance regression.  I suggest that you do

In my tests, there is no performance regression at all on the I6500.

> some benchmarking with real code and hardware before you decide.  Maybe
> you can prove me wrong and there will be no loss of any significance.

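#include <stdio.h>
#include "perf_func.h"
#include <mips/m64c0.h>

/* Raw performance counter values read after the loop. */
int PerfCnt[4];

/* noinline keeps a real call and return branch in the loop. */
__attribute__((noinline)) unsigned int add_1(unsigned int x) {
        return x+1;
}

int main(){
        /* Initialized, and unsigned so that wrap-around is well defined. */
        unsigned int a = 0;
        mipsperf_switch_group();
        for (int i=0; i<1000000; i++){
                a+=add_1(a);
        }
        PerfCnt[0] = mips32_get_c0(C0_PERFCNT0);
        PerfCnt[1] = mips32_get_c0(C0_PERFCNT1);
        return a;
}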

This is an example of my tests. It has almost the same performance with
  -mcompact-branches=always
and
  -mcompact-branches=optimal

I guess you may have more tests.

>
>   Maciej

