This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] teach emit_store_flag to use clz/ctz


On Sat, May 12, 2012 at 11:36 AM, Maciej W. Rozycki
<macro@linux-mips.org> wrote:
> On Sun, 6 May 2012, Andrew Pinski wrote:
>
>> > For the record: MIPS processors that implement CLZ/CLO (for some reason
>> > CTZ/CTO haven't been added to the architecture, but these operations can
>> > be cheaply transformed into CLZ/CLO) generally have a dedicated unit that
>> > causes no pipeline stall for these instructions even in the simplest
>> > pipeline designs like the M4K -- IOW they are issued at the usual one
>> > instruction per pipeline clock rate.
>>
>> Even on Octeon this is true. Though Octeon has seq/sneq too so it
>> does not matter in the end.
>
> Does Octeon's pipeline qualify as simple? For some reason I've thought
> it is a high-performance core. The M4K is one of the smallest/simplest
> MIPS chips ever built.

Yes, the Octeon's pipeline qualifies as simple.  It is still an
in-order pipeline with few stages.  The core's high performance comes
from its clock rate rather than from the pipeline, and from the number
of cores on one chip.
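
On the quoted remark that CTZ "can be cheaply transformed into CLZ", and
on the thread subject of using clz for store-flag results, the following
is a minimal C sketch of both ideas.  The function names are hypothetical
and clz32 is a portable stand-in for the instruction, not code from the
patch under discussion.  It assumes the MIPS32 behaviour that CLZ of zero
yields 32.

```c
#include <stdint.h>

/* Portable stand-in for a 32-bit CLZ instruction (hypothetical helper).
   Returns 32 for x == 0, matching what MIPS32 CLZ produces. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* CTZ built from CLZ: isolate the lowest set bit with x & -x, then
   convert its position counted from the top into one counted from the
   bottom.  Returning 32 for x == 0 is a choice made for this sketch. */
static unsigned ctz32_via_clz(uint32_t x)
{
    if (x == 0)
        return 32;
    return 31 - clz32(x & (0u - x));
}

/* Set-if-zero from CLZ: the count reaches 32 only for x == 0, so bit 5
   of the result is the flag. */
static unsigned is_zero_via_clz(uint32_t x)
{
    return clz32(x) >> 5;
}
```

On a target with a one-cycle CLZ this costs two instructions for the
set-if-zero case, the same as the xor/sltiu pair discussed below, so it
mainly pays off where no cheap set-on-compare instruction exists.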

>
> And actually all MIPS processors (back to 1985's MIPS I ISA) support
> two-instruction set-if-equal and set-if-not-equal sequences:
>
>     xor     rd, rt, rs
>     sltiu   rd, rd, 1
>
> and:
>
>     xor     rd, rt, rs
>     sltu    rd, zero, rd
>
> respectively, that may still be more beneficial than any possible
> alternatives, especially ones involving branches.
>
>> Note I originally was the one who proposed this optimization for
>> PowerPC even before I saw what XLC did. See PR 10588 (which I filed 9
>> years ago) and it seems we are about to fix it soon.
>
> For that -- set-if-zero and set-if-non-zero -- you can use the
> instructions as above (that are supported by all MIPS processors):
>
>     sltiu   rd, rs, 1
>
> and
>
>     sltu    rd, zero, rs
>
> However GCC doesn't seem smart enough to use them well with your example.
> I'd expect something like:
>
>     sltiu   $4, $4, 1
>     sltiu   $2, $5, 1
>     jr      $31
>      or     $2, $4, $2
>
> however I get:
>
>     beq     $4, $0, .L3
>      nop
>     jr      $31
>      sltiu  $2, $5, 1
> .L3:
>     jr      $31
>      li     $2, 1
>
> which is never faster and obviously not smaller either. And there is
> really no need to avoid the second comparison as per logical OR rules here
> -- it's all in registers.


I have a few patches already in my queue to submit upstream to improve
the above case for MIPS.
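
For readers following the assembly, here is a rough C model of the
sequences quoted above.  The test case itself is not reproduced in this
message, so the source form `a == 0 || b == 0` is an assumption inferred
from the generated code, and all function names are hypothetical.

```c
#include <stdint.h>

/* xor rd, rt, rs ; sltiu rd, rd, 1   =>   rd = (rs == rt) */
static uint32_t set_if_equal(uint32_t rs, uint32_t rt)
{
    uint32_t rd = rt ^ rs;   /* zero only when the inputs match */
    return rd < 1u;          /* unsigned "less than 1" means "is zero" */
}

/* xor rd, rt, rs ; sltu rd, zero, rd   =>   rd = (rs != rt) */
static uint32_t set_if_not_equal(uint32_t rs, uint32_t rt)
{
    uint32_t rd = rt ^ rs;
    return 0u < rd;          /* unsigned "zero less than rd" means "non-zero" */
}

/* Assumed shape of the source: short-circuit logical OR, which the
   compiler may lower to the beq sequence shown in the quote. */
static uint32_t either_zero_logical(uint32_t a, uint32_t b)
{
    return a == 0 || b == 0;
}

/* sltiu $4, $4, 1 ; sltiu $2, $5, 1 ; or $2, $4, $2
   Both comparisons are always evaluated and combined with a bitwise OR.
   That is safe here because neither operand has side effects. */
static uint32_t either_zero_branchless(uint32_t a, uint32_t b)
{
    return (uint32_t)(a < 1u) | (uint32_t)(b < 1u);
}
```

The two either_zero functions return the same value for every input; the
only difference is whether the second comparison is skipped when a is
zero, which buys nothing when both operands are already in registers.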


>
> This pessimisation is avoided for MIPS IV and more recent processors that
> have move-if-non-zero however (and the second comparison is always
> evaluated):
>
>     sltiu   $5, $5, 1
>     li      $2, 1
>     jr      $31
>      movn   $2, $5, $4
>
> Any chance to get it better with the fix you've mentioned?

The above is worse than using the or, at least on the Octeon, as movn
takes 3 cycles while or takes only 1 cycle.  As I mentioned, I have a
few patches already in my queue which improve the code for MIPS (and
other targets too), but I have not gotten around to submitting them
upstream because I have been busy working on more patches.
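
To make the comparison concrete, a small C model of the movn sequence
next to the or-based one.  The helper names are hypothetical, and the
cycle counts in the comments are the Octeon figures given above, not
measurements.

```c
#include <stdint.h>

/* Model of "movn rd, rs, rt": rd takes the value of rs when rt is
   non-zero, otherwise rd keeps its previous value. */
static uint32_t movn(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return rt != 0 ? rs : rd;
}

/* sltiu $5, $5, 1 ; li $2, 1 ; movn $2, $5, $4
   The result starts as 1 (the a == 0 case) and is overwritten with
   (b == 0) when a is non-zero.  The final movn is the 3-cycle step. */
static uint32_t either_zero_movn(uint32_t a, uint32_t b)
{
    uint32_t r5 = b < 1u;
    uint32_t r2 = 1;
    return movn(r2, r5, a);
}

/* sltiu $4, $4, 1 ; sltiu $2, $5, 1 ; or $2, $4, $2
   Same result, with a 1-cycle or as the final step. */
static uint32_t either_zero_or(uint32_t a, uint32_t b)
{
    return (uint32_t)(a < 1u) | (uint32_t)(b < 1u);
}
```

Both forms are three instructions before the return and both evaluate
the second comparison unconditionally, so the choice between them comes
down to the latency of the last instruction.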

Thanks,
Andrew Pinski

