This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH]: GCC Scheduler support for R10000 on MIPS


Kumba <kumba@gentoo.org> writes:
> Richard Sandiford wrote:
>> I think you want "foo, bar" (foo one cycle, then bar the next)
>> rather than "foo + bar" (foo and bar simultaneously).  And you don't
>> want to tie up the issue and completion units for more than one cycle.
>
> That would make sense.  However, converting the 'foo + bar' to 'foo,
> bar' only seems to work as long as there aren't any repeat rates.
> Down in the fdiv bits, as it starts to calculate the repeat rates, it
> starts to send the state count out of control, to the point where my
> Octane runs out of memory trying to process it all.
>
>
> Here's what I converted one of the fdiv's into:
>
> (define_insn_reservation "r10k_fdiv_single" 12
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "fdiv,frdiv")
>              (eq_attr "mode" "SF")))
>    "r10k_fpmpy_issue, (r10k_fpdiv * 14), r10k_fpmpy_completion")
>
> I figure that syntax reads as "issue, fpdiv is 14 cycles, completion".
> But that repeat rate number at 14 makes the insn-automata.c build take
> a long time (an hour minimum).  At a repeat rate of 10, the NDA state
> count for r10k_a_fpmpy was in the 12,000 range (and took 4-5mins).
> Plus, the mips.dfa output is 822MB.

Yeah, that's not too surprising.  This model says that the pipeline
looks 15 cycles in advance to see whether a division issued now will
complete in 16 cycles' time, which needs a hefty number of DFA states
to track properly.  That's probably not how the pipeline works.

In other words, it's probably the completion stuff that's causing
problems.  Things might be better if you just model the issue and
execution stages.
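
A minimal, untested sketch of what dropping the completion stage from the
quoted fdiv reservation could look like.  The unit names, the latency of 12
and the repeat count of 14 are taken from the quoted pattern; whether the
issue unit should stay in at all is something only benchmarking can settle.

```
;; Sketch only: issue for one cycle, then hold the divider for 14 cycles.
;; r10k_fpmpy_completion is no longer reserved, so the automaton should
;; not have to track a completion slot many cycles ahead.
(define_insn_reservation "r10k_fdiv_single" 12
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "fdiv,frdiv")
             (eq_attr "mode" "SF")))
   "r10k_fpmpy_issue, (r10k_fpdiv * 14)")
```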

Also, if you model the issue stage, you should model it for all insns,
not just the ones that use r10k_fpmpy_issue.

As always, the only way to know if you're making things better here
is to test it.  It may well be that things are better without this
issue stuff.

> Here's a quick question, though.  Integer multiply and divides happen
> on ALU2.  The manual makes a note that divides keep ALU2 busy for the
> duration of the divide.  I think this means that division isn't
> pipelined, and the GCC internals manual seems to describe something
> like this, though the example to me isn't easy to decipher.  If I'm
> interpreting it right, does this look correct?:
>
> (define_insn_reservation "r10k_idiv_single" 34
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "idiv")
>              (eq_attr "mode" "SI")))
>    "r10k_alu2 * 35, r10k_idiv_single")

Well, this reserves ALU2 for 35 cycles and (immediately after that)
reserves r10k_idiv_single for one cycle.  Is that what you wanted?
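
If the intent is only that a 32-bit divide occupies ALU2 for its whole
duration, one possible form simply drops the trailing element.  This is an
untested sketch that keeps the quoted latency of 34 and repeat count of 35
as they are; it assumes r10k_alu2 is a unit declared elsewhere in the file.

```
;; Sketch only: an unpipelined SImode divide that keeps ALU2 busy
;; for 35 consecutive cycles and reserves nothing else.
(define_insn_reservation "r10k_idiv_single" 34
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "idiv")
             (eq_attr "mode" "SI")))
   "r10k_alu2 * 35")
```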

>> Again, I'm afraid it's really a case of trying and seeing what gives
>> the best performance. ;)
>
> Well, I dug around in mips.md, and I think I found the define_insn
> statement that sets up the "imul" type.  It looks like it only emits a
> "mult" asm instruction.  As far as I could tell, no "multu" or
> "dmultu" commands look like they're emitted at all in mips.md.  I'm
> guessing these aren't widely used instructions?

That's not correct.  "imul" is used for MULT, MULTU, DMULT and DMULTU.
(The "<u>" in those patterns means "" for signed and "u" for unsigned.)
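
A simplified illustration of how one pattern can emit both the signed and
the unsigned form.  The pattern name, predicates and constraints below are
hypothetical and not copied from mips.md, and depending on the GCC version
the iterator keyword may be define_code_macro rather than
define_code_iterator.

```
;; Sketch only: one template expanded once per extension code.
(define_code_iterator any_extend [sign_extend zero_extend])

;; <u> is "" for the signed variant and "u" for the unsigned one.
(define_code_attr u [(sign_extend "") (zero_extend "u")])

;; Hypothetical widening multiply: expands to "mult" and "multu",
;; and both variants carry the "imul" type.
(define_insn "<u>mulsidi3_sketch"
  [(set (match_operand:DI 0 "register_operand" "=x")
        (mult:DI (any_extend:DI (match_operand:SI 1 "register_operand" "d"))
                 (any_extend:DI (match_operand:SI 2 "register_operand" "d"))))]
  ""
  "mult<u>\t%1,%2"
  [(set_attr "type" "imul")
   (set_attr "mode" "SI")])
```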

>> OK.  The time taken to compile something is certainly a valid benchmark.
>> (It forms part of SPECINT, of course.)
>
> Okay, I'll probably benchmark a lengthy program like glibc.  Even
> though the final output doesn't work yet (different problem there),
> it'll still compile and I can time it (usually, 3.5-5hrs).

FWIW, an alternative is to pick a single big file (e.g. gcc's fold-const)
and preprocess it.  You can then run cc1 on it directly, which means that
the benchmark is a single process.

>> Nothing goes wrong if you fail to handle an instruction.  The compiler
>> won't crash or anything.
>> 
>> At this stage we're dealing with things like "-march=r4130 -mtune=r10000".
>> I don't think any particular handling of imadd is better than any other
>> in that case.  So my personal preference would be to leave out the
>> unnecessary insns (IMADD, SIGNEXT, FRDIV1, etc.).
>
> Makes sense, so I went ahead and removed them.

Thanks.

>> Nope, that's mips.c:mips_cpu_table (which you're already handling
>> correctly).  "cpu" is just an .md copy of enum processor_type.
>
> Gotcha.  How about the "cpu" attr in the scheduler definition?  Do
> those need to be removed (to match the values in mips.md's "cpu"
> type), or is it checking on what's passed to -march?

Yes, '(eq_attr "cpu" ...)' tests the attribute defined by
'(define_attr "cpu" ...)', so you need to remove the processor
names from both.
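
A rough sketch of how the two pieces relate.  The value list is abbreviated
and illustrative, the reservation name and the r10k_alu1 unit are
hypothetical, and it is an assumption here that the attribute is derived
from the tuning target via mips_tune; check mips.md in your tree.

```
;; Sketch only: the set of legal "cpu" values is fixed by define_attr.
(define_attr "cpu" "default,r4130,r10000"
  (const (symbol_ref "mips_tune")))

;; An eq_attr test may only name values from the list above, so a
;; processor removed from define_attr must be removed here as well.
(define_insn_reservation "r10k_example" 1
  (and (eq_attr "cpu" "r10000")
       (eq_attr "type" "arith"))
  "r10k_alu1")
```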

Richard

