This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Patch to change IA64 division code


> This seems pretty reasonable.
> 
> It isn't clear why you have the _a and _b variants in the div.md file.
> Especially since they generate identical code.  This looks like
> unnecessary duplication.
> 
> Using (const_int 0) in a pattern that expects an FP number looks wrong
> to me.  Better to use CONST0_RTX (RFmode) which will be a const_double I
> think.
>
> If we really need both patterns, then it would be simpler to have a
> common pattern.  You can use fr_reg_or_0_operand to match either an fr
> reg or a constant 0.  This would require using a proper FP constant for
> 0.

I think we can get rid of the duplicate patterns by using CONST0_RTX
(RFmode) and fr_reg_or_0_operand.  I will work on making this change.

> It isn't clear why you have UNSPEC_NOP_CONVERT stuff to convert to/from
> RFmode.  Especially since you just split them after reload into a nop
> move.  I presume there is a reason for this.  A comment to explain this
> reason should be added so the code makes more sense.

Since all the predicated math instructions expect RFmode inputs and
generate RFmode outputs there needs to be some way to convert the
original SFmode/DFmode inputs into RFmode (and the final result back
into SFmode or DFmode).  Otherwise the predicated math instructions need
to handle SFmode and DFmode and RFmode inputs and outputs.  I will add
some comments for this.

> Similarly, it isn't clear why UNSPEC_NOP_CONVERT requires an rtx
> barrier, when it is going to go away later.

I think this is wrong and I shouldn't have the barrier.

> The long divide expanders include a lot of explicit rtl.  You could
> alternatively call gen expander functions.  So instead of having
> +   (parallel [(set (match_dup 6)
> +                   (if_then_else:RF (ne:RF (match_dup 14) (const_int
> 0))
> +                     (minus:RF (match_dup 18)
> +                               (mult:RF (match_dup 5) (match_dup 3)))
> +                     (const_int 0)))
> +              (use (match_dup 16)) (use (match_dup 17))])
> You can do this
>   gen_subdf3_cond_b (operands[6], operands[14], operands[18],    
>          operands[5], operands[3], operands[16], operands[17]);
> This is smaller, and might be more readable, though it is a matter of
> taste and isn't really that much better than what you have.  An
> advantage of my suggestion is that if you have to change the rtl for the
> subdf3_cond_b pattern, then you only have to fix it once.  Whereas, with
> your code, you have to fix it in 3 places, where it is defined, and in
> the two copies in the divide expanders.  This gets worse when you add
> more divide expanders, as then you will have even more copies.  This is
> an argument against using your method.  This is how udivsi3 works for
> instance if you want to look at an example.  Another advantage of my
> suggestion is that you can give meaningful names to the operands.  So
> instead of operands[16] you can have round_double which is more readable
> and might help reduce errors.  You can also give the intermediate
> operands names like "y" and "e" to match the comments in your rtl
> expanders.  Another obscure consideration here is that patterns with a
> large number of operands will increase the size of some internal arrays
> and structures, which may make gcc run slower.  See for instance
> MAX_DUP_OPERANDS in insn-config.h, created by genconfig.  I have 14 in
> my copy, which is due to the divdf_internal_lat pattern.  This number
> will be even higher with your patch.  But this is not a problem if we
> call gen* functions to expand the rtl.  I can accept either alternative
> here, I just wanted you to consider the options.

I have tried writing the expanders both ways but never had a strong
feeling about one vs.  the other but the idea of being able to use more
meaningful variable names does sound like a good idea.  I will consider
this some more.

> It isn't clear why you changed HARD_REGNO_CALLER_SAVE_MODE from XFmode
> to RFmode.  This is just going to add useless conversions to/from RFmode
> that will later be replaced with nop moves.  So why bother?  I am
> concerned that this might hurt performance.
> 
> It does appear that there might be a problem if we need to save/restore
> one of these RFmode regs across a call.  In that case, the only save
> insn is a fr_spill and fr_restore, but your code doesn't expand into
> that anywhere.  Maybe this is why you have the unspec nop moves?  But if
> so, that that means the HARD_REGNO_CALLER_SAVE_MODE change is
> unnecessary, because we won't have any RFmode regs live across a call.

What I ran into (a while back so I hope I remember it correctly) was
that since the division code sequence was getting expanded earlier
(before reload) I wound up with one of the registers being used getting
spilled by reload (I think).  If you spill and fill in XFmode you do not
get exactly the same results back (80 bits vs.  82 bits).  If you spill
and fill in RFmode then you always get the exact original register
contents back.  This was affecting the division sequence where we don't
do intermediate rounding to SFmode or DFmode (or XFmode) during the
operations.

> At the end of the DFmode divide expander, you have a comment
> +;; Do an 'empty' conversion back to SFmode
> which should obviously be DFmode instead of SFmode.  But curiously there
> is no similar comment in the SFmode divide expander.

I will work on making these changes and prepare a new patch in a few
weeks.

Steve Ellcey
sje@cup.hp.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]