PR 13722 candidate fix

Zack Weinberg zack@codesourcery.com
Sat Jan 24 01:54:00 GMT 2004


Jim Wilson <wilson@specifixinc.com> writes:

> You mention reload nightmares.  There is a reload nightmare only if you
> fight how reload works.  If we allocate an OImode scratch instead of a
> DImode scratch, then I think we will always get at least one safe
> scratch register.
[...]

I am more comfortable coding it this way, and I think it will be
easier to understand in six months even without benefit of lengthy
commentary.  Furthermore, I would like to eliminate OImode from the
ia64 back end (currently it's only used for STACK_SAVEAREA_MODE, which
should be handled differently) so I would prefer a solution that
doesn't need it.

[...]
> Keep in mind that Itanium2 has 4 memory ports.  It can do up to 2 loads
> and 2 stores per cycle.  Using auto-inc addresses instead of a scratch
> reg eliminates ILP, and will hurt performance.  The TImode stuff perhaps
> doesn't occur often enough to worry about, but I think this would
> seriously hurt ia64-hpux long double performance.  I think we should
> consider whether this performance hit is acceptable.  It might be
> acceptable to get the compiler working again, but at the very least I
> think it makes sense to add some ??? comments pointing out that we have
> an optimization problem.  At least this doesn't affect ia64-linux long
> double performance.

This, I think, can be addressed with peephole2 patterns, which have
the ability to allocate scratch registers without risk of overlap.  A
form like

(define_peephole2
  [(match_scratch:DI 3 "r")
   (set (match_operand:DI 0 "register_operand" "")
        (mem:DI (post_inc:DI (match_operand:DI 2 "register_operand" ""))))
   (set (match_operand:DI 1 "register_operand" "")
        (mem:DI (post_dec:DI (match_dup 2))))
   (match_dup 3)]
  "!optimize_size"
  [(set (match_dup 3)
        (add:DI (match_dup 2) (const_int 8)))
   (set (match_dup 0) (mem:DI (match_dup 2)))
   (set (match_dup 1) (mem:DI (match_dup 3)))]
  "")

will clean up the majority of these post-increment/post-decrement
pairs.  Note that it also gives us the ability to *not* use a scratch
pointer when optimizing for size, and it will just work in cases where
there aren't scratch regs to be had.
(I am not claiming that the above is exactly the way it should be
written.  Perhaps it ought to be done by matching the MEMs and
changing them with adjust_automodify_address, or something like that.)
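
To make the intended equivalence concrete, here is a rough C model of
the two instruction sequences.  It is only a sketch and not part of
any patch: the flat word-array memory, the function names, and the
assumption that the destination registers are distinct from the base
register are all made up for illustration.  The point it shows is that
both forms load the same two words and leave the base pointer
unchanged, but in the scratch form the second address does not depend
on the first load's address update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory model: 64-bit words addressed by byte offset. */
static uint64_t load64(const uint64_t *mem, uint64_t addr)
{
    return mem[addr / 8];
}

/* Models the sequence the peephole matches: a post-increment load
   followed by a post-decrement load through the same base register.
   The second load cannot issue until the first has updated *base. */
static void load_pair_autoinc(const uint64_t *mem, uint64_t *base,
                              uint64_t *lo, uint64_t *hi)
{
    *lo = load64(mem, *base);   /* (mem:DI (post_inc:DI base)) */
    *base += 8;
    *hi = load64(mem, *base);   /* (mem:DI (post_dec:DI base)) */
    *base -= 8;
}

/* Models the replacement: compute base + 8 into a scratch register,
   then do two loads whose addresses are independent of each other. */
static void load_pair_scratch(const uint64_t *mem, const uint64_t *base,
                              uint64_t *lo, uint64_t *hi)
{
    uint64_t scratch = *base + 8;   /* (set scratch (plus base 8)) */
    *lo = load64(mem, *base);
    *hi = load64(mem, scratch);
}

/* Checks that both sequences produce the same values and the same
   final base pointer on a small sample memory. */
static void check_equivalence(void)
{
    const uint64_t mem[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint64_t base_a = 8, base_b = 8;
    uint64_t lo_a, hi_a, lo_b, hi_b;

    load_pair_autoinc(mem, &base_a, &lo_a, &hi_a);
    load_pair_scratch(mem, &base_b, &lo_b, &hi_b);

    assert(lo_a == lo_b && hi_a == hi_b);
    assert(base_a == base_b);
}
```

Calling check_equivalence() from a main function exercises both forms.
The model says nothing about scheduling; it only checks that the
rewrite preserves the loaded values and the base pointer.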

zw


