Parallelized loads and widening mults cont:ed (was: Re: GCC porting tutorials)

Hans-Peter Nilsson
Fri Apr 30 01:58:00 GMT 2010

> Date: Thu, 29 Apr 2010 08:55:56 +0200 (CEST)
> From: "Jonas Paulsson" <>

> It feels good to know that the widening mults issue has been
> resolved

Yes, nice, and as late as last week too, though the patch was
from February.

> as
> it was a bit of a disappointment I noted the erratic behaviour with GCC
> 4.4.1. Perhaps you would care to comment on what to expect as a user now,
> then?

IIUC, it should Just Work.  No, I haven't checked.  Note that
the fix was somewhat along the lines of what you wrote in your
thesis IIUC; adding a specific pass to fix up separated
operations.  See
<> and
<>.  BTW,
my observation was from the 4.3 era.  It's a regression, which
explains why I hadn't noticed it with the 3.x version I used
before that.  A pity it was deemed too invasive to fix for 4.5.

> Another issue that gave me porting problems was the SIMD memory accesses,
> e.g. doing a wide load into two adjacent narrow registers with one
> instruction. This was resolved earlier on the mailing list to not be
> handleable on RTL, so I wonder now if anything has been done for this, as
> it too seems rather reasonable, just like the widening loads?

You wanted to load adjacent data in a wider mode that was then
to be separately used in a mode half that size, but the
registers had to be adjacent too?  That's kind of the opposite
problem to what's usually needed!  If the use of the data was
actually for the obvious wider mode (SI or V2HI), you'd just
have to define the movsi or movv2hi pattern and it would be
used, but that unfortunately seems not applicable in any way.
I'm not sure that problem is of common interest I'm afraid, but
if it can be resolved with a target-specific pass, there'd be
reason to add a hook somewhat like

But, did you check whether combine tried to match RTL that
looked somewhat like:

(parallel
 [(set (reg:HI 1) (mem:HI (plus:SI (reg:HI 3) (const_int 2))))
  (set (reg:HI 2) (mem:HI (plus:SI (reg:HI 3) (const_int 4))))])

I.e. a parallel with the two loads where the addresses were
adjacent?  From gdb you can inspect the calls to try_combine (IIRC).
That insn could have been matched to a pattern like:

(define_insn "*load_wide"
 [(set (match_operand:HI 0 "register_operand" "=d0,d1,d2")
       (match_operand:HI 1 "reg_plus_const_memory_operand" "m,m,m"))
  (set (match_operand:HI 2 "register_operand" "=d1,d2,d3")
       (match_operand:HI 3 "reg_plus_const_memory_operand" "m,m,m"))]
 "rtx_equal_p (XEXP (operands[3], 0),
               plus_constant (XEXP (operands[1], 0), 2))"
 "load_wide %0,%1")

Just a WAG, there are reasons this would not match in the
general case (for one, you'd want to try to match the opposite
order too).  Don't pay too much attention to the exact matching
predicates, constraints and condition above.  The point is just
whether combine tried to generate and match a parallel with two
valid loads, given source where there was obvious opportunity
for it.
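
To make the intent of that condition a bit more concrete, here is a
rough stand-alone C sketch of the adjacency test, including the
opposite order mentioned above.  It does not use GCC's rtx at all:
struct toy_addr and the toy_* functions are hypothetical names made up
for illustration, modelling an address of the form (plus (reg) (const_int)).

```c
#include <stdbool.h>

/* Toy stand-in for a (mem (plus (reg BASE) (const_int OFFSET))) address.
   Hypothetical type for illustration only; GCC itself works on rtx.  */
struct toy_addr {
    int base_regno;   /* register number of the base register */
    long offset;      /* constant displacement in bytes */
};

/* True if B addresses the SIZE-byte slot immediately after A,
   i.e. same base register and offset exactly SIZE higher.  */
static bool toy_follows(struct toy_addr a, struct toy_addr b, long size)
{
    return a.base_regno == b.base_regno && b.offset == a.offset + size;
}

/* True if the two loads touch adjacent slots in either order, which is
   what a real pattern (or pair of patterns) would have to accept.  */
static bool toy_adjacent(struct toy_addr a, struct toy_addr b, long size)
{
    return toy_follows(a, b, size) || toy_follows(b, a, size);
}
```

With the two HImode loads from the RTL above (base register 3, offsets
2 and 4, size 2), toy_follows holds in one order only, while
toy_adjacent holds in both.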

That insn *could* then be caught with a pattern which would,
through the right constraints, coerce register allocation to
make the right choices for the (initially separate) registers.  In
the example above, four registers are assumed to be valid as
destination with the matching singleton constraints d0..d3.
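
As a toy illustration of how the alternatives pair up column-wise
(d0 with d1, d1 with d2, d2 with d3), the following C sketch checks
which alternative, if any, a given pair of destination register
numbers would satisfy.  This is only a model of the idea, not how the
register allocator is implemented; struct toy_alt, toy_alts and
toy_match_alternative are hypothetical names.

```c
#include <stddef.h>

/* One alternative of the pattern: the single register allowed for
   operand 0 and the single register allowed for operand 2.  */
struct toy_alt {
    int dest_lo;   /* register for the first load (operand 0) */
    int dest_hi;   /* register for the second load (operand 2) */
};

/* Mirrors the constraint strings "=d0,d1,d2" and "=d1,d2,d3".  */
static const struct toy_alt toy_alts[] = {
    { 0, 1 },
    { 1, 2 },
    { 2, 3 },
};

/* Return the index of the alternative satisfied by the register pair,
   or -1 if no alternative accepts it.  */
static int toy_match_alternative(int regno0, int regno2)
{
    for (size_t i = 0; i < sizeof toy_alts / sizeof toy_alts[0]; i++)
        if (toy_alts[i].dest_lo == regno0 && toy_alts[i].dest_hi == regno2)
            return (int) i;
    return -1;
}
```

Only consecutive pairs in ascending order are accepted; something
like (d1, d3) or (d1, d0) matches no alternative, which is what would
push the allocator towards adjacent registers.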

brgds, H-P
