Problem exposed by recent ARM multiply changes

Fri Sep 27 08:08:00 GMT 2019

On Thu, Sep 26, 2019 at 05:34:25PM -0500, Segher Boessenkool wrote:
> > Are you sure?
> > "The UMLAL instruction interprets the values from Rn and Rm as unsigned
> > integers.  It multiplies these integers, and adds the 64-bit result to the
> > 64-bit unsigned integer contained in RdHi and RdLo."
> 
> Yes.  It adds two 64-bit numbers.  That is not the same as adding the
> top and bottom 32-bit halfs of that separately.

Sure.

> > Appart from the bug I've provided untested fix in the PR, the above pattern
> > seems to do what the documentation says, the low 32 bits are
> > (unsigned int) (Rn*Rm+RdLo), and the whole result for unsigned int
> > Rn, Rm, RdHi and RdLo is
> > (unsigned long long)Rn*Rm+((unsigned long long)RdHi<<32)+RdLo
> > and the upper 32 bits of that are IMHO equivalent to
> > (unsigned) (((unsigned long long)Rn*Rm+RdLo)>>32)+RdHi
> 
> The upper half of the result should be one bigger if the addition of the
> low half carries.

No, because the upper half in the RTL is computed like the whole result
shifted down by 32.  There is a small simplification, but that shouldn't
make a difference.

The whole result would be computed as:

(set (reg:DI 132)
     (plus:DI (mult:DI (zero_extend:DI (reg/v:SI 115 [ sec ]))
                       (zero_extend:DI (reg:SI 124)))
	      (reg:DI 130)))

which is the same as:
(set (reg:DI 132)
     (plus:DI (plus:DI (mult:DI (zero_extend:DI (reg/v:SI 115 [ sec ]))
                       (zero_extend:DI (reg:SI 124)))
		       (zero_extend:DI (reg:SI 130)))
	      (ashift:DI (zero_extend:DI (reg:SI 131 [+4 ])) (const_int 32))))

Now, the upper 32 bits of that are:
(set (reg:SI 133 [+4 ])
     (truncate:SI (lshiftrt:DI
      (plus:DI (plus:DI (mult:DI (zero_extend:DI (reg/v:SI 115 [ sec ]))
                        (zero_extend:DI (reg:SI 124)))
		        (zero_extend:DI (reg:SI 130)))
	       (ashift:DI (zero_extend:DI (reg:SI 131 [+4 ])) (const_int 32)))
      (const_int 32))))

But because 131 is shifted <<32 and the result then >>32, we can simplify
that further to what the pattern uses:
(set (reg:SI 133 [+4 ])
    (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI (zero_extend:DI (reg/v:SI 115 [ sec ]))
                        (zero_extend:DI (reg:SI 124)))
                    (zero_extend:DI (reg:SI 130)))
                (const_int 32 [0x20])))
        (reg:SI 131 [+4 ])))

	Jakub