divide 64-bit by constant for 32-bit target machines

Thu May 3 13:41:00 GMT 2012

On 03/05/12 11:27, Dinar Temirbulatov wrote:
> Hi,
> Here is updated version of patch. I added comments describing the algorithm.
> 
>> Hi Dinar.  I'm afraid it gives the wrong results for some dividends
>>  * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
>>  * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
>>  * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
>>  * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
> Oh, I have used signed multiplication instead of unsigned and that was
> the reason of errors above, fixed that typo.
> Tested on arm-7l with no new regressions.
> Ok for trunk?
> 

This clearly needs more work.

Comments:  Need to end with a period and two spaces.
Binary Operators: Need to be surrounded with white space.

As Andrew Haley has already said, some documentation of the algorithm is
needed.

Why is this restricted to BITS_PER_WORD == 32?

Costs: This code clearly needs to understand the relative cost of doing
division this way; there is a limit to the amount of inline expansion
that we should tolerate, particularly at -O2 and it's not clear that
this will be much better than a library call if we don't have a widening
multiply operation (as is true for older ARM chips, and presumably some
other CPUs).  In essence, you need to work out the cost of a divide
instruction (just as rtx costs for this) and the approximate cost of the
expanded algorithm.

Another issue that worries me is the number of live DImode values; on
machines with few registers this could cause excessive spilling to start
happening.

I also wonder whether it would be beneficial to generate custom
functions for division by specific constants (using this algorithm) and
then call those functions rather than the standard lib-call.  On ELF
systems the functions can marked as hidden and put into common sections
so that we only end up with one instance of each function in a program.

+      /* Checking for owerflow, please not that we couldn't user carry-flag
+         here before the reload pass .
+	 cres = (tmp < u0) || (tmp < u1); */

Generic code can't assume the presence of a carry flag either before or
after reload, so the latter part of the comment is somewhat meaningless.
 Also spelling error in comment.

Finally, do you have a copyright assignment with the FSF?  We can't use
this code without one.

R.