[wide-int] doloop fixes

Sat Nov 2 17:51:00 GMT 2013

Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> On 11/02/2013 11:31 AM, Richard Sandiford wrote:
>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>> On 11/02/2013 10:47 AM, Richard Sandiford wrote:
>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>> On 11/02/2013 07:06 AM, Richard Sandiford wrote:
>>>>>> The first part of this is a simple type mismatch --
>>>>>> get_max_loop_iterations
>>>>>> returns a widest_int aka max_wide_int -- and I'd have installed it as
>>>>>> obvious.  The second part isn't as obvious though.  The old code stored
>>>>>> the maximum iterations as:
>>>>>>
>>>>>>      if (!get_max_loop_iterations (loop, &iter)
>>>>>>          || !iter.fits_shwi ())
>>>>>>        iterations_max = const0_rtx;
>>>>>>      else
>>>>>>        iterations_max = GEN_INT (iter.to_shwi ());
>>>>>>
>>>>>> and the new code uses:
>>>>>>
>>>>>>      if (!get_max_loop_iterations (loop, &iter)
>>>>>>          || !wi::fits_shwi_p (iter))
>>>>>>        iterations_max = const0_rtx;
>>>>>>      else
>>>>>>        iterations_max = immed_wide_int_const (iter, mode);
>>>>>>
>>>>>> which includes an extra canonicalisation.  I agree it would be good to do
>>>>>> that in principle, but I'm not sure it copes correctly with the case where
>>>>>> the loop iterates 1 << GET_MODE_PRECISION (mode) times.  Plus I think the
>>>>>> real fix would be to avoid the host dependence altogether, i.e. get rid
>>>>>> of the fits_shwi_p too.
>>>>>>
>>>>>> As it stands, this breaks bfin-elf's pattern, which has:
>>>>>>
>>>>>>      /* Due to limitations in the hardware (an initial loop count of 0
>>>>>>         does not loop 2^32 times) we must avoid to generate a hardware
>>>>>>         loops when we cannot rule out this case.  */
>>>>>>      if (!flag_unsafe_loop_optimizations
>>>>>>          && (unsigned HOST_WIDE_INT) INTVAL (operands[2]) >= 0xFFFFFFFF)
>>>>>>        FAIL;
>>>>>>
>>>>>> With the sign-extending conversion, this now triggers more often than
>>>>>> it was supposed to.
>>>>>>
>>>>>> Since the old "GEN_INT (iter.to_shwi ());" works verbatim in wide-int too,
>>>>>> and since we still use that form in the doloop_begin code, I think we should
>>>>>> just back the immed_wide_int_const out.
>>>>>>
>>>>>> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some
>>>>>> unwanted testsuite changes in bfin-elf.  OK to install?
>>>>> I dislike this a lot.     I think that this is a badly written pattern
>>>>> in the bfin port if it depends on the host size.   As machines get
>>>>> bigger, (they may not be getting faster but they are still getting
>>>>> bigger) having traps like this at the portable level will bite us.
>>>> But this isn't at the portable level, it's backend-specific code.
>>>> It knows that it's dealing with a 32-bit counter.
>>> yes, but the patch is dumbing down the portable part of the compiler.
>>>>> in truth this code should really be iterations_max =
>>>>> immed_wide_int_const (iter, mode) with no tests at all.
>>>> But like I say, that won't cope with loops that iterate
>>>> 1 << GET_MODE_PRECISION (mode) times, even if that fits
>>>> in a signed HWI.  (That case could happen if the hardware
>>>> supports it and if the counter starts out as 0.)
>>>>
>>>> The mainline code is self-consistent in that the number of
>>>> iterations is passed as a positive signed HWI, using 0 if the
>>>> value isn't known or is greater than HOST_WIDE_INT_MAX.
>>>> In the current interface, the CONST_INT doesn't really have a mode.
>>>> It's just used as a convenient way of passing a HWI to an expander.
>>>>
>>>> I'm not saying that it's the ideal interface.  But if we want to change it,
>>>> let's do that separately, and test it on targets that have doloops.
>>>> Until then, the wide-int branch can provide the current interface
>>>> just as easily as mainline.
>>> we do not have a predicate at the rtl level like we do at the tree level
>>> where we can ask if the value can be represented in a mode without
>>> loosing any information.   But the proper thing to do here is to check
>>> to see if iter fits in the mode, if it does, then generate the
>>> immed_wide_int_const and if not generate the 0.   In this way it does
>>> the right thing for loops that iterate 1<<GET_MODE_PRECISION (mode),
>>> i.e. it will produce a 0.   And that will be correct even if the mode is
>>> SI or TI.
>> Well, like I say, I don't see the current rtx has having a mode,
>> even conceptually.  The CONST_INT is just a convenient way of passing
>> an iteration count to a function that only accepts rtxes as arguments.
>> If we could, we'd be passing the widest_int directly as a pointer or
>> reference.
>>
>> But since I'm just repeating what I said above, I should probably bow
>> out of this one.
> you point seems to be that this is a place where the "assumption" that 
> rtxes are signed bites us.
> if mode happens to be SImode but the variable was unsigned and large, 
> then we will sign extend this and get a pessimistic answer for this 
> pattern.

Right.

> on the other hand, if we do what you suggest, then we are 
> forever limited by the host precision.
>
> I see your point, but i do not know what to do.

Note that what I'm suggesting is simply restoring the mainline "else" code,
rather than something new.  wide-int doesn't change which of the if-else
arms are chosen, and if we keep the else arm as-is, existing code will
be happy.

"Forever" is a bit pessimistic (though probably true).  We're only
limited until somehow who cares about wide counters comes along and
changes things.  Or if you feel particularly motivated, you could do
it after the merge.  It just doesn't seem like something that needs to
be tackled here and now.

I mean, the current code is limited to CONST_INT -- and the doloop_end
pattern is specifically documented as taking a CONST_INT -- even though
we already have CONST_DOUBLE for wider types.  So the current set of rtxes
would already allow the pattern to pass wider iteration counts if necessary,
it's just that no port has needed them yet.

Thanks,
Richard