[RFC] If conversion min/max search, costs and problems

Robin Dapp rdapp@linux.vnet.ibm.com
Tue Jul 25 09:25:00 GMT 2017


Hi,

recently I wondered why a snippet like the following is not being
if-converted at all on s390:

int foo (int *a, unsigned int n)
{
  int min = 999999;
  int bla = 0;
  for (int i = 0; i < n; i++)
    {
      if (a[i] < min)
        {
          min = a[i];
          bla = 1;
        }
    }

  if (bla)
    min += 1;
  return min;
}

After skimming through ifcvt's code, it turned out that there are two
paths that could handle such a situation.  At first I looked at
cond_move_process_if_block ().  One condition to check if the block is
suitable for processing is

      if (reg_overlap_mentioned_p (dest, cond))
	return FALSE;

i.e. abort if the destination of a conditional move appears in the
condition.  This means we do not handle minimum or maximum searches like
if (a < min) min = a; with this.  That might be intentional as
noce_try_minmax () can handle it but for me the reason was not entirely
obvious.

Disabling the condition above for the snippet

 if (a[i] < min)
 {
   min = a[i];
   bla = 1;
 }

yields assembly like the following on i386:

 cmpl    %edx, %ecx
 cmovg   %ecx, %edx
 cmpl    %edx, %ecx
 cmovg   %eax, %esi

meaning the compare is emitted multiple times and a write to a register
used by the compare is obviously wrong.

The compares are produced by noce_emit_cmove () calling
emit_conditional_move () which, in turn, seems to prepare and emit a
comparison every time it emits a conditional move.  Is this really
necessary or the correct thing to do?

--

The second ifcvt path uses noce_convert_multiple_sets () whose
preconditions don't seem as strict.  The conversion result is never
considered however, because costs are estimated as higher than the
non-converted version.  The cost estimation seems odd to me at best though:

Before noce_process_if_block () the original costs are set to
 if_info.original_cost = COSTS_N_INSNS (2);
but the adding of the then/else block costs is only performed after
noce_convert_multiple_sets () i.e. the costs after conversion will never
be lower than the original costs.  When artifically lowering the costs
we end up with the same assembly sequence as above including the
superfluous compares.

To address these issues I came up with a tentative patch that
 - adds cost for the original then basic block and adapts s390's
noce_conversion_profitable_p () hook to allow processing (magic value
for now).
 - extracts the compare from the first-emitted conditional move and uses
it for the subsequent conditional moves getting rid of all but the first
compare.

Test suite on s390 and i386 looks ok.

Some questions/remarks, comments welcome:
 - ifcvt performs creates things that it expects other passes to clean
up afterwards.  In the case of the superfluous compares this might be
possible but the code is actually wrong and we cannot rely on a pass
fixing wrong code.
 - The extraction of the compare from the conditional move is pretty
ad-hoc and pattern-dependent currently.  Is there a way to do it in a
more backend-independent fashion?
 - Branch mispredict costs don't seem to play a major role in ifcvt.
Shouldn't they be accounted for in all cases, maybe weighted by bb
probabilities?

Regards
 Robin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-ifconv-test4.diff
Type: text/x-patch
Size: 9242 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20170725/09546f30/attachment.bin>


More information about the Gcc-patches mailing list