This is the mail archive of the mailing list for the GCC project.

Re: Introducing redundancy in combine?

On 06/21/10 16:13, Sebastian Pop wrote:

I was looking at why, in the vectorized DCT kernel of FFmpeg, the insn
selection of GCC fails to produce XOP fused-multiply-add vector insns:
DOM detects a redundant expression and optimizes it away, and that
makes it impossible for combine to detect the higher level insns.

The DCT kernel looks like this:

static void
dct_unquantize_h263_inter_c (DCTELEM * block, int qscale, int nCoeffs)
{
   int i, level, qmul, qadd;

   qadd = (qscale - 1) | 1;
   qmul = qscale << 1;

   for (i = 0; i <= nCoeffs; i++)
     {
       level = block[i];

       if (level < 0)
         level = level * qmul + qadd;
       else
         level = level * qmul - qadd;

       block[i] = level;
     }
}

The expression "level * qmul" is redundant and is optimized out
of the condition:

       tmp = level * qmul;
       if (level < 0)
         level = tmp + qadd;
       else
         level = tmp - qadd;
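
To make the transformation concrete, here is a minimal standalone sketch of both forms, assuming the condition still tests the original coefficient (which is what the generated code compares). The typedef of DCTELEM as short and the two function names are assumptions for illustration only; they are not taken from FFmpeg or GCC.

```c
typedef short DCTELEM;  /* assumption: 16-bit coefficients */

/* Original form: the multiply appears in both arms of the conditional. */
static void unquantize_original(DCTELEM *block, int qscale, int n_coeffs)
{
    int qadd = (qscale - 1) | 1;
    int qmul = qscale << 1;

    for (int i = 0; i <= n_coeffs; i++) {
        int level = block[i];

        if (level < 0)
            level = level * qmul + qadd;
        else
            level = level * qmul - qadd;

        block[i] = (DCTELEM) level;
    }
}

/* Hoisted form: one multiply before the conditional, then an add or a
   subtract.  The test is still on the original coefficient. */
static void unquantize_hoisted(DCTELEM *block, int qscale, int n_coeffs)
{
    int qadd = (qscale - 1) | 1;
    int qmul = qscale << 1;

    for (int i = 0; i <= n_coeffs; i++) {
        int level = block[i];
        int tmp = level * qmul;

        block[i] = (DCTELEM) (level < 0 ? tmp + qadd : tmp - qadd);
    }
}
```

Both functions should leave identical contents in the block for the same inputs; the difference is only in where the multiply sits relative to the branch, which is what combine later sees.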

On this code GCC fails to combine the + and the - with the *, as they
both depend on the same computation.  However, if I modify the DCT
kernel to artificially remove the redundancy:

       if (level < 0)
         level = level * qmul + qadd;
       else
         level = level * qadd - qmul;

the kernel is vectorized with the expected insns:

	vpmacsdd	%xmm1, %xmm6, %xmm0, %xmm3
	vpmacsdd	%xmm5, %xmm1, %xmm0, %xmm2
	vpcomltd	%xmm4, %xmm0, %xmm0
	vpcmov	%xmm0, %xmm2, %xmm3, %xmm0

Here is the slower and larger code generated for the original DCT,
with one * and two +:

	vpmulld	%xmm6, %xmm0, %xmm1
	vpcomltd	%xmm3, %xmm0, %xmm0
	vpaddd	%xmm5, %xmm1, %xmm2
	vpaddd	%xmm4, %xmm1, %xmm1
	vpcmov	%xmm0, %xmm1, %xmm2, %xmm0
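
For readers less familiar with XOP, the two instruction shapes can be modelled per 32-bit lane in scalar C. This is only a sketch: the helper names are hypothetical, and the semantics assumed here are that vpmacsdd is a 32-bit multiply-accumulate, vpcomltd is a signed less-than producing an all-ones mask, and vpcmov is a bitwise select. The "fused" function shows the shape one would like for the original kernel (two multiply-accumulates sharing the same multiplicands), not the code GCC emitted for it.

```c
#include <stdint.h>

/* Wrapping 32-bit helpers, done in unsigned arithmetic to avoid signed
   overflow.  The cast back to int32_t is implementation-defined for
   out-of-range values; GCC wraps modulo 2^32. */
static int32_t mul32(int32_t a, int32_t b)
{
    return (int32_t) ((uint32_t) a * (uint32_t) b);
}

static int32_t add32(int32_t a, int32_t b)
{
    return (int32_t) ((uint32_t) a + (uint32_t) b);
}

/* Assumed model of vpmacsdd: low 32 bits of a * b + c. */
static int32_t macsdd(int32_t a, int32_t b, int32_t c)
{
    return add32(mul32(a, b), c);
}

/* Assumed model of vpcomltd: all-ones when a < b, zero otherwise. */
static uint32_t comltd(int32_t a, int32_t b)
{
    return a < b ? 0xFFFFFFFFu : 0u;
}

/* Assumed model of vpcmov: take bits from if_set where the mask is 1,
   from if_clear where it is 0. */
static int32_t cmov(uint32_t mask, int32_t if_set, int32_t if_clear)
{
    return (int32_t) (((uint32_t) if_set & mask)
                      | ((uint32_t) if_clear & ~mask));
}

/* Fused shape: two multiply-accumulates, one compare, one select. */
static int32_t lane_fused(int32_t level, int32_t qmul, int32_t qadd)
{
    int32_t neg = macsdd(level, qmul, qadd);
    int32_t pos = macsdd(level, qmul, -qadd);
    return cmov(comltd(level, 0), neg, pos);
}

/* Unfused shape: one multiply, two adds, one compare, one select. */
static int32_t lane_unfused(int32_t level, int32_t qmul, int32_t qadd)
{
    int32_t prod = mul32(level, qmul);
    int32_t neg = add32(prod, qadd);
    int32_t pos = add32(prod, -qadd);
    return cmov(comltd(level, 0), neg, pos);
}
```

Both lane functions compute the same value; the fused one does so in four operations instead of five, at the cost of repeating the multiplication inside each multiply-accumulate.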

Is there a simple way to teach combine how to introduce redundancy to
generate higher level insns?

Ouch. You've got another problem in that combine doesn't combine across basic blocks.

Can you attack it in forwprop?

I'm a little surprised DOM removed the multiplication. It's not actually a runtime redundancy; it's more like code hoisting, since on any given iteration of the loop the expression level * qmul is evaluated only once.
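
One way to see the distinction is to count dynamic multiplications in both forms. The sketch below is illustrative only; the counter and all function names are hypothetical instrumentation, not anything from GCC or FFmpeg.

```c
/* Hypothetical instrumentation: counts every multiply that executes. */
static long mul_count;

static int counted_mul(int a, int b)
{
    mul_count++;
    return a * b;
}

/* Multiply written in both arms; only one arm runs per call. */
static int step_original(int level, int qmul, int qadd)
{
    if (level < 0)
        return counted_mul(level, qmul) + qadd;
    else
        return counted_mul(level, qmul) - qadd;
}

/* Multiply hoisted above the branch. */
static int step_hoisted(int level, int qmul, int qadd)
{
    int tmp = counted_mul(level, qmul);
    return level < 0 ? tmp + qadd : tmp - qadd;
}

/* Run one form over n coefficients and report how many multiplies ran. */
static long count_muls(int (*step)(int, int, int),
                       const int *levels, int n, int qmul, int qadd)
{
    mul_count = 0;
    for (int i = 0; i < n; i++)
        (void) step(levels[i], qmul, qadd);
    return mul_count;
}
```

Either form executes exactly one multiply per coefficient, so the hoisted version saves code size rather than runtime work.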
