This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2


On Fri, 24 Jul 2015, Kyrill Tkachov wrote:

> 
> On 24/07/15 09:23, Jakub Jelinek wrote:
> > On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
> > > This transformation folds (X % C) == N into
> > > X & ((1 << (size - 1)) | (C - 1))) == N
> > > for constants C and N where N is positive and C is a power of 2.
> > > 
> > > The idea is similar to the existing X % C -> X & (C - 1) transformation
> > > for unsigned values but this time when we have the comparison we use the
> > > C - 1 mask (all 1s for power of 2 N) orred with the sign bit.
> > > 
> > > At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1)))
> > > calculates X % C, which is compared against the positive N.
> > > 
> > > If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate
> > > X % C but since we also catch the set sign bit, it will never compare
> > > equal
> > > to N (which is positive), so we still get the right comparison result.
> > > 
> > > I don't have much experience with writing match.pd patterns, so I
> > > appreciate
> > > any feedback on how to write this properly.
> > > 
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > I think this is another case that, if at all, should be done during or right
> > before RTL expansion and should test rtx costs.
> 
> Hmm, where would that be?
> expmed.c has a lot of code to expand div or mod by a power of 2.
> In what form would a compare with a mod by power of 2 arrive to
> the expansion phase?

It arrives as SSA_NAME == N and you can use get_gimple_for_ssa_name
or get_def_for_expr to get at the defining stmt if that is possible
(it's still unexpanded and thus TERed) and expand a different
expression.

But why can't simplify-rtx via combine handle this - it should have
access to target costs.

> > Because, ((1 << (size - 1)) | (C - 1))) constant might be very expensive,
> > while C cheap, and % might not be that expensive compared to & to offset
> > that.
> > 
> > E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any other
> > (well, if we don't take instruction size into account), but 64-bit constant
> > is at least 3 times more expensive (movabsq is needed with its latency).
> > In the x86_64 case supposedly the divmod is still more expensive, but there
> > are many other targets.  On sparc64 for 64-bit constants, you might need
> > many instructions to create the constants, etc.
> 
> Ok, I am not familiar with sparc64. The constant is just a 1
> in the sign bit orred with a continuous string of ones.
> That's usually cheap on aarch64 but may not be so on other targets.

On GIMPLE we might still want to canonicalize to one form.  I'd
canonicalize to the form with "smaller" constants if the number
of operations is the same.

Richard.

> Thanks,
> Kyrill
> 
> > 
> > 	Jakub
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]