This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] match.pd: optimize unsigned mul overflow check

From: Alexander Monakov <amonakov at ispras dot ru>
To: Marc Glisse <marc dot glisse at inria dot fr>
Cc: gcc-patches at gcc dot gnu dot org
Date: Mon, 30 May 2016 10:14:30 +0300 (MSK)
Subject: Re: [PATCH] match.pd: optimize unsigned mul overflow check
Authentication-results: sourceware.org; auth=none
References: <alpine dot LNX dot 2 dot 20 dot 1605282232330 dot 2043 at monopod dot intra dot ispras dot ru> <alpine dot DEB dot 2 dot 20 dot 1605292303050 dot 1983 at laptop-mg dot saclay dot inria dot fr>

On Sun, 29 May 2016, Marc Glisse wrote:
> On Sat, 28 May 2016, Alexander Monakov wrote:
> 
> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
> > 'A*B' overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize it
> > to an invocation of __builtin_mul_overflow to avoid the divide operation.
> 
> I forgot to ask earlier: what does this give for modes / platforms where
> umulv4 does not have a specific implementation? Is the generic implementation
> worse than A>-1/B, in which case we may want to check optab_handler before
> doing the transformation? Or is it always at least as good?

If umulv<mode>4 is unavailable (which today is everywhere except x86), gcc
falls back as follows.  First, it checks whether the multiplication can be
done in a 2x wider type (usually it can, since gcc supports __int128_t on
64-bit platforms and 64-bit long long on 32-bit platforms) and, if so, it
inspects the high bits of the 2x wide product.  This should boil down to a
'high multiply' instruction if the operands' type matches the register size,
and to a normal multiply plus a test of the high bits if the type is narrower
than a register.
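
To make the comparison concrete, here is a minimal C sketch for 32-bit
unsigned operands: the division-based predicate, the builtin the patch would
substitute, and a hand-written widening multiply that resembles the first
fallback.  The function names are made up for illustration, and the third
function only approximates what the expander emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Division-based predicate, with the guard for b == 0. */
static bool overflows_div(uint32_t a, uint32_t b)
{
    return b != 0 && a > UINT32_MAX / b;
}

/* The form the transformation would produce (GCC 5 and later). */
static bool overflows_builtin(uint32_t a, uint32_t b)
{
    uint32_t product;
    return __builtin_mul_overflow(a, b, &product);
}

/* Hypothetical hand-written analogue of the first fallback:
   multiply in a 2x wider type, then test the high half. */
static bool overflows_wide(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * b;
    return (wide >> 32) != 0;
}
```

All three should agree for every pair of inputs; only the first one needs a
divide.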

Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform),
then gcc emits a sequence that performs the multiplication by parts in a 2x
narrower type.
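
As a rough illustration of the by-parts idea (not the exact sequence gcc
emits), the following C sketch checks a 64-bit unsigned multiplication for
overflow using only 32x32->64 multiplies.  The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a * b does not fit in 64 bits.
   With a = ah*2^32 + al and b = bh*2^32 + bl:
   a*b = ah*bh*2^64 + (ah*bl + al*bh)*2^32 + al*bl.  */
static bool mul_overflows_by_parts(uint64_t a, uint64_t b)
{
    uint32_t ah = (uint32_t)(a >> 32), al = (uint32_t)a;
    uint32_t bh = (uint32_t)(b >> 32), bl = (uint32_t)b;

    /* Both high halves nonzero: the product is at least 2^64. */
    if (ah != 0 && bh != 0)
        return true;

    /* At most one of the two cross terms is nonzero here,
       so this sum cannot wrap. */
    uint64_t cross = (uint64_t)ah * bl + (uint64_t)al * bh;
    if ((cross >> 32) != 0)
        return true;

    /* Add the shifted cross term to the low product and detect the carry. */
    uint64_t low = (uint64_t)al * bl;
    uint64_t sum = (cross << 32) + low;
    return sum < low;
}
```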

I think the first, more commonly taken fallback path always results in good
code.  In the second case, the eliminated 64-bit divide is unlikely to have
direct hardware support; e.g., on i386 it's a library call to __udivdi3.
This makes the transformation a likely loss for code size but a likely win
for performance.  The result could be better still if GCC could CSE
REALPART (IFN_MUL_OVERFLOW) with A*B on gimple.
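
The CSE remark concerns source code of the following shape, sketched here
with hypothetical function names.  The first function checks with a divide
and then multiplies separately; after the transformation its check becomes
the overflow builtin, whose real part is already A*B, so ideally a single
multiply would remain, as in the second function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Typical user code: divide-based check followed by a separate multiply. */
static bool checked_size_div(size_t n, size_t elem, size_t *out)
{
    if (elem != 0 && n > (size_t)-1 / elem)
        return false;            /* n * elem would overflow */
    *out = n * elem;             /* second use of the same product */
    return true;
}

/* What one would write by hand: the builtin yields both the overflow
   flag and the product from one operation. */
static bool checked_size_builtin(size_t n, size_t elem, size_t *out)
{
    return !__builtin_mul_overflow(n, elem, out);
}
```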

Thanks.
Alexander

Follow-Ups:
- Re: [PATCH] match.pd: optimize unsigned mul overflow check
  - From: Richard Biener

References:
- [PATCH] match.pd: optimize unsigned mul overflow check
  - From: Alexander Monakov
- Re: [PATCH] match.pd: optimize unsigned mul overflow check
  - From: Marc Glisse
