This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Improved signed integer remainder by power of two
- From: Jeffrey A Law <law at redhat dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Steven Bosscher <stevenb at suse dot de>
- Date: Mon, 28 Jun 2004 13:15:57 -0600
- Subject: Re: [PATCH] Improved signed integer remainder by power of two
- Organization: Red Hat, Inc
- References: <Pine.LNX.4.44.0406271845400.2779-100000@www.eyesopen.com>
- Reply-to: law at redhat dot com
On Sun, 2004-06-27 at 19:23, Roger Sayle wrote:
> The following patch improves the code that GCC generates for signed
> integer remainder, modulo a constant which is a power of two. Many
> thanks for Steven Bosscher for bringing this to my attention, pointing
> out that GCC generates inferior code to other x86 compilers, including
> the IBM compiler for OS/2 and the OpenWatcom compiler. Thanks also
> to Andi Kleen who apparently pointed out this failing to Steven.
>
> Consider the simple test program below:
>
> int smod16(int x)
> {
> return x % 16;
> }
>
> On x86, using -O2 -fomit-frame-pointer mainline CVS currently generates
> the rather poor:
>
>
> smod16: movl 4(%esp), %eax
> testl %eax, %eax
> movl %eax, %edx
> js .L4
> andl $-16, %edx
> subl %edx, %eax
> ret
> .L4:
> leal 15(%eax), %edx
> andl $-16, %edx
> subl %edx, %eax
> ret
>
>
> Examining the middle-end code in expmed.c's expand_divmod, I'm actually
> impressed that we do this well, as there's currently no special case code
> for handling this form of TRUNC_MOD_EXPR. Instead, the initial RTL that
> we generate is calculated as "return x - 16*(x/16);", where the divisions
> and multiplications are handled by arithmetic shifts.
>
> The patch below introduces a new expand_smod_pow2 function, as a helper
> function to expand_divmod, to generate improved RTL for TRUNC_MOD_EXPR.
>
>
> With this patch, we now generate the following code with the above
> options.
>
> smod16: movl 4(%esp), %eax
> andl $-2147483633, %eax
> js .L4
> ret
> .L4:
> decl %eax
> orl $-16, %eax
> incl %eax
> ret
>
> The first "andl" is actually 0x8000000f, i.e. the signbit of the mode of
> this operation, and the significant low-order bits of the modulus. By
> performing this bitwise AND early, many targets (including x86) can avoid
> an explicit compare/test instruction when checking for the signedness of
> the operand. This happens to be the same idiom as used by Microsoft's
> Visual C/C++ version 6 compilers.
>
>
> And for targets with a significant conditional branch penalty, when
> BRANCH_COST >= 2, the patch below can also generate a straight-line
> sequence without conditional jumps.
>
> For example, using "-O2 -fomit-frame-pointer -mtune=pentium4", we now
> generate the following code:
>
> smod16: movl 4(%esp), %eax
> cltd
> xorl %edx, %eax
> subl %edx, %eax
> andl $15, %eax
> xorl %edx, %eax
> subl %edx, %eax
> ret
>
> which as luck would have it is identical to the code generated by the
> Intel version 7 compiler (and the IBM OS/2 and OpenWatcom compilers
> mentioned above).
>
>
> For the curious, the above assembly code is approximately equal to the
> following function:
>
> int smod16a(int x)
> {
> int signmask = x >> 31;
> x = (x ^ signmask) - signmask; // x = fabs(x)
> x = x & 15;
> x = (x ^ signmask) - signmask;
> return x;
> }
>
> This is relying on the idiom that "(x ^ -1) - -1" is equal to "-x",
> but that "(x ^ 0) - 0" is the identity operation. So the second
> and fourth lines above are effectively "conditional negation"
> instructions, controlled by the original sign of x.
>
> Or to make things clearer by removing this conditional execution:
>
> int smod16b(int x)
> {
> if (x < 0)
> {
> x = -x;
> x = x & 15;
> x = -x;
> }
> else
> x = x & 15;
> return x;
> }
>
>
>
> The following patch has been tested on i686-pc-linux-gnu, with a full
> "make bootstrap", all default languages, and regression tested with a
> top-level "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2004-06-27 Roger Sayle <roger@eyesopen.com>
>
> * expmed.c (expand_smod_pow2): New function to expand signed
> remainder by a constant power of 2, such as "x % 16".
> (expand_divmod): Call new expand_smod_pow2 when appropriate.
> Minor corrections to comments, e.g. splitting long lines.
Looks good. Please go ahead and install this patch.
jeff