This is GCC Bugzilla
This is GCC Bugzilla Version 2.20+
View Bug Activity | Format For Printing | Clone This Bug
The following function (which by the way is meant to compute the unsigned remainder by 3), produces this really awful code on x86 (gcc 3.3.1) at any optimization level: int f(unsigned int x) { return ((x * 0x2AAAAAAABULL) >> 33) & 3; } ... movl $0xAAAAAAAB, %eax movl 8(%ebp), %ebx mull %ebx xorl %esi, %esi (1) leal (%edx,%ebx,2), %ecx movl %esi, %ebx imull $0xAAAAAAAB, %ebx, %ebx (2) leal (%ebx,%ecx), %edx movl %edx, %eax shrl $1, %eax andl $3, %eax ... The insns I marked with (1) and (2) are respectively, after RTL generation, (insn 11 10 12 (nil) (parallel [ (set (reg:DI 61) (zero_extend:DI (reg/v:SI 59))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) (insn 17 16 18 (nil) (parallel [ (set (reg:SI 66) (mult:SI (subreg:SI (reg:DI 61) 4) (const_int -1431655765 [0xaaaaaaab]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) CSE does not recognize that (subreg:SI (reg:DI 61) 4) must be zero since it is created with a (zero_extend:DI (reg/v:SI 59)). I've made a quick patch for 3.2.1: 2004-01-17 Paolo Bonzini <bonzini@gnu.org> * gcc/cse.c (fold_rtx): Simplify a SUBREG to zero if it is the high part of the source, and the source is a REG equivalent to a ZERO_EXTEND. The important part of the patch is: ,----- + if (GET_CODE (folded_arg0) == REG + && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (folded_arg0)) + && !subreg_lowpart_p (x)) + { + struct table_elt *elt; + + /* We can use HASH here since we know that canon_hash won't be + called. */ + elt = lookup (folded_arg0, + HASH (folded_arg0, GET_MODE (folded_arg0)), + GET_MODE (folded_arg0)); + + if (elt) + elt = elt->first_same_value; + + /* If this is a SUBREG representing the high part of a REG, check if + the register is the result of a zero extension. We can fold + the SUBREG to zero if the zero-extended expression's mode is + smaller or as big as the SUBREG's (for example, a subreg:HI of + a zero-extended reg:SI could be non-zero). */ + for (; elt; elt = elt->next_same_value) + { + if (GET_CODE (elt->exp) == ZERO_EXTEND + && GET_MODE_SIZE (GET_MODE (XEXP (elt->exp, 0))) + <= GET_MODE_SIZE (mode)) + return const0_rtx; + } + } `----- It fixes this problem and bootstraps cleanly (C, C++, Java); please tell me if it is the right way, so I can port to mainline and submit properly. Maybe it is better to put the transformation in simplify_rtx.c instead? It looks like a good deal of SUBREG manipulation done in cse.c after looking up the operand in the available expression table is not pass-dependent, but this is probably too much for me to do without any kind of guidance (it is my first foray into patching gcc). Paolo
Created an attachment (id=5510) [edit] One-line function exhibiting bad code generation (This is the function included in the bugreport)
Confirmed. I cannot comment on the patch at all though. CSE is a mess and I would not touch it with my life though.
Approved patch pending http://gcc.gnu.org/ml/gcc-patches/2004-01/msg02113.html
Subject: Bug 13724 CVSROOT: /cvs/gcc Module name: gcc Changes by: sayle@gcc.gnu.org 2004-01-23 02:03:26 Modified files: gcc : ChangeLog cse.c Log message: 2004-01-22 Paolo Bonzini <bonzini@gnu.org> PR optimization/13724 * cse.c (fold_rtx) <SUBREG>: Fold a SUBREG to zero if it represents the zero bits produced by a ZERO_EXTEND operation. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.2424&r2=2.2425 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cse.c.diff?cvsroot=gcc&r1=1.279&r2=1.280
Fixed for 3.5.0.
Paolo Bozini mentioned this bug as an example of the 64bits arith on 32bits host "issue".