This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/15184] [3.4/4.0/4.1 Regression] Direct access to byte inside word not working with -march=pentiumpro
- From: "wilson at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 14 Oct 2005 17:44:17 -0000
- Subject: [Bug target/15184] [3.4/4.0/4.1 Regression] Direct access to byte inside word not working with -march=pentiumpro
- References: <bug-15184-1037@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #9 from wilson at gcc dot gnu dot org 2005-10-14 17:44 -------
The cause of this problem is the following two lines in the file i386.c:
const int x86_himode_math = ~(m_PPRO);
const int x86_promote_hi_regs = m_PPRO;
They were added here:
http://gcc.gnu.org/ml/gcc-patches/2000-02/msg00890.html
The reason, as a previous comment mentioned, is that HImode instructions are
slow on the Pentium Pro and should be avoided. Avoiding them gives better
performance in general, but unfortunately, for this particular testcase, it
causes us to miss an optimization.
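The two definitions quoted above are per-processor tuning bitmasks: each bit stands for one processor, and a feature is on when the bit for the processor being tuned for is set. The following is a minimal C sketch of that pattern. The processor list, the CPU_* names, CPU_BIT and the tune_has() helper are hypothetical stand-ins, not GCC's actual i386.h definitions; only the two tuning words are taken from the lines above.

```c
/* Hypothetical processor list; GCC's real enumeration differs. */
enum cpu { CPU_I386, CPU_PENTIUM, CPU_PENTIUMPRO, CPU_ATHLON };

/* One bit per processor (illustrative; mirrors the m_PPRO naming above). */
#define CPU_BIT(p) (1 << (p))
#define m_PPRO CPU_BIT(CPU_PENTIUMPRO)

/* The two tuning words quoted from i386.c. */
static const int x86_himode_math = ~(m_PPRO);   /* every CPU except PPro */
static const int x86_promote_hi_regs = m_PPRO;  /* only PPro */

/* Hypothetical helper: is the feature encoded in tuning_word enabled
   when tuning for processor tune? */
static int tune_has(int tuning_word, enum cpu tune)
{
    return (tuning_word & CPU_BIT(tune)) != 0;
}
```

With this encoding, tuning for the Pentium Pro turns HImode math off and HImode register promotion on, while every other processor keeps HImode math.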
The issue in this case is a combiner limit. If you compile for Pentium, you
get:
(set (reg:HI 61) (and:HI (mem/c/i:HI (symbol_ref:SI ("y")))
                         (const_int -256)))
(set (reg:HI 63) (ior:HI (reg:HI 61)
                         (reg:HI 62)))
(set (mem/c/i:HI (symbol_ref:SI ("y")))
     (reg:HI 63))
The combiner combines these three instructions into:
(set (mem/c/i:QI (const:SI (plus:SI (symbol_ref:SI ("x"))
                                    (const_int 1 [0x1]))))
     (subreg:QI (reg:SI 59 [ c ])))
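The three Pentium insns clear the low-order byte of y with an AND by -256 (0xff00 as a 16-bit value) and then IOR in a byte. Assuming, as the combined QImode store implies, that reg 62 holds c zero-extended, this is the same as storing c directly into one byte of y. The C sketch below checks that equivalence exhaustively. The function names and the runtime byte-order check are illustrative assumptions, and the bug's actual testcase is not reproduced here.

```c
#include <string.h>

/* The three HImode insns: load y, AND with -256 (0xff00 as a 16-bit
   value) to keep only the high byte, IOR in c, store the result. */
unsigned short insert_byte_rmw(unsigned short y, unsigned char c)
{
    unsigned short masked = (unsigned short)(y & 0xff00u); /* keep high byte */
    return (unsigned short)(masked | c);                   /* merge in c */
}

/* A single byte-sized store: write c directly into the byte of y
   that holds the low-order bits. */
unsigned short insert_byte_store(unsigned short y, unsigned char c)
{
    unsigned short one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);               /* detect byte order */
    size_t low = (first == 1) ? 0 : 1;     /* 0 on little-endian x86 */
    ((unsigned char *)&y)[low] = c;        /* one QImode-sized store */
    return y;
}

/* Compare both versions over every 16-bit y and 8-bit c;
   returns the number of mismatches (expected: 0). */
long count_mismatches(void)
{
    long bad = 0;
    for (unsigned long y = 0; y <= 0xffffu; y++)
        for (unsigned c = 0; c <= 0xffu; c++)
            if (insert_byte_rmw((unsigned short)y, (unsigned char)c) !=
                insert_byte_store((unsigned short)y, (unsigned char)c))
                bad++;
    return bad;
}
```

For example, both functions turn y = 0x1234, c = 0xab into 0x12ab. The byte-order check keeps the sketch portable; on x86 it always selects offset 0.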
However, for the Pentium Pro, we end up with four instructions because of the
HImode promotion:
(set (reg:HI 61 [ y ])
     (mem/c/i:HI (symbol_ref:SI ("y"))))
(set (reg:SI 62) (and:SI (subreg:SI (reg:HI 61 [ y ]) 0)
                         (const_int -256 [0xffffffffffffff00])))
(set (reg:SI 65) (ior:SI (reg:SI 62)
                         (subreg:SI (reg:HI 63 [ c ]) 0)))
(set (mem/c/i:HI (symbol_ref:SI ("y")))
     (subreg:HI (reg:SI 65) 0))
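The promotion changes the shape of the RTL, not the value stored. The C sketch below transcribes the four Pentium Pro insns (register numbers in the comments refer to the dump above). The parameters junk_y and junk_c are an illustrative assumption: they model the unspecified upper 16 bits of each paradoxical (subreg:SI (reg:HI ...) 0). It also assumes that reg 63 holds c zero-extended. Because the final subreg:HI keeps only the low 16 bits, the result matches the HImode computation. What differs is only that the pattern now spans four insns.

```c
#include <stdint.h>

/* HImode reference: what the Pentium sequence computes. */
static uint16_t insert_byte_himode(uint16_t y, uint8_t c)
{
    return (uint16_t)((y & 0xff00u) | c);
}

/* The four Pentium Pro insns transcribed into C.  junk_y and junk_c model
   the unspecified upper bits of the paradoxical SImode subregs; they are
   an illustrative assumption, not values GCC actually produces. */
static uint16_t insert_byte_promoted(uint16_t y, uint8_t c,
                                     uint16_t junk_y, uint16_t junk_c)
{
    uint16_t r61 = y;                                      /* HImode load of y */
    uint32_t r62 = (((uint32_t)junk_y << 16) | r61)
                   & 0xffffff00u;                          /* and:SI with -256 */
    uint16_t r63 = c;                                      /* c, assumed zero-extended */
    uint32_t r65 = r62 | (((uint32_t)junk_c << 16) | r63); /* ior:SI */
    return (uint16_t)r65;                                  /* subreg:HI: low 16 bits */
}

/* Count inputs where the two versions disagree (expected: 0). */
static long count_promoted_mismatches(void)
{
    long bad = 0;
    for (uint32_t y = 0; y <= 0xffffu; y++)
        for (uint32_t c = 0; c <= 0xffu; c++) {
            uint16_t jy = (uint16_t)(y * 40503u);  /* arbitrary upper bits */
            uint16_t jc = (uint16_t)(~y ^ c);      /* arbitrary upper bits */
            if (insert_byte_promoted((uint16_t)y, (uint8_t)c, jy, jc) !=
                insert_byte_himode((uint16_t)y, (uint8_t)c))
                bad++;
        }
    return bad;
}
```

Whatever the upper bits contain, they are discarded by the final truncation. The value is therefore the same as in the Pentium case, even though the combiner would need to look at four insns to see it.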
The combiner combines at most three instructions at a time, to avoid a
combinatorial explosion, so we are not able to optimize this sequence.
I'll look at this a bit more, but at the moment, I'm skeptical that there is
any easy solution.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15184