This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/64691] Suboptimal register allocation for bytes comparison on i386
- From: "ysrumyan at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 12 May 2015 09:58:32 +0000
- Subject: [Bug target/64691] Suboptimal register allocation for bytes comparison on i386
- Auto-submitted: auto-generated
- References: <bug-64691-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691
Yuri Rumyantsev <ysrumyan at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ysrumyan at gmail dot com
--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
I found another register allocation deficiency, which shows up in the attached
test case extracted from an important benchmark. If we look at the inner loop
    for(i = 0; i < size; i++) {
        byte xr, xg, xb, t1;
        sbyte t2, t3;
        x1 = read[0];
        x2 = read[1];
        x3 = read[2];
        t1 = (byte) (((C1 * x1) + (C2 * x2) + (C3 * x3) +
                      (1 << (SCALE - 1))) >> SCALE);
        t2 = (sbyte) (((C4 * x1) + (C5 * x2) + (C6 * x3) +
                       (1 << (SCALE - 1))) >> SCALE);
        t3 = (sbyte) (((C7 * x1) + (C8 * x2) + (C9 * x3) +
                       (1 << (SCALE - 1))) >> SCALE);
        write[0] = t1;
        write[1] = (byte) t2;
        write[2] = (byte) t3;
        read += 3;
        write += 3;
    }
we can see that 7 registers are enough to hold all the variables (except for
the upper loop bound): 3 registers for x1, x2, x3; 2 registers for the read and
write pointers; and 2 registers for computation, one for the t1, t2, t3 results
and one scratch register for the multiplications. Since the consumers of
t1, t2, t3 are byte stores, the register holding them must also belong to the
Q_REGS class, i.e. AREG, BREG, CREG or DREG. LRA does not produce such an
allocation, which leads to redundant spills/fills and degrades performance.
The assembly file produced by the 6.0 compiler with the "-O2 -m32 -march=slm"
options is attached too.
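
For readers without the attachment, the inner loop can be wrapped into a
self-contained function roughly as follows. This is only a sketch: the
typedefs, the function name convert, the declaration of x1, x2, x3 as int, and
the values of SCALE and C1..C9 are all assumptions made for illustration and
are not taken from the attached test case.

```c
/* Hypothetical stand-ins: the real typedefs and constants live in the
   attached test case, which is not reproduced in this message. */
typedef unsigned char byte;
typedef signed char sbyte;

#define SCALE 8
#define C1   77
#define C2  150
#define C3   29
#define C4  (-43)
#define C5  (-85)
#define C6  128
#define C7  128
#define C8 (-107)
#define C9  (-21)

/* Converts `size` 3-byte pixels from `read` into `write`.
   The function name and signature are assumptions. */
void convert(const byte *read, byte *write, int size)
{
    int i;

    for (i = 0; i < size; i++) {
        /* Assumed to be plain ints; the original snippet does not
           show where x1, x2, x3 are declared. */
        int x1 = read[0];
        int x2 = read[1];
        int x3 = read[2];
        byte t1;
        sbyte t2, t3;

        /* Fixed-point weighted sums, rounded by adding half of 1 << SCALE.
           Right-shifting a negative int and narrowing to sbyte are
           implementation-defined in C; GCC uses an arithmetic shift and
           modulo wrap-around. */
        t1 = (byte) (((C1 * x1) + (C2 * x2) + (C3 * x3) +
                      (1 << (SCALE - 1))) >> SCALE);
        t2 = (sbyte) (((C4 * x1) + (C5 * x2) + (C6 * x3) +
                       (1 << (SCALE - 1))) >> SCALE);
        t3 = (sbyte) (((C7 * x1) + (C8 * x2) + (C9 * x3) +
                       (1 << (SCALE - 1))) >> SCALE);

        /* Byte stores: on i386 the stored value has to sit in a register
           with a byte-addressable low part. */
        write[0] = t1;
        write[1] = (byte) t2;
        write[2] = (byte) t3;

        read += 3;
        write += 3;
    }
}
```

Saving this under some file name and compiling it with the options quoted
above ("-O2 -m32 -march=slm", plus -S to get assembly) should make it possible
to inspect the allocation of the loop, though with different constants the
generated code may not match the attached assembly.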