This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/21111] IA-64 NaT consumption faults due to uninitialized register reads
- From: "wilson at specifixinc dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 19 Apr 2005 23:05:22 -0000
- Subject: [Bug middle-end/21111] IA-64 NaT consumption faults due to uninitialized register reads
- References: <20050419203158.21111.wilson@gcc.gnu.org>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From wilson at specifixinc dot com 2005-04-19 23:05 -------
Subject: Re: IA-64 NaT consumption faults due to uninitialized register reads
pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-04-19 20:47 -------
> To me, the target specific code should be the one to fix this problem up and not the middle-end or at
> least have a hook for it so you don't mess around with other targets getting the speed up. Anyways
> seems like someone thought it would be cool if they did this, oh well.
The change I am suggesting should not hurt performance, and I expect
that it would actually help performance in many cases.
Currently, the first assignment to a structure is a bitfield insert.
If we zero the structure before the first assignment, then combine will
give us a simple assignment instead, which will be faster than a
bitfield insert for most targets. This may also allow other assignments
to be combined in, giving further benefits. (There can be multiple
first assignments if there are multiple blocks where the structure
becomes live.)
I agree that the optimizations being performed by tree-ssa are useful
here, but the big-picture issues must not distract us from the details.
Emitting a bit-field insert when only a simple assignment is needed is
wrong. It may cause a performance loss on many targets, and it causes
core dumps on IA-64.
Take a look at this example.
struct s { unsigned long i : 32; unsigned long j : 32; };
int i;

struct s
sub (void)
{
  struct s foo;
  foo.i = i;
  return foo;
}
Compiling this for x86-64 on mainline, I get 10 instructions, which
perform two bit-field insertions. Compiling this with gcc-3.3, I get 7
instructions which perform one bit-field insertion.
I think the optimal code is two instructions, one to load i into the low
part of the return register, and one to return. The upper bits of the
structure are don't care bits, so we can set them to anything we want.
There is no need for any bitfield insertion here at all.
Mainline does even worse than gcc-3 here because, in order to decompose
the structure, it creates a fake j assignment, and then we end up
emitting bitfield insertion code for the fake j assignment, even though
this code is completely useless. Furthermore, the RTL optimizer is not
able to delete this fake j assignment, because it is a bitfield insert.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21111