Bug 21111 - IA-64 NaT consumption faults due to uninitialized register reads
Summary: IA-64 NaT consumption faults due to uninitialized register reads
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2005-04-19 20:31 UTC by Jim Wilson
Modified: 2024-03-12 06:49 UTC (History)
9 users (show)

See Also:
Host:
Target: ia64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-01-11 00:00:00


Attachments
synthetic IA-64 testcase to generate NaT consumption fault (173 bytes, text/plain)
2005-04-19 20:41 UTC, Jim Wilson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jim Wilson 2005-04-19 20:31:58 UTC
Because of the decomposition of structures in tree-ssa, the middle end is
emitting RTL code that can read uninitialized registers.  On IA-64, this can
result in a NaT consumption fault if the uninitialized register has its NaT bit set.

Before tree-ssa, we would have had a constructor, and store_constructor takes
pains to ensure that a register is initialized to zero before we start setting
fields in it.

Since we do not yet have speculation support in the IA-64 backend, this problem
will be very hard to trigger without a synthetic example.  There is some hand
written code in glibc that uses speculation, and hence can generate NaT bits. 
Also, there is code in the kernel that can generate NaT bits.
Comment 1 Jim Wilson 2005-04-19 20:41:50 UTC
Created attachment 8684 [details]
synthetic IA-64 testcase to generate NaT consumption fault

Compile this with -O, and it will fail with gcc-2.96 and gcc-4.0.0.  We get an
illegal instruction signal.  It works with gcc-3.0 and gcc-3.3 and presumably
with every other gcc-3.x release.

If you use gcc-4.0.0 to compile with -O -S, and look at the assembly code, you
can see that r8 is being used uninitialized, which fails if r8 happens to have
the NaT bit set.
Comment 2 Andrew Pinski 2005-04-19 20:47:44 UTC
To me, the target specific code should be the one to fix this problem up and not the middle-end or at 
least have a hook for it so you don't mess around with other targets getting the speed up.  Anyways 
seems like someone thought it would be cool if they did this, oh well.
Comment 3 Jim Wilson 2005-04-19 23:05:20 UTC
Subject: Re:  IA-64 NaT consumption faults due to uninitialized
 register reads

pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-04-19 20:47 -------
> To me, the target specific code should be the one to fix this problem up and not the middle-end or at 
> least have a hook for it so you don't mess around with other targets getting the speed up.  Anyways 
> seems like someone thought it would be cool if they did this, oh well.

The change I am suggesting should not hurt performance, and I expect 
that it would actually help performance in many cases.

Currently, the first assignment to a structure is a bitfield insert.

If we zero the structure before the first assignment, then combine will 
give us a simple assignment instead, which will be faster than a 
bitfield insert for most targets.  This may also allow other assignments 
to be combined in, giving further benefits.  (There can be multiple 
first assignments if there are multiple blocks where the structure 
becomes live.)

I agree that the optimizations being performed by tree-ssa are useful 
here, but one must not be confused by the big picture issues here into 
ignoring the details.  Emitting a bit-field insert when only a simple 
assignment is needed is wrong.  It may cause performance loss on many 
targets, and it causes core dumps on IA-64.

Take a look at this example.
struct s { unsigned long i : 32; unsigned long j : 32;};
int i;
struct s
sub (void)
{
   struct s foo;
   foo.i = i;
   return foo;
}
Compiling this for x86-64 on mainline, I get 10 instructions, which 
perform two bit-field insertions.  Compiling this with gcc-3.3, I get 7 
instructions which perform one bit-field insertion.

I think the optimal code is two instructions, one to load i into the low 
part of the return register, and one to return.  The upper bits of the 
structure are don't care bits, so we can set them to anything we want. 
There is no need for any bitfield insertion here at all.

Mainline does even worse than gcc-3 here because in order to decompose 
the structure it creates a fake j assignment, and then we end up 
emitting bitfield insertion code for the fake j assignment, even though 
this code is completely useless.  Furthermore, the RTL optimizer is not 
able to delete this fake j assignment, because it is a bitfield insert.
Comment 4 Andrew Pinski 2005-07-19 05:24:05 UTC
(In reply to comment #3)
> Subject: Re:  IA-64 NaT consumption faults due to uninitialized
> The change I am suggesting should not hurt performance, and I expect 
> that it would actually help performance in many cases.
> 
> Currently, the first assignment to a structure is a bitfield insert.

This is PR 15596 and really an expand issue with respect with NVR.  Use the C++ front-end for both 
3.3 and 4.0.0 and noticed that it happens there too.

I would not doubt you could reproduce this with the C++ front-end before 4.0.0 also.  This is all NVR 
related and nothing more.
Comment 5 Richard Biener 2012-01-11 13:04:46 UTC
Is this still an issue?
Comment 6 Eric Gallager 2017-07-27 02:01:02 UTC
(In reply to Richard Biener from comment #5)
> Is this still an issue?

Guessing it isn't since the reporter never replied
Comment 7 Jim Wilson 2017-07-27 04:19:10 UTC
I must have failed to realize that this was a bug I had created when I got the earlier message.

Looking now, I see that I can still reproduce with mainline, though the problem is not as obvious with this testcase.  Compiling with -O -S -fdump-rtl-all, I see for the bitfield insert of i into foo.i

(insn 8 5 9 2 (set (reg:DI 348)
        (zero_extend:DI (reg/v:SI 345 [ i ]))) "tmp.c":10 -1
     (nil))
(insn 9 8 10 2 (set (reg:DI 350)
        (const_int 2147483647 [0x7fffffff])) "tmp.c":10 -1
     (nil))
(insn 10 9 11 2 (set (reg:DI 349)
        (and:DI (reg:DI 348)
            (reg:DI 350))) "tmp.c":10 -1
     (nil))
(insn 11 10 12 2 (set (reg:DI 352)
        (const_int -2147483648 [0xffffffff80000000])) "tmp.c":10 -1
     (nil))
(insn 12 11 13 2 (set (reg:DI 351)
        (and:DI (reg:DI 343 [ D.1463 ])
            (reg:DI 352))) "tmp.c":10 -1
     (nil))
(insn 13 12 14 2 (set (reg:DI 353)
        (ior:DI (reg:DI 351)
            (reg:DI 349))) "tmp.c":10 -1
     (nil))
(insn 14 13 15 2 (set (reg:DI 343 [ D.1463 ])
        (reg:DI 353)) "tmp.c":10 -1
     (nil))

And note that in the fifth insn (reg:DI 343) is used uninitialized.  This is not safe on IA-64, as an uninit register may accidentally have the NaT bit set, which would then cause a trap.

For this testcase, later optimization passes clean up the mess, and we get correct code at the end.  However, it is not safe to have a read of an uninit pseudo on IA-64, as there is no guarantee that a later optimization pass will accidentally fix it.

Mostly the compiler works because we rarely speculate, and hence we rarely have registers with the NaT bit set, and hence it is very rare that anyone will ever trigger this problem.

Andrew Pinski suggests that it is an NRV problem, but I see the same issue with NRV disabled.

An possible fix might be to force all locals to zero, to ensure that we never read from an uninit register.

I no longer have access to IA-64 hardware at home or at work.  There is no IA-64 hardware in the compile farm either.  I also no longer care whether this issue gets addressed.
Comment 8 Alexander Monakov 2017-07-27 08:34:25 UTC
(In reply to Jim Wilson from comment #7)
> An possible fix might be to force all locals to zero, to ensure that we
> never read from an uninit register.

My understanding is that with introduction of pass_initialize_regs in 4.3 gcc did that for must-uninit uses, masking off many potential issues.  I don't know how hard it would be to produce a testcase that breaks today, but it would need to involve may-uninit accesses, not must-uninit.
Comment 9 Alexander Monakov 2017-07-27 09:40:49 UTC
OK, so on the following testcase:

struct s {int f:1;};
struct s f(int x, int y)
{
  struct s r;
  if (x) {
    r.f = 1;
    asm("");
  }
  if (y) {
    r.f = 1;
    asm("");
  }
  return r;
}

with -O -fno-tree-sra we emit:

        .proc f#
f:
        .prologue
        .body
        cmp4.eq p6, p7 = 0, r32
        (p6) br.cond.dptk .L2
        addl r8 = 1, r0
        ;;
.L2:
        ;;
        cmp4.eq p6, p7 = 0, r33
        (p6) br.cond.dptk .L3
        dep r8 = -1, r8, 0, 1
        ;;
.L3:
        ;;
        br.ret.sptk.many b0
        ;;
        .endp f#

Thus the call f(0, 1) won't clear the NaT bit on r8. So the bug should be valid.
Comment 10 Richard Biener 2018-07-27 06:46:05 UTC
Hmm, I don't remember whether uninit reads invoke undefined behavior, esp. result in a trap representation, but the original testcase doesn't seem to read uninitialized things.

For it I suspect that store_bitfield is at fault given it will read possibly
uninitialized values for the RMW cycle it performs.

So one of the questions is whether we have to / want to preserve any NaT
in non-speculated code which means simply clearing NaT everywhere in a
mdreorg-like pass?

And yes, pass_initialize_regs papers over a lot of issues here but that
pass should go away and the fallout fixed since it masks a lot of otherwise
latent issues given it only initializes must-undefs.  There's a PR for that
as well.

Jim, if you no longer have access to IA64 HW does it make sense for you
to be the port maintainer?  Would you be willing to step down and let
the port be deprecated for GCC 9 so we can remove it for GCC 10?
Comment 11 Alexander Monakov 2018-07-27 10:03:30 UTC
(In reply to Richard Biener from comment #10)
> Hmm, I don't remember whether uninit reads invoke undefined behavior, esp.
> result in a trap representation, but the original testcase doesn't seem to
> read uninitialized things.

Neither the original nor the updated testcase invoke UB, but to answer the question, C11 says uninit reads invoke UB if the object could have been declared with the 'register' keyword, otherwise it results in an indeterminate value (which can be a trap representation if its type allows that).

> For it I suspect that store_bitfield is at fault given it will read possibly
> uninitialized values for the RMW cycle it performs.

I think store_bit_field is fine. I'd say the problem is that "birth" of an object does not result in any tidying/preparing for the underlying storage on RTL. But (except for specific case of probing the stack for large objects) other targets don't need any such tidying afaik (except for a related SIMT issue on NVPTX).

> So one of the questions is whether we have to / want to preserve any NaT
> in non-speculated code which means simply clearing NaT everywhere in a
> mdreorg-like pass?

Given the above, I suspect it cannot be fully solved with a late RTL pass: one hardreg may be reused for two different objects, and an attempted speculation on the first object may leave the NaT bit set for the second object.
Comment 12 Jim Wilson 2018-07-27 16:35:08 UTC
I no longer have access to IA-64 hardware.  I was leaving myself as maintainer just so that there was someone responsible for answering questions.  I don't care if the port survives or not.  I can resign if that makes things easier.

There is an ia64 debian group that has hardware, and is still doing debian work.  They would be disappointed if the ia64 port was deprecated.  I get the occasional binutils and gcc bug reports from them, and am occasionally able to fix them even though I don't have hardware.

I think that solving this bug requires that all locals get initialized to 0 when defined.  Otherwise, you may accidentally read a register with the nat bit set when doing bit-field operations on a register.  I don't care enough about ia64 to try to fix this.
Comment 13 Eric Gallager 2019-04-28 21:36:22 UTC
(In reply to Jim Wilson from comment #12)
> I no longer have access to IA-64 hardware.  I was leaving myself as
> maintainer just so that there was someone responsible for answering
> questions.  I don't care if the port survives or not.  I can resign if that
> makes things easier.
> 
> There is an ia64 debian group that has hardware, and is still doing debian
> work.  They would be disappointed if the ia64 port was deprecated.  I get
> the occasional binutils and gcc bug reports from them, and am occasionally
> able to fix them even though I don't have hardware.

Do they have a bugzilla account here that could be cc-ed?

> 
> I think that solving this bug requires that all locals get initialized to 0
> when defined.  Otherwise, you may accidentally read a register with the nat
> bit set when doing bit-field operations on a register.  I don't care enough
> about ia64 to try to fix this.
Comment 14 Jim Wilson 2019-04-29 18:31:11 UTC
https://wiki.debian.org/Ports/ia64

James Clarke has been active recently on the binutils and/or gcc mailing lists.  My IA-64 work has dwindled down to nothing, as RISC-V work has kept me too busy to do anything else.  With the architecture already set to end-of-life next year, I'm not planning to do anymore IA-64 work.
Comment 15 Jim Wilson 2019-04-29 18:37:30 UTC
See also PR 87338 which has a response earlier today from John Paul Adrian Glaubitz.
Comment 16 Eric Gallager 2019-04-29 20:01:48 UTC
(In reply to Jim Wilson from comment #14)
> https://wiki.debian.org/Ports/ia64
> 
> James Clarke has been active recently on the binutils and/or gcc mailing
> lists.  My IA-64 work has dwindled down to nothing, as RISC-V work has kept
> me too busy to do anything else.  With the architecture already set to
> end-of-life next year, I'm not planning to do anymore IA-64 work.

(In reply to Jim Wilson from comment #15)
> See also PR 87338 which has a response earlier today from John Paul Adrian
> Glaubitz.

ok, thanks
Comment 17 Richard Biener 2021-10-11 07:53:35 UTC
So to sum up, a SSA default def would have to be expanded to an explicit (zero) initialization.  I suppose memory doesn't have a NaT state so stack allocation for locals doesn't have to change but decls eventually expanding to pseudos
would have to go through the same exercise.

Since we now have -ftrivial-auto-var-init=zero that might serve as a workaround for GCC 12.

But I guess trying to fix this in RTL expansion shouldn't be too difficult
(gated with a new target hook), if anyone really cares.  Do any better
maintained archs than ia64 have NaTs?  NVPTX was mentioned, but I guess
that's not really easier to handle since it's also a meta-ISA.
Comment 18 Alexander Monakov 2021-10-11 08:48:43 UTC
From my perspective, the main blocker for a nice and clean solution is lack of "birth" statements on GIMPLE.

Without them, expansion to RTL would either need to insert initialization at the top of the function (which is silly, extends lifetimes of pseudos that only live in a small region, complicating RA), or compute something like a lowest common dominator of all uses and place an initialization there. But perhaps that's the right way if "birth statements" aren't happening?

Or is there some other approach? Like not trying to insert a single initialization, but instead substituting a zero in place of each use of the default def individually?
Comment 19 Richard Biener 2021-10-11 09:18:41 UTC
(In reply to Alexander Monakov from comment #18)
> From my perspective, the main blocker for a nice and clean solution is lack
> of "birth" statements on GIMPLE.

They were not needed until now ;)

> Without them, expansion to RTL would either need to insert initialization at
> the top of the function (which is silly, extends lifetimes of pseudos that
> only live in a small region, complicating RA), or compute something like a
> lowest common dominator of all uses and place an initialization there. But
> perhaps that's the right way if "birth statements" aren't happening?

Yes, note that an explicit birth also affects SSA coalescing so they would
need to be introduced before liveness computation or alternatively
be inserted in possibly multiple places when the reg becomes live.

> Or is there some other approach? Like not trying to insert a single
> initialization, but instead substituting a zero in place of each use of the
> default def individually?

That would of course also work - use replace_uses_by to replace default
defs with zero for modes that have a NaT (I don't think we have an existing
way to check that though).

But maybe one wants to preserve NaT trapping for uninit uses (though that
would require active seeding of default defs with a NaT rather than with
a non-NaT since otherwise the register content is just random and not a NaT).