66790 – Invalid uninitialized register handling in REE

Bug 66790 - Invalid uninitialized register handling in REE

Summary: Invalid uninitialized register handling in REE

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	6.0

Importance:	P3 normal
Target Milestone:	6.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-07-07 16:05 UTC by Pierre-Marie de Rodat
Modified:	2015-12-19 22:36 UTC (History)
CC List:	5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2015-07-29 00:00:00

Attachments
Part of the reproducer (191 bytes, text/plain) 2015-07-07 16:06 UTC, Pierre-Marie de Rodat	Details
Part of the reproducer (563 bytes, text/plain) 2015-07-07 16:06 UTC, Pierre-Marie de Rodat	Details
Part of the reproducer (175 bytes, text/plain) 2015-07-07 16:06 UTC, Pierre-Marie de Rodat	Details
Updated candidate patch (6.29 KB, patch) 2015-07-31 08:18 UTC, Pierre-Marie de Rodat	Details \| Diff
Reproducer with an uninitialized variable (no OUT parameter) (893 bytes, text/x-vhdl) 2015-09-10 14:26 UTC, Pierre-Marie de Rodat	Details
Updated candidate patch (6.47 KB, patch) 2015-09-23 10:04 UTC, Pierre-Marie de Rodat	Details \| Diff
Fix for DF_LIVE local BB information (620 bytes, patch) 2015-09-23 10:05 UTC, Pierre-Marie de Rodat	Details \| Diff
Updated candidate patch (7.30 KB, patch) 2015-09-24 07:43 UTC, Pierre-Marie de Rodat	Details \| Diff
Show Obsolete (2) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Pierre-Marie de Rodat 2015-07-07 16:05:40 UTC

The reproducer I'm about to attach[1] seems to trigger a code generation issue at least on x86_64-linux:

$ gnatmake -q p -O3 -gnatn
$ ./p

raised PROGRAM_ERROR : p.adb:9 explicit raise

The bottom line is that when Success is False in Q.Get (q.adb, around line 40) its value is clobbered during return. If we build with -fno-ree, we can see in the assembly code (near the return point for q__get) the following sequence:

movzwl (%rax), %epb
...
somelabel:
...
movzwl %bp, %ebp
...
salq $16, %rax
orq %rbp, %rax

However, without the -fno-ree switch the instruction:

movzwl %bp, %ebp

is merged with its definition (the first movzwl instruction) by the REE pass. This is wrong since the definition does _not_ dominate this zero-extension: the first movzwl instruction can be by-passed through "somelabel".

I started to investigate this issue: the REE pass is currently not aware of this by-pass code path because it is reached by no proper definition (it brings an uninitialized %rbp register to the zero-extension). This happens in ree.c:get_defs; in our case (insn=second movzwl, reg=BP) DF_REF_CHAIN contains only one definition: the movzwl instruction.

Given the "somelabel" code path, I would rather expect DF_REF_CHAIN to hold a NULL reference to materialize the lack of initialization in one path. I found the DF_LR_IN/DF_LR_OUT macros from df.h: they provide information about uninitialized variables but the associated comment says they should not be used ("This intolerance should eventually be fixed."). Besides, they provide a basic-block-level information whereas we are rather interested in instruction-level information in REE.

I was thinking that REE may not be the only optimization pass needing this kind of information but I found no other one, so I'm not sure how this ought to be handled. Can anybody confirm my intuition about the NULL reference in DF_REF_CHAIN? I'm willing to implement it but I prefer first to be sure this is the right approach.

Thanks in advance!

[1] It's in Ada. I tried to translate it into C but any change in register allocation makes the issue disappear...

Comment 1 Pierre-Marie de Rodat 2015-07-07 16:06:12 UTC

Created attachment 35923 [details]
Part of the reproducer

Comment 2 Pierre-Marie de Rodat 2015-07-07 16:06:34 UTC

Created attachment 35924 [details]
Part of the reproducer

Comment 3 Pierre-Marie de Rodat 2015-07-07 16:06:44 UTC

Created attachment 35925 [details]
Part of the reproducer

Comment 4 Pierre-Marie de Rodat 2015-07-18 21:59:13 UTC

(In reply to Pierre-Marie de Rodat from comment #0)
> Given the "somelabel" code path, I would rather expect DF_REF_CHAIN to hold
> a NULL reference to materialize the lack of initialization in one path. I
> found the DF_LR_IN/DF_LR_OUT macros from df.h: they provide information
> about uninitialized variables but the associated comment says they should
> not be used ("This intolerance should eventually be fixed."). Besides, they
> provide a basic-block-level information whereas we are rather interested in
> instruction-level information in REE.

Having read more thoroughly dataflow code, I saw the DF_LIVE problem claims
that it provides "the logical AND of the IN and OUT sets from the LR problem
and the must-initialized problem", which would be useful: it could provide a
conservative set of registers that *may* be uninitialized (i.e. all registers
that may be uninitialized are included).

So I started looking at the DF_LIVE results, but I got strange results. For
instance, with a very simple C function:

    extern void do_nothing (void);

    int foo (int i, int b) {
      int r;
      if (b) { do_nothing (); r = i; }
      do_nothing ();
      return r & 0xffff;
    }
    /* foobar.c */

… the reference to the register holding "r" at the return statement is present
in the IN set for the corresponding basic block:

     $ gdb --args cc1 -O2 foobar.c
    (gdb) b find_removable_extensions
    (gdb) r
    (gdb) p dump_function_to_file(cfun.decl, stderr, 0)
    [...]
    (note 13 12 14 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
    (call_insn 14 13 15 4 [...])
    (insn 15 14 21 4 (set (reg:SI 0 ax [orig:92 D.1771 ] [92])
            (zero_extend:SI (reg:HI 3 bx [orig:87 r ] [87])))
            foobar.c:13 136 {*zero_extendhisi2}
        (nil))
    [...]
    (gdb) p df_live_top_dump(cfun.cfg.x_basic_block_info.m_vecdata[4], stderr)
    ;; live in 3 [bx] 7 [sp]
    ;; live gen 0 [ax]
    ;; live kill

This looks wrong to me: there is a path from the entry to this basic block that
doesn't cross a definition for the bx register, so it's not must-initialized
and thus should not be present in this set. Digging further, I noticed that in
df_live_confluence_n, we do:

    IN(succ) = OUT(pred) OR IN(succ)

… which, I think, should instead be an AND with respect to the comment: we
want to keep only registers that are must-initialized, so as soon as one *may*
be uninitialized it is out of the result. But this is how it works since the
dataflow merge in 2006… and doing this change unsurprisingly makes the compiler
generate wrong code.

So assuming the LIVE pass actually does not compute must-initialized registers
(but instead what I would call "maybe-initialized registers"), I tried to add
another problem: MIR (Must-Initialized Registers) and to use it in the REE
pass. I'm about to submit the patch on gcc-patches@.

Comment 5 Richard Biener 2015-07-29 13:06:05 UTC

So what is the issue with replacing zero-extending an uninitialized %ebp
with a random other value?  Both are essentially undefined when reached via
the somlabel bypass.

Is the real issue happening in a downstream pass?  I don't think REE does
anything wrong here.

Comment 6 Pierre-Marie de Rodat 2015-07-30 09:53:57 UTC

Thanks for your answer, Richard!

(In reply to Richard Biener from comment #5)
> So what is the issue with replacing zero-extending an uninitialized %ebp
> with a random other value?  Both are essentially undefined when reached via
> the somlabel bypass.

Well, at least on x86, even when %ebp is uninitialized, “movzwl %bp, %ebp” makes its upper half zero (and yes, the lower half is uninitialized). Hence a following “shr $0x10, %ebp” is supposed to leave zero in %epb, for instance. If we have a random value instead as shr’s input, this does not work anymore, so I consider this transformation is an error.

> Is the real issue happening in a downstream pass?  I don't think REE does
> anything wrong here.

REE is the pass where I observed the change in behavior I described in my first message, and I did not notice anything weird beyond this. If the above is wrong (i.e. if we cannot assume (zext UNDEF from i16 to i32) >> 16 == 0), then I guess I’ll have to look for something wrong up in the pipe. ;-)

Comment 7 Richard Biener 2015-07-30 10:16:25 UTC

(In reply to Pierre-Marie de Rodat from comment #6)
> Thanks for your answer, Richard!
> 
> (In reply to Richard Biener from comment #5)
> > So what is the issue with replacing zero-extending an uninitialized %ebp
> > with a random other value?  Both are essentially undefined when reached via
> > the somlabel bypass.
> 
> Well, at least on x86, even when %ebp is uninitialized, “movzwl %bp, %ebp”
> makes its upper half zero (and yes, the lower half is uninitialized). Hence
> a following “shr $0x10, %ebp” is supposed to leave zero in %epb, for
> instance. If we have a random value instead as shr’s input, this does not
> work anymore, so I consider this transformation is an error.
> 
> > Is the real issue happening in a downstream pass?  I don't think REE does
> > anything wrong here.
> 
> REE is the pass where I observed the change in behavior I described in my
> first message, and I did not notice anything weird beyond this. If the above
> is wrong (i.e. if we cannot assume (zext UNDEF from i16 to i32) >> 16 == 0),
> then I guess I’ll have to look for something wrong up in the pipe. ;-)

It would be certainly good to see why we have this UNDEF in the first place.

But yes, I have to agree that zext UNDEF >> 16 should be zero.  Unlike
sext UNDEF which is always undefined as the sign-bit is undefined.

Comment 8 Pierre-Marie de Rodat 2015-07-31 08:17:21 UTC

(In reply to Richard Biener from comment #7)
> It would be certainly good to see why we have this UNDEF in the first place.

Sure: here is a C translation of what happens in my Ada reproducer (only wrt. UNDEF: this is not a reproducer for the original bug).

    #include <stdint.h>

    struct option_t {
      uint16_t is_present;
      uint16_t value;
    };

    struct option_t
    foo (uint16_t a, uint16_t b)
    {
      struct option_t result;

      result.is_present = b != 0;
      if (result.is_present)
        result.value = a / b;
      return result;
    }

On x86, foo's return value is the 32-bit value in $eax. If b == 0, its lower half is undefined while its upper one is supposed to be 0. In other words, even though the value field is uninitialized, the is_present one must be 0. Although a trunk GCC produces no zero-extension at -O2 for this example, this simply exposes the source use case for the "half-initialized" value business.

> But yes, I have to agree that zext UNDEF >> 16 should be zero.  Unlike
> sext UNDEF which is always undefined as the sign-bit is undefined.

Good point! The patch I submitted previously pessimized the sign-extension case. The one I'm going to upload fixes this: bootstrapped and regtested successfuly on x86_64-linux.

Comment 9 Pierre-Marie de Rodat 2015-07-31 08:18:01 UTC

Created attachment 36098 [details]
Updated candidate patch

Comment 10 Eric Botcazou 2015-09-10 14:16:42 UTC

Just to mention that we have really thought hard about this bug at AdaCore and the current patch seems (unfortunately) to be the only plausible way out, if we don't want to pessimize the common case.  But suggestions are of course welcome.

We have also discussed papering over it in the Ada front-end by initializing the Out parameter to zero.  The problem is that you can write an equivalent Ada testcase with an explicitly uninitialized variable and trigger the bug (Pierre-Marie, could you attach it to the PR?  Concatenate the files though and mention that it needs to be gnatchop'ed) so we cannot paper over it in all cases.

But the fundamental thing to keep in mind here is that this is a valid Ada program and that it could be translated into a valid C program: there is indeed an uninitialized variable but the run-time behavior of the program doesn't depend on its value.  The bug in REE effectively propagates the uninitializedness of this variable to another, perfectly initialized variable, because the latter initialization is done with a zero-extension that is eliminated.

Comment 11 Pierre-Marie de Rodat 2015-09-10 14:26:48 UTC

Created attachment 36320 [details]
Reproducer with an uninitialized variable (no OUT parameter)

Comment 12 Eric Botcazou 2015-09-10 14:40:34 UTC

Thanks.  I misremembered, the testcase has a single variable with two fields, one uninitialized and one initialized, instead of two variables, but it's exactly the same reasoning (and it would be trivial to add the two variables anyway).

Comment 13 Pierre-Marie de Rodat 2015-09-10 14:44:32 UTC

(In reply to Eric Botcazou from comment #12)
> Thanks.  I misremembered, the testcase has a single variable with two
> fields, one uninitialized and one initialized, instead of two variables, but
> it's exactly the same reasoning (and it would be trivial to add the two
> variables anyway).

Thank you for your involvement into this! :-)

Comment 14 Bernd Schmidt 2015-09-16 11:45:31 UTC

I've looked at this for a while now. I agree that this is a bug in REE that needs to be fixed in a manner similar to this patch. However, there are some oddities here.

As you say, the df-live problem claims to compute a must-initialized problem, so it would be very odd to add another problem with the same name. I experimented with changing df_live_confluence_n to use and instead of ior, followed by actually taking the confluence_n from your mir problem. However, this does not work: ior enlarges the set while and shrinks it, so the initialization of the bitmaps would have to be different. It looks like your "visited" bit tries to avoid that, but I don't think this works. One testcase I looked at had this structure:

entry -> 2 -> 3 -> 3

i.e. a loop in  block 3. The confluence_n function takes the out bitmap from block 3 (which is all zeros) and ands it with the in bitmap from the same block, which has the incoming regs from block 2 at that point, but obviously the bitmap turns into all zeros at this point (block 3 was already visited due to the 2->3 edge). I don't see how your patch doesn't suffer from the same problem.

I think ideally we would figure out whether df_live really does have a mistake in it, and if so fix it rather than adding another problem.

Comment 15 Eric Botcazou 2015-09-16 17:52:32 UTC

> I've looked at this for a while now. I agree that this is a bug in REE that
> needs to be fixed in a manner similar to this patch.

Thanks for looking into it.

> I think ideally we would figure out whether df_live really does have a
> mistake in it, and if so fix it rather than adding another problem.

Its description is very confusing, to say the least.  Quoting df-problems.c:

   LIVE AND MUST-INITIALIZED REGISTERS.

   This problem first computes the IN and OUT bitvectors for the
   must-initialized registers problems, which is a forward problem.
   It gives the set of registers for which we MUST have an available
   definition on any path from the entry block to the entry/exit of
   a basic block.

I think that's incorrect, the correct description would be "it gives the set of registers for which we MUST have an available definition on at least one path from the entry block".  That's why it doesn't work for the problem at hand, which requires the "available definition on any path" thing.

But I'm not sure we can change it because, for example, init-regs uses it to detect must-uninitialized registers:

/* Check all of the uses of pseudo variables.  If any use that is MUST
   uninitialized, add a store of 0 immediately before it.

	      /* A use is MUST uninitialized if it reaches the top of
		 the block from the inside of the block (the lr test)
		 and no def for it reaches the top of the block from
		 outside of the block (the ur test).  */
	      if (bitmap_bit_p (lr, regno)
		  && (!bitmap_bit_p (ur, regno)))

that is to say, uses that don't have any available definition.

Comment 16 Bernd Schmidt 2015-09-16 22:54:24 UTC

So I think there's three things to solve
 - which problem should df_live be using, the one it says it's using or the one it actually does
 - if it's the former, see if passes using the information need fixing
 - if it's the latter and we need the new problem from this patch, ensure that it actually works correctly (initializing the out bitmaps to all ones, probably).

Comment 17 Eric Botcazou 2015-09-17 10:21:04 UTC

> So I think there's three things to solve
>  - which problem should df_live be using, the one it says it's using or the
> one it actually does
>  - if it's the former, see if passes using the information need fixing
>  - if it's the latter and we need the new problem from this patch, ensure
> that it actually works correctly (initializing the out bitmaps to all ones,
> probably).

That sounds like a good plan.  On second thoughts, for the first point, maybe a native speaker understands "an available definition on any path" as "an available definition on one path, whatever it is", in which case the description would be correct, albeit very confusing for non-native speakers.  It would be good to have some insight from the DF maintainers here (CCed).

In any case, what's needed to fix the bug is "an available definition on every path" and that's what Pierre-Marie's patch implements.  We're going to review it in light of your remarks.

Comment 18 Bernd Schmidt 2015-09-17 10:39:43 UTC

FWIW I did some googling to refresh my memory, and this may be helpful:

www.seas.harvard.edu/courses/cs252/2011sp/slides/Lec02-Dataflow.pdf

It looks like you want exactly the available expressions problem from page 26. Maybe that's better terminology than must-initialized? (I'm not sure, but I'd never heard the latter term before.)

I think I'm coming to the conclusion that df_live really does want what it currently uses, which is the guarantee that there is _any_ def that reaches a use. The commentary could use updating though, I think it clearly isn't computing a "must" problem.

When updating the patch please ensure that all new parameters are documented before the function.

Comment 19 Paolo Bonzini 2015-09-17 11:27:20 UTC

LIVE provides live registers that MAY be initialized (are initialized on at least one path).  The comments are all wrong!

There's no code in GCC for must-initialized.  Pierre's patch gets it right (except that I'm not sure about the comment at line 2033).  The MIR name can indeed add to the confusion about may/must.  Perhaps valid registers (VR)?

Note that this is not the available expressions problem.  The confluence function is the same, but it operates on registers and not on expressions.  The gen and kill sets are different, too.

I wouldn't be surprised if you don't find it in the literature, since it's a relatively obvious extension of the usual dataflow problems.

I think this issue can also be fixed easily with an approximated solution, which is already used for example by combine's set_nonzero_bits_and_sign_copies.  If a register is undefined at the start of the function, you punt because the register has uninitialized uses.  This is suboptimal if you have cases like:

     int y, x;
     x = y;     /* uninitialized */
     ...
     y = 5;
     ...
     x = y;     /* initialized */

but they should be pretty rare.

Comment 20 Kenneth Zadeck 2015-09-17 12:51:56 UTC

>> On second thoughts, for the first point, maybe a native speaker understands 
>> "an available definition on any path" as "an available definition on one path,
>> whatever it is", in which case the description would be correct, albeit very 
>> confusing for non-native speakers.  It would be good to have some insight from
>> the DF maintainers here (CCed).

A native English speaker would say that this could be written better.   I would suggest the term "on at least one path" and this is what a MAY problem is generally defined as.

I do have to say that I am still uncomfortable with changing RRE to use a MUST problem rather than a MAY problem.   I see this as dumbing down the compiler to provide the semantics of uninitialized variables and it is a path that we have generally avoided in GCC. I do not have a better solution, but there is a feeling that something is being missed here.

Comment 21 Eric Botcazou 2015-09-17 12:53:51 UTC

> LIVE provides live registers that MAY be initialized (are initialized on at
> least one path).  The comments are all wrong!
> 
> There's no code in GCC for must-initialized.  Pierre's patch gets it right
> (except that I'm not sure about the comment at line 2033).  The MIR name can
> indeed add to the confusion about may/must.  Perhaps valid registers (VR)?

Thanks for the comment.  Yes, I agree that the MIR name ought to be improved.

> I think this issue can also be fixed easily with an approximated solution,
> which is already used for example by combine's
> set_nonzero_bits_and_sign_copies.  If a register is undefined at the start
> of the function, you punt because the register has uninitialized uses.  This
> is suboptimal if you have cases like:
> 
>      int y, x;
>      x = y;     /* uninitialized */
>      ...
>      y = 5;
>      ...
>      x = y;     /* initialized */
> 
> but they should be pretty rare.

Right, we pondered about it.  The difference is that combine operates on pseudo while REE operates on hard registers, so the suboptimality could be large.

Comment 22 Eric Botcazou 2015-09-17 13:42:31 UTC

> A native English speaker would say that this could be written better.   I
> would suggest the term "on at least one path" and this is what a MAY problem
> is generally defined as.

Thanks.  Here's a proposed amended wording:

Index: df-problems.c
===================================================================
--- df-problems.c	(revision 227819)
+++ df-problems.c	(working copy)
@@ -1309,22 +1309,23 @@ df_lr_verify_transfer_functions (void)
 
 
 /*----------------------------------------------------------------------------
-   LIVE AND MUST-INITIALIZED REGISTERS.
+   LIVE AND MAY-INITIALIZED REGISTERS.
 
    This problem first computes the IN and OUT bitvectors for the
-   must-initialized registers problems, which is a forward problem.
-   It gives the set of registers for which we MUST have an available
-   definition on any path from the entry block to the entry/exit of
-   a basic block.  Sets generate a definition, while clobbers kill
+   may-initialized registers problems, which is a forward problem.
+   It gives the set of registers for which we MAY have an available
+   definition, i.e. for which there is an available definition on
+   at least one path from the entry block to the entry/exit of a
+   basic block.  Sets generate a definition, while clobbers kill
    a definition.
 
    In and out bitvectors are built for each basic block and are indexed by
    regnum (see df.h for details).  In and out bitvectors in struct
-   df_live_bb_info actually refers to the must-initialized problem;
+   df_live_bb_info actually refers to the may-initialized problem;
 
    Then, the in and out sets for the LIVE problem itself are computed.
    These are the logical AND of the IN and OUT sets from the LR problem
-   and the must-initialized problem.
+   and the may-initialized problem.
 ----------------------------------------------------------------------------*/
 
 /* Private data used to verify the solution for this problem.  */
@@ -1531,7 +1532,7 @@ df_live_confluence_n (edge e)
 }
 
 
-/* Transfer function for the forwards must-initialized problem.  */
+/* Transfer function for the forwards may-initialized problem.  */
 
 static bool
 df_live_transfer_function (int bb_index)
@@ -1555,7 +1556,7 @@ df_live_transfer_function (int bb_index)
 }
 
 
-/* And the LR info with the must-initialized registers, to produce the LIVE info.  */
+/* And the LR info with the may-initialized registers, to produce the LIVE info.  */
 
 static void
 df_live_finalize (bitmap all_blocks)

Comment 23 Kenneth Zadeck 2015-09-17 14:41:31 UTC

This change to the doc looks fine to me.

Comment 24 Eric Botcazou 2015-09-17 15:35:56 UTC

Author: ebotcazou
Date: Thu Sep 17 15:35:24 2015
New Revision: 227874

URL: https://gcc.gnu.org/viewcvs?rev=227874&root=gcc&view=rev
Log:
	PR rtl-optimization/66790
	* df-problems.c (LIVE): Amend documentation.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/df-problems.c

Comment 25 Eric Botcazou 2015-09-17 15:36:14 UTC

Author: ebotcazou
Date: Thu Sep 17 15:35:43 2015
New Revision: 227875

URL: https://gcc.gnu.org/viewcvs?rev=227875&root=gcc&view=rev
Log:
	PR rtl-optimization/66790
	* df-problems.c (LIVE): Amend documentation.

Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/df-problems.c

Comment 26 Eric Botcazou 2015-09-17 15:36:31 UTC

Author: ebotcazou
Date: Thu Sep 17 15:35:58 2015
New Revision: 227876

URL: https://gcc.gnu.org/viewcvs?rev=227876&root=gcc&view=rev
Log:
	PR rtl-optimization/66790
	* df-problems.c (LIVE): Amend documentation.

Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/df-problems.c

Comment 27 Pierre-Marie de Rodat 2015-09-21 16:32:29 UTC

Firstly, thanks everyone for your help! I'll try to address points that
still are unresolved.



(In reply to Bernd Schmidt from comment #14)
> As you say, the df-live problem claims to compute a must-initialized
> problem, so it would be very odd to add another problem with the same name.
(In reply to Eric Botcazou from comment #21)
> (In reply to Paolo Bonzini from comment #19)
>> There's no code in GCC for must-initialized.  Pierre's patch gets it right
>> (except that I'm not sure about the comment at line 2033).  The MIR name can
>> indeed add to the confusion about may/must.  Perhaps valid registers (VR)?
> 
> Thanks for the comment.  Yes, I agree that the MIR name ought to be improved.
To me, "valid" register has the same meaning as "initialized" and thus
does not carry the information about whether we can assume it is either
always or sometimes valid/initialized. So what about "always initialized
registers"? The acronym would be AIR.



(In reply to Bernd Schmidt from comment #14)
> I experimented with changing df_live_confluence_n to use and instead of ior,
> followed by actually taking the confluence_n from your mir problem. However,
> this does not work: ior enlarges the set while and shrinks it, so the
> initialization of the bitmaps would have to be different.
Yes: they both compute different information as discussed later in this
thread.



(In reply to Bernd Schmidt from comment #14)
> It looks like your "visited" bit tries to avoid that, but I don't
> think this works.
(In reply to Paolo Bonzini from comment #19)
> There's no code in GCC for must-initialized.  Pierre's patch gets it right
> (except that I'm not sure about the comment at line 2033).
The "visited" bit I introduced is used to fake an "all bits set"
initialization. The confluence_n method is meant to do:

    IN(dest) = OUT(src_0) AND OUT(src_1) AND ...

... by small increments of:

    IN(dest) := IN(dest) AND OUT(src_i)

When it is executed for the first time, IN(dest) is all-zeros, so we do
a copy because "a AND all_ones = a". Then we can do regular AND later
on. In order to get rid of this "visited" bit, I guess we could instead
pretend that IN/OUT sets are set of never-or-may-initialized registers,
but this complicates other parts of the code, so...

Paolo, does this clarify the comment at line 2033? (I will update it if
you think this is enough :-))



(In reply to Bernd Schmidt from comment #14)
> One testcase I looked at had this structure:
> 
> entry -> 2 -> 3 -> 3
> 
> i.e. a loop in  block 3. The confluence_n function takes the out bitmap from
> block 3 (which is all zeros) and ands it with the in bitmap from the same
> block, which has the incoming regs from block 2 at that point, but obviously
> the bitmap turns into all zeros at this point (block 3 was already visited
> due to the 2->3 edge). I don't see how your patch doesn't suffer from the
> same problem.
I don't understand what the problem is: the confluence_n function is
used during forward propagation. So in your example:

 a. the confluence_n function will be called for the 3 -> edge only
    after...

 b. the transfer function is invoked for the basic block 3 (then OUT(3)
    is initialized and potentially not all-zeros); this will be invoked
    only after...

 c. the confluence_n function is invoked for the 2 -> 3 edge
    (then IN(3) is initialized and potentially not all-zeros).

If I understand correctly, the BB_LAST_CHANGE_AGE mechanism in df-core.h
makes sure computations comply with this dependency (c. before b., b.
before a.).  So at the point the confluence_n function for the 3 -> 3
edges is called for the first time, both IN(3) and OUT(3) can be not
all-zeros.



(In reply to Bernd Schmidt from comment #14)
> I do have to say that I am still uncomfortable with changing RRE to
> use a MUST problem rather than a MAY problem.   I see this as dumbing
> down the compiler to provide the semantics of uninitialized variables
> and it is a path that we have generally avoided in GCC. I do not have
> a better solution, but there is a feeling that something is being
> missed here.
Indeed, while trying to fix the bug, I noticed that no RTL pass needs
this "is initialized on all paths" information.

I'm not sure what "having no semantics for uninitialized variables"
really implies: does this mean that we have an undefined behavior as
soon as code uses the value of an uninitialized variable? If not, I
think what makes REE special regarding this is that zero-extension is
used to pack independent values into single storage units (registers).
Because of this, considering that the whole storage unit is
uninitialized because a single embedded value is uninitialized makes us
generate wrong code.



Now on a different topic, I forgot to mention with my first candidate
patch: LIVE and MIR are supposed to work on the same GEN/KILL sets.
These respectively represent the set of registers that are
initialized/clobbered in the basic block. As you might have noticed,
LIVE and MIR don't have the same bb_local_compute function, though.

While getting familiar with DF problems, I noticed that LIVE's ignores
the order of GENs and KILLs in basic blocks. In other words, the
transfer function for: GEN(r1); KILL(r1) is currently the same as for
KILL(r1); GEN(r1) whereas clearly the former leaves r1 unitialized
whereas the latter leaves it initialized.

In order to get this right for MIR, I stated that MIR's GEN set excludes
the registers left killed at the end of the basic block (and by the way,
the bb_local_compute should process artificial defs with DF_REF_AT_TOP
before anything else). It seems that RD's and MD's
local_compute_process_def do this properly by clearing GEN's bit when
setting the corresponding KILL's and reciprocally (and they handle
artificial defs properly too!).

I think we should do the same for LIVE but lacks confidence because I
have not heard of any bug due to this so far... What do you think?

Comment 28 Bernd Schmidt 2015-09-21 17:04:30 UTC

(In reply to Pierre-Marie de Rodat from comment #27)
> (In reply to Bernd Schmidt from comment #14)
> > It looks like your "visited" bit tries to avoid that, but I don't
> > think this works.
> (In reply to Paolo Bonzini from comment #19)
> > There's no code in GCC for must-initialized.  Pierre's patch gets it right
> > (except that I'm not sure about the comment at line 2033).
> The "visited" bit I introduced is used to fake an "all bits set"
> initialization. The confluence_n method is meant to do:
> 
>     IN(dest) = OUT(src_0) AND OUT(src_1) AND ...
> 
> ... by small increments of:
> 
>     IN(dest) := IN(dest) AND OUT(src_i)
> 
> When it is executed for the first time, IN(dest) is all-zeros, so we do
> a copy because "a AND all_ones = a". Then we can do regular AND later
> on. In order to get rid of this "visited" bit, I guess we could instead
> pretend that IN/OUT sets are set of never-or-may-initialized registers,
> but this complicates other parts of the code, so...

I know this is what it's trying to do, but when I experimented with it that wasn't working, due to the problem I described.

> If I understand correctly, the BB_LAST_CHANGE_AGE mechanism in df-core.h
> makes sure computations comply with this dependency (c. before b., b.
> before a.).  So at the point the confluence_n function for the 3 -> 3
> edges is called for the first time, both IN(3) and OUT(3) can be not
> all-zeros.

It is sufficient for OUT(3) to be all-zeros. And I don't think the LAST_CHANGE_AGE mechanism does anything to prevent it. Please try it out. I think you have to initialize your bitmaps correctly rather than rely on "visited".

I applied your patch and compiled the following with a breakpoint on df_mir_confluence_n:
int foo (int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += i;
  return sum;
}

and when it reaches the loop:

Breakpoint 5, df_mir_confluence_n (e=<edge 0x7ffff07352d8 (4 -> 4)>) at ../../trunk/gcc/df-problems.c:2040
(gdb) p debug_bitmap (op1)

first = 0x24bf418 current = 0x24bf418 indx = 0
	0x24bf418 next = (nil) prev = (nil) indx = 0
		bits = { 0 1 2 4 5 7 17 21 22 23 24 25
			 26 27 28 37 38 }
$2 = void
(gdb) p debug_bitmap (op2)

first = (nil) current = (nil) indx = 0

and

(gdb) p bb_info->visited 
$5 = true

because the edge 3->4 was seen first.

> (In reply to Bernd Schmidt from comment #14)
> > I do have to say that I am still uncomfortable with changing RRE to

I did not write this.

Comment 29 Paolo Bonzini 2015-09-22 08:39:55 UTC

> While getting familiar with DF problems, I noticed that LIVE's ignores
> the order of GENs and KILLs in basic blocks. In other words, the
> transfer function for: GEN(r1); KILL(r1) is currently the same as for
> KILL(r1); GEN(r1) whereas clearly the former leaves r1 unitialized
> whereas the latter leaves it initialized.

GEN and KILL are not the same for LR and MIR.

1) Sets and clobbers are handled differently.  A set or clobber of r1 _kills_ liveness, while for MIR sets _generate_ "initialized-ness" and only clobbers kill it.

2) A use of r1 _creates_ liveness (which extends up, until it reaches the sets for all reaching definitions of that use), while uses of r1 are irrelevant for MIR.

3) For LR, GEN must override KILL if an insn has both a set and a use of r1.  For MIR, you cannot have a GEN (set) and a KILL (clobber) in the same insn.

BTW, are you sure that

+      if (DF_REF_FLAGS_IS_SET (def,
+			       DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+	/* All partial or conditional def
+	   seen are included in the gen set. */
+	bitmap_set_bit (gen, regno);

?  I would have thought they don't belong in any set, and on the contrary I would have thought that may-clobber definitions count as kills.

Comment 30 Eric Botcazou 2015-09-22 09:03:38 UTC

> GEN and KILL are not the same for LR and MIR.
> 
> 1) Sets and clobbers are handled differently.  A set or clobber of r1
> _kills_ liveness, while for MIR sets _generate_ "initialized-ness" and only
> clobbers kill it.
> 
> 2) A use of r1 _creates_ liveness (which extends up, until it reaches the
> sets for all reaching definitions of that use), while uses of r1 are
> irrelevant for MIR.
> 
> 3) For LR, GEN must override KILL if an insn has both a set and a use of r1.
> For MIR, you cannot have a GEN (set) and a KILL (clobber) in the same insn.

The issue is not LR vs MIR, it's the may-initialized part of LIVE vs MIR.

It seem to me that the GEN and KILL of the may-initialized part of LIVE should be the same as those of MIR, since the difference between the 2 problems is only at the global level.  Pierre-Marie's patch duplicates them, but ideally they should be factored and reused.

Comment 31 Paolo Bonzini 2015-09-22 09:14:28 UTC

Ah, I see now.  I think you're right that the DF_REF_MUST_CLOBBER case should also clear GEN in df_live_bb_local_compute.

However, regarding the "BTW" I am fairly sure now that df_live_bb_local_compute and the corresponding function for MIR should handle may-clobber and may-sets differently.  If you think of may-clobber and may-set as a diamond-shaped CFG:

             .                .
            / \              / \
           /   \            /   \
      clobber   |          set   |
          \    /            \    /
           \  /              \  /
            \/                \/

Then at the join point you have an "OR" for LIVE (so the clobber's KILL disappears and the set's GEN remains), and an "AND" for MIR.  For MIR the clobber's KILL remains and the set's GEN disappears.

Comment 32 Pierre-Marie de Rodat 2015-09-23 10:04:10 UTC

(In reply to Bernd Schmidt from comment #28)
> It is sufficient for OUT(3) to be all-zeros. And I don't think the
> LAST_CHANGE_AGE mechanism does anything to prevent it. Please try it
> out. I think you have to initialize your bitmaps correctly rather than
> rely on "visited".
Thank you very much for the reproducer: I confirm that the
LAST_CHANGE_AGE mechanism does not have the effect I thought. I just
updated my patch locally to initialize bitmaps to 1 for all registers
and thus remove the visited field: this fixes the issue you found.

> > (In reply to Bernd Schmidt from comment #14)
> > > I do have to say that I am still uncomfortable with changing RRE to
> 
> I did not write this.
Indeed: Kenneth Zadeck said this in comment #20. I made a pasto, sorry
for that!



(In reply to Paolo Bonzini from comment #29)
> BTW, are you sure that
> 
> +      if (DF_REF_FLAGS_IS_SET (def,
> +                  DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> +   /* All partial or conditional def
> +      seen are included in the gen set. */
> +   bitmap_set_bit (gen, regno);
> 
> ?  I would have thought they don't belong in any set, and on the
> contrary I would have thought that may-clobber definitions count as
> kills.
For partial and conditional defs in the context of MIR:

  * if the register was initialized, then it is still initialized
    afterwards, whatever happens;

  * if the register was uninitialized, then in the case of partial def
    there will still be bits uninitialized and in the case of
    conditional def it is possible that the instruction leaves the
    register uninitialized: in both case there is a possibility to leave
    part of the register uninitialized.

So I would agree with: they don't belong in any set.

While thinking about this, I also realized with REE in mind that since
we need a conservative computation for MIR, we must set KILL/clear GEN
for register refs with DF_REF_MAY_CLOBBER: it may leave the register
uninitialized.

(In reply to Paolo Bonzini from comment #31)
> Ah, I see now.  I think you're right that the DF_REF_MUST_CLOBBER case
> should also clear GEN in df_live_bb_local_compute.
> 
> However, regarding the "BTW" I am fairly sure now that
> df_live_bb_local_compute and the corresponding function for MIR should
> handle may-clobber and may-sets differently.  If you think of
> may-clobber and may-set as a diamond-shaped CFG:
> 
> […]
> 
> Then at the join point you have an "OR" for LIVE (so the clobber's
> KILL disappears and the set's GEN remains), and an "AND" for MIR.  For
> MIR the clobber's KILL remains and the set's GEN disappears.
Agreed, thank you! I have updated both MIR and LIVE in the light of
this. (bootstrapped and regtested on x86_64-linux)

Comment 33 Pierre-Marie de Rodat 2015-09-23 10:04:55 UTC

Created attachment 36377 [details]
Updated candidate patch

Comment 34 Pierre-Marie de Rodat 2015-09-23 10:05:33 UTC

Created attachment 36378 [details]
Fix for DF_LIVE local BB information

Comment 35 Paolo Bonzini 2015-09-23 10:23:14 UTC

Comment on attachment 36377 [details]
Updated candidate patch

> +     This problem determines which registers may be uninitialized. It first
> +     assumes these are all initialized and then it eliminates the ones reached
> +     by paths without crossing a definition.  The IN bitmap is clear at first
> +     (i.e. all registers are assumed not to be initialized) so don't consider
> +     its value the first time.  */
> +  return bitmap_and_into (op1, op2);

Is this comment obsolete?  The IN bitmap is all set at first.  Otherwise looks good.

Comment 36 Bernd Schmidt 2015-09-23 10:24:54 UTC

This looks better. I still don't quite understand why you're treating MUST_CLOBBER and MAY_CLOBBER defs differently in simulate. It looks like a MUST_CLOBBER produces a bit in gen which I think is not what is wanted.

Anything wrong with writing this simply as follows?

      if (DF_REF_FLAGS_IS_SET (def, DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER))
	{
	  bitmap_set_bit (kill, regno);
	  bitmap_clear_bit (gen, regno);
	}
      /* In the worst case, partial and conditional defs can leave bits
	 uninitialized, so assume they do not change anything.  */
      else if (!DF_REF_FLAGS_IS_SET (def, DF_REF_PARTIAL | DF_REF_CONDITIONAL))
	{
	  bitmap_set_bit (gen, regno);
	  bitmap_clear_bit (kill, regno);
	}

And, not as a requirement for your patch, but as a point for discussion - do we want a special all_ones_bitmap that doesn't take up memory for purposes like this? It would add two additional tests to each bitmap_{and,ior} operation.

Comment 37 Paolo Bonzini 2015-09-23 10:26:42 UTC

Bernd is right that you have a missing 'else'.

Comment 38 Pierre-Marie de Rodat 2015-09-24 07:42:40 UTC

(In reply to Paolo Bonzini from comment #35)
> Is this comment obsolete?  The IN bitmap is all set at first.
Yes, indeed: I removed the part you quoted.

(In reply to Paolo Bonzini from comment #35)
> Otherwise looks good.
Thank you! Does this include the DF_LIVE fix?


(In reply to Bernd Schmidt from comment #36)
> This looks better. I still don't quite understand why you're treating
> MUST_CLOBBER and MAY_CLOBBER defs differently in simulate. It looks
> like a MUST_CLOBBER produces a bit in gen which I think is not what is
> wanted.
> 
> Anything wrong with writing this simply as follows?
(In reply to Paolo Bonzini from comment #37)
> Bernd is right that you have a missing 'else'.
Pastos strike back! I updated df_mir_simulate_one_insn according to your
suggestion: thank you.

Besides, Eric told me live that instead of artificially clearing the
entry basic block’s IN set, we could use the confluence_0 for that. I
did this and took this opportunity to minimize code duplication: the
df_mir_init function now just calls df_mir_reset.

> And, not as a requirement for your patch, but as a point for
> discussion - do we want a special all_ones_bitmap that doesn't take up
> memory for purposes like this? It would add two additional tests to
> each bitmap_{and,ior} operation.

Good point, although what would the following yield, then?

    bitmap_clear (res);
    bitmap_set_bit (res, 1);
    bitmap_and_into (res, all_ones_bitmap);

I mean: we would need to represent bitmaps will all bits set but one
(and by extension: all but several).

Anyway, here's the updated patch, bootstrapped and regtested again on
x86_64-linux (plus a new testcase in gnat.dg). By the way, there's still
the MIR name, as we did not reach a consensus about a better name.

Comment 39 Pierre-Marie de Rodat 2015-09-24 07:43:15 UTC

Created attachment 36384 [details]
Updated candidate patch

Comment 40 Bernd Schmidt 2015-09-24 17:57:54 UTC

(In reply to Pierre-Marie de Rodat from comment #38)
> Besides, Eric told me live that instead of artificially clearing the
> entry basic block’s IN set, we could use the confluence_0 for that. I
> did this and took this opportunity to minimize code duplication: the
> df_mir_init function now just calls df_mir_reset.

Hmm, is that actually the right thing for the entry block? I notice that we simulate artificial defs, but it seems like that set is not conservatively correct. For a simple function taking a single integer argument:

;;  entry block defs     0 [ax] 1 [dx] 2 [cx] 4 [si] 5 [di] 7 [sp] 21 [xmm0] 22 [xmm1] 23 [xmm2] 24 [xmm3] 25 [xmm4] 26 [xmm5] 27 [xmm6] 28 [xmm7] 37 [r8] 38 [r9]

where it would appear that only regs 5 (DI) and 7 (SP) are actually reliably defined, but df_get_entry_block_def_set adds everything that's a FUNCTION_ARG_REGNO. But the entire set makes it through the mir in/out sets in the entire function.

(gdb) p df_dump(stdout)
[...]
; mir   in  	 0 [ax] 1 [dx] 2 [cx] 4 [si] 5 [di] 7 [sp] 21 [xmm0] 22 [xmm1] 23 [xmm2] 24 [xmm3] 25 [xmm4] 26 [xmm5] 27 [xmm6] 28 [xmm7] 37 [r8] 38 [r9]
[...]
;; mir   out 	 0 [ax] 1 [dx] 2 [cx] 4 [si] 5 [di] 7 [sp] 17 [flags] 21 [xmm0] 22 [xmm1] 23 [xmm2] 24 [xmm3] 25 [xmm4] 26 [xmm5] 27 [xmm6] 28 [xmm7] 37 [r8] 38 [r9]


Maybe it's best _not_ to include artificial defs, except for a subset of things that are known good - such as the stack pointer.

Apart from that, starting to look good, thanks for the effort. Future versions should probably go to gcc-patches.

Comment 41 Pierre-Marie de Rodat 2015-10-01 05:44:41 UTC

Thank you again for spotting this! Yes, these artificial defs break the
consevativeness for the MIR analysis. I guess your proposal would work:
considering as uninintialized registers than aren’t is conservative
although it will prevent more optimizations that one could expect in
REE. I don’t know if it’s a problem in practice but your example shows
that we have a lot of registers in this situation.

At this point, though, I wonder why artificial defs do not describe what
actually happens for this function. Shouldn’t we instead fix DF to emit
artificial defs only for actual input registers? I’ll investigate this.

… and then I will followup on gcc-patches as you suggested. Once more,
thank you all for your help in this! :-)

Comment 42 Bernd Schmidt 2015-10-03 10:45:36 UTC

I don't think that would actually help. Even if something is an actual incoming argument register, it may still be uninitialized by the caller. That's true for plain scalars, but even more likely for something like the partial struct initialization in comment #8.

Comment 43 Pierre-Marie de Rodat 2015-10-12 14:47:50 UTC

(In reply to Bernd Schmidt from comment #42)
> I don't think that would actually help. Even if something is an actual
> incoming argument register, it may still be uninitialized by the caller.
Sure, but my understanding is that what matters in DF is what happens inside the function, so we can assume arguments are always initialized.

Actually this makes sense in the context of REE as well: we have one (artificial) def for arguments so since this def has no corresponding instruction, REE will not be able to remove z-extensions reaching this def.

That being said, I failed to get the “is it really used for incoming arguments” hard-reg predicate, so falling back to your proposal: I will followup on gcc-patches@.

Thanks! :-)

Comment 44 pmderodat 2015-10-19 23:48:07 UTC

Author: pmderodat
Date: Mon Oct 19 23:47:35 2015
New Revision: 229008

URL: https://gcc.gnu.org/viewcvs?rev=229008&root=gcc&view=rev
Log:
REE: fix uninitialized registers handling

gcc/ChangeLog:

	PR rtl-optimization/66790
	* df.h (DF_MIR): New macro.
	(DF_LAST_PROBLEM_PLUS1): Update to be past DF_MIR
	(DF_MIR_INFO_BB): New macro.
	(DF_MIR_IN, DF_MIR_OUT): New macros.
	(struct df_mir_bb_info): New.
	(df_mir): New macro.
	(df_mir_add_problem, df_mir_simulate_one_insn): New forward
	declarations.
	(df_mir_get_bb_info): New.
	* df-problems.c (struct df_mir_problem_data): New.
	(df_mir_free_bb_info, df_mir_alloc, df_mir_reset,
	df_mir_bb_local_compute, df_mir_local_compute, df_mir_init,
	df_mir_confluence_0, df_mir_confluence_n,
	df_mir_transfer_function, df_mir_free, df_mir_top_dump,
	df_mir_bottom_dump, df_mir_verify_solution_start,
	df_mir_verify_solution_end): New.
	(problem_MIR): New.
	(df_mir_add_problem, df_mir_simulate_one_insn): New.
	* timevar.def (TV_DF_MIR): New.
	* ree.c: Include bitmap.h
	(add_removable_extension): Add an INIT_REGS parameter.  Use it
	to skip zero-extensions that may get an uninitialized register.
	(find_removable_extensions): Compute must-initialized registers
	using the MIR dataflow problem. Update the call to
	add_removable_extension.
	(find_and_remove_re): Call df_mir_add_problem.

gcc/testsuite/ChangeLog:

	* gnat.dg/opt50.adb: New test.
	* gnat.dg/opt50_pkg.adb: New helper.
	* gnat.dg/opt50_pkg.ads: New helper.

Added:
    trunk/gcc/testsuite/gnat.dg/opt50.adb
    trunk/gcc/testsuite/gnat.dg/opt50_pkg.adb
    trunk/gcc/testsuite/gnat.dg/opt50_pkg.ads
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/df-problems.c
    trunk/gcc/df.h
    trunk/gcc/ree.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/timevar.def

Comment 45 Eric Botcazou 2015-12-19 22:36:12 UTC

Fixed in GCC 6.