This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PR tree-optimization/52558]: RFC: questions on store data race

From: Aldy Hernandez <aldyh at redhat dot com>
To: Richard Guenther <rguenther at suse dot de>
Cc: Andrew MacLeod <amacleod at redhat dot com>, "Boehm, Hans" <hans dot boehm at hp dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Torvald Riegel <triegel at redhat dot com>
Date: Mon, 07 May 2012 18:04:56 -0500
Subject: Re: [PR tree-optimization/52558]: RFC: questions on store data race
References: <4F875324.7060908@redhat.com> <CAFiYyc2Cc2AgL75miwrENHZVywCVuxtUrE1RdO_A4VsiDJg3=A@mail.gmail.com> <4F96E635.2040808@redhat.com> <CAFiYyc1dP_vqBmxKBJ+o45_GAnhkoD++20HTYNbT_V_HspypAw@mail.gmail.com> <4F9880C9.2070704@redhat.com> <Pine.LNX.4.64.1204261136110.23071@jbgna.fhfr.qr>

Hi. Sorry for the delay. There were various tricky hiccups along the way to bootstrappability and regression cleanliness...

On 04/26/12 04:51, Richard Guenther wrote:

On Wed, 25 Apr 2012, Aldy Hernandez wrote:
On 04/25/12 06:45, Richard Guenther wrote:
On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com> wrote:
On 04/13/12 03:46, Richard Guenther wrote:
On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
wrote:

Speak of loads, I am keeping the information as an additional bitmap in
`memory_accesses', as ->refs_in_loop was set for stores as well, so I couldn't
depend on it.  Let me know if you have another idea.

Hmm, refs_in_loop& ~all_refs_stored_in_loop, so instead of

+      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
+                               loop->num);
+      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);

ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)&&  !bitmap_bit_p
(stores, ref->id);

?  But maybe that doesn't work if a ref is both read and stored to.
Btw, rather than adding a bitmap to memory_accesses I'd rather add
a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
both into one) and a bitmap to struct mem_ref.

I have done so as mark_ref_loaded(). I first tried to merge both into a generic mark_ref(), to avoid iterating twice, but I was concerned with the early loop exit of !bitmap_bit_p(ref->stored, loop->num), not knowing if I should exit the loop altogether in the merged case or what. In either case, the code was somewhat convoluted, so I opted for a separate clean function.

In scanning the gimple stream for the loads I noticed that instances of two component refs (say foo.bar) were not pointer comparable, so I am asking the alias oracle with refs_may_alias_p. It also seemed to make more sense wrt memory chunks that may alias. What do you think?

This patch is far more robust and fully tested. Two wrenches to complicate things though...

First, while inserting the code writing the _LSM_ temporary writes into the original variables, I found a handful of tests where the order of the writes mattered (aliased locations, inlined code, etc, IIRC). [This is in loops where multiple stores are moved.] So iterating and inserting on the exit edges caused the stores to be saved in reverse. I am now saving the edges and threading the code, so they happen in the correct order:

		if (foo_flag_lsm)
			foo = foo_lsm;
		if (foo2_flag_lsm)
			foo2 = foo2_lsm;
	actual_exit:

This required some edge redirection so we didn't jump to the original actual exit prematurely when handling multiple moved stores. The dominator tree needed to be updated, and there is one instance where I couldn't find a clever way of saving the order without jumping through an inordinate amount of hoops. It seemed easier to call recompute_dominator.

Second, avoiding the load, unless it happens in the loop like you have suggested, brought about some bogus unused warnings with the LSM temporaries. What happens is that compute_ssa() creates uses of the LSM temporaries that are not yet defined, so we end up with PHI nodes with a definition entry (D). Since we are sure the eventual uses will be gated by their corresponding flag, I created phi nodes with an initial value of 0 in the preheader of the loops in which they are used. This solved the problem. I also played with initializing to 0 at the entry block, but that seemed a bit too invasive.

Andrew suggested the correct fix was to add a new pass that was able to do some ?? flow sensitive data flow analysis ?? that could discover these unreachable paths and insert the 0 phis at the start of the blocks automatically. But that seemed like far too much work, considering how long it's taken me to get this far ;-).

Bootstraped and fully tested with multi_threaded_model_p=true hard-coded. This is the revision I'd like to contribute, sans the hardcoded value.

What do you think?

Attachment: curr
Description: Text document

Follow-Ups:
- Re: [PR tree-optimization/52558]: RFC: questions on store data race
  - From: Andrew MacLeod
- PING: Re: [PR tree-optimization/52558]: RFC: questions on store data race
  - From: Aldy Hernandez
- Re: [PR tree-optimization/52558]: RFC: questions on store data race
  - From: Richard Guenther

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]