This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [trans-mem] optimize read for write
- From: Daniel Berlin <dberlin at dberlin dot org>
- To: Aldy Hernandez <aldyh at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, rth at redhat dot com
- Date: Tue, 29 Sep 2009 13:12:36 -0400
- Subject: Re: [trans-mem] optimize read for write
- References: <20090929162602.GA16198@redhat.com>
On Tue, Sep 29, 2009 at 12:26 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> Hi Richard.
>
> The patch below calculates the ANTIC sets thus allowing us to perform
> the read-for-write optimization.
>
> In the code below, the first read is transformed into a read-for-write
> and the second is transformed into a write-after-write. ?I believe this
> is correct. ?Is this what you were expecting?
>
> ?D.2742_6 = _ITM_RU1 (&c);
> ?c.0_1 = (char) D.2742_6;
> ?c.1_2 = c.0_1 + 1;
> ?D.2743_7 = (unsigned char) c.1_2;
> ?_ITM_WU1 (&c, D.2743_7);
>
> As a bonus, I have cleaned up the AVAIL computation a bit. ?No need to
> iterate twice over the blocks during the initialization, if we can both
> initialize the worklist and seed the sets in one go. ?Also, no need to
> look at entry blocks since they'll never appear in the worklist.
>
> No regressions.
>
> OK for branch?
>
> In my next patch I'll do the actual transformation, and then onto bigger
> and better things.
>
> ? ? ? ?* trans-mem.c (tm_memopt_compute_antic): New.
> ? ? ? ?(tm_memopt_compute_available): Merge seed and initialization
> ? ? ? ?loops. ?Remove entry_block check.
> ? ? ? ?(tm_memopt_compute_antin): New.
> ? ? ? ?(execute_tm_memopt): Call tm_memopt_compute_antic.
>
> Index: testsuite/gcc.dg/tm/memopt-2.c
> ===================================================================
> --- testsuite/gcc.dg/tm/memopt-2.c ? ? ?(revision 0)
> +++ testsuite/gcc.dg/tm/memopt-2.c ? ? ?(revision 0)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgnu-tm -O -fdump-tree-tmmemopt" } */
> +
> +char c;
> +
> +f()
> +{
> + ?__tm_atomic {
> + ? ?++c;
> + ?}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "RfW.*RU1 \\(&c\\);" 1 "tmmemopt" } } */
> +/* { dg-final { scan-tree-dump-times "WaW.*WU1 \\(&c," 1 "tmmemopt" } } */
> Index: trans-mem.c
> ===================================================================
> --- trans-mem.c (revision 152270)
> +++ trans-mem.c (working copy)
> @@ -1820,6 +1820,29 @@ tm_memopt_compute_avin (struct tm_region
> ? ? }
> ?}
>
> +/* Compute the STORE_ANTIC_IN for the basic block BB in REGION. ?*/
> +
> +static void
> +tm_memopt_compute_antin (struct tm_region *region, basic_block bb)
> +{
> + ?edge e;
> + ?unsigned ix;
> +
> + ?/* Exit blocks have an ANTIC_IN of NULL. ?*/
> + ?if (bitmap_bit_p (region->exit_blocks, bb->index))
> + ? ?return;
> +
> + ?/* Seed with the ANTIC_OUT of any successor. ?*/
> + ?e = EDGE_SUCC (bb, 0);
> + ?bitmap_copy (STORE_ANTIC_IN (bb), STORE_ANTIC_OUT (e->dest));
> +
This is wrong.
It will produce a correct but non-maximal answer.
ANTIC is a backwards intersection problem.
As such the initialization must be all 1 bits in order to get a
maximal correct answer.
If you are using bitmaps, your options are basically to either set
all-1's (and ruin the sparseness), or do what we do now in tree level
PRE (defer visiting blocks where at least one successor has not been
visited until next iteration/only AND together the visited successors
when at least one has been visited).
This was a problem with GVN-PRE for years, and we have testcases in
the testsuite that show where you get missing optimizations if you
don't initialize to all-1's or do something like the above.