Bug 35498 - libgomp/testsuite/libgomp.c/atomic-3.c fails on ppc-linux
Summary: libgomp/testsuite/libgomp.c/atomic-3.c fails on ppc-linux
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: 4.3.0
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-07 15:17 UTC by Jakub Jelinek
Modified: 2008-09-28 22:33 UTC (History)
4 users (show)

See Also:
Host:
Target: ppc-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-03-07 16:33:42


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2008-03-07 15:17:31 UTC
/* { dg-do run } */
/* { dg-options "-fopenmp -O2" } */

extern int omp_get_num_threads (void);
extern void abort (void);

short e[64];
int num_threads;
int g;
_Complex double d, f;

__attribute__((noinline)) void
foo (int x, long long y)
{
#pragma omp parallel num_threads (4)
  {
    int i;
    #pragma omp barrier
    for (i = 0; i < 2400; i++)
      {
        if (i == 0)
          num_threads = omp_get_num_threads ();
        #pragma omp atomic
          e[0] += x;
      }
  }
}

int
main (void)
{
  int i;
  foo (3, 3LL);
  if (e[0] != 3 * 2400 * num_threads)
    abort ();
  return 0;
}

as well as atomic-3.c fails on ppc-linux.
In *.optimized dump the atomic operation is:
  D.1288 = (short unsigned int) .omp_data_i->x;
  D.1290 = e[0];

<bb 6>:
  D.1294 = __sync_val_compare_and_swap_2 (&e, D.1290, (short int) ((short unsigned int) D.1290 + D.1288));
  D.1295 = D.1290;
  D.1290 = D.1294;
  if (D.1294 != D.1295)
    goto <bb 6>;
  else
    goto <bb 7>;
which looks correct.
Comment 1 Jakub Jelinek 2008-03-07 16:33:42 UTC
Patch:
2008-03-07  Jakub Jelinek  <jakub@redhat.com>

        PR target/35498
        * config/rs6000/rs6000.c (rs6000_expand_compare_and_swapqhi): Shift
        wdst back after sync_compare_and_swapqhi_internal.

--- gcc/config/rs6000/rs6000.c.jj       2008-02-29 09:11:54.000000000 +0100
+++ gcc/config/rs6000/rs6000.c  2008-03-07 17:22:27.000000000 +0100
@@ -13858,6 +13858,9 @@ rs6000_expand_compare_and_swapqhi (rtx d
   emit_insn (gen_sync_compare_and_swapqhi_internal (wdst, mask,
                                                    oldval, newval, mem));
 
+  /* Shift the result back.  */
+  emit_insn (gen_lshrsi3 (wdst, wdst, shift));
+
   emit_move_insn (dst, gen_lowpart (mode, wdst));
 }
 

will test it now and if successfull, post.
Comment 2 Jakub Jelinek 2008-03-08 07:31:40 UTC
Subject: Bug 35498

Author: jakub
Date: Sat Mar  8 07:30:55 2008
New Revision: 133024

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133024
Log:
	PR target/35498
	* config/rs6000/rs6000.c (rs6000_expand_compare_and_swapqhi): Shift
	wdst back after sync_compare_and_swapqhi_internal.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c

Comment 3 Jakub Jelinek 2008-03-08 07:37:19 UTC
Subject: Bug 35498

Author: jakub
Date: Sat Mar  8 07:36:35 2008
New Revision: 133025

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133025
Log:
	PR target/35498
	* config/rs6000/rs6000.c (rs6000_expand_compare_and_swapqhi): Shift
	wdst back after sync_compare_and_swapqhi_internal.

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/config/rs6000/rs6000.c

Comment 4 Jakub Jelinek 2008-03-08 07:48:44 UTC
The reason why the old code without the right shift almost worked is that
for the 4 byte aligned 16-bit vars each loop was executed usually twice.
.L6:
        lha 0,0(27)
        lhz 8,2(26)
        .align 4
.L4:
        sync
        add 9,8,0
        rlwinm 10,0,0,0xffff
        rlwinm 9,9,0,0xffff
        slw 11,10,31
        slw 9,9,31
.L11:
        lwarx 7,0,29
        and 0,7,28
        cmpw 0,0,11
        bne- 0,.L12
        andc 7,7,28
        or 7,7,9
        stwcx. 7,0,29
        bne- 0,.L11
        isync
.L12:
!       srw 0,0,31      ! This insn was added by this patch
        rlwinm 0,0,0,0xffff
        cmpw 7,0,10
        extsh 0,0
        bne 7,.L4
The first time usually the atomic instruction succeeded, but r0 after rlwinm was 0, so most often different from r10.  This means the code then jumped to .L4, with r0 = 0 as the expected value of e[0]. r10 then becomes 0 as new expected value, lwarx reads the new actual value of e[0], which will be different from
the expected 0.  So it jumps to .L12, r0 now contains the e[0] value in upper half and 0 in lower half and r10 is 0, so in the second big loop nothing is changed and the loop exits.  This is what happens if there is no contention.  If there is contention though, the first loop doesn't compare and swap anything and 
as shown above, the second loop iteration won't change anything unless e[0] is 0.