Bug 15857 - [3.3 Regression] Wrong code with optimization >= -O1
Summary: [3.3 Regression] Wrong code with optimization >= -O1
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 3.4.0
Importance: P2 critical
Target Milestone: 3.3.5
Assignee: Not yet assigned to anyone
URL:
Keywords: monitored, wrong-code
Depends on:
Blocks:
 
Reported: 2004-06-07 11:30 UTC by marco
Modified: 2004-09-13 08:44 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work: 3.1 3.4.2
Known to fail: 3.2 3.3.4 3.4.0 3.4.1
Last reconfirmed: 2004-06-07 12:48:37


Attachments
small testcase (349 bytes) (229 bytes, text/plain)
2004-06-07 12:39 UTC, Serge Belyshev

Description marco 2004-06-07 11:30:57 UTC
The included source file "fail_alpha.cc" implements alpha blending on i386/MMX
platforms using the MMX intrinsics from <mmintrin.h>. If compiled without
optimization (but with -march=pentium2 or -march=athlon to enable the MMX
intrinsics), the program terminates. If compiled at any optimization level
(-O1, -O2, -O3), the program does not terminate.

Example: 
 g++ -march=athlon fail_alpha.cc -o fail_alpha
 ./fail_alpha 
 => terminates

 g++ -O1 -march=athlon fail_alpha.cc -o fail_alpha
 ./fail_alpha 
 => hangs

I've also checked the 3.3.2 compiler and found that it shows the same behaviour.

Here is my config line:
% /opt/gcc-3.4.0/bin/g++ -v
Reading specs from /opt/gcc-3.4.0/lib/gcc/i386-slackware-linux/3.4.0/specs
Configured with: ../source/gcc-3.4.0/configure --prefix=/opt/gcc-3.4.0
--enable-shared --enable-threads=posix --enable-__cxa_atexit --disable-checking
--with-gnu-ld --verbose --target=i386-slackware-linux --host=i386-slackware-linux
Thread model: posix
gcc version 3.4.0

/*--- begin fail_alpha.cc ---*/
#include <mmintrin.h>

typedef unsigned char byte;

struct Surface
{
  byte *pixels; 
  unsigned int pitch; 
  unsigned int w,h; 
  unsigned int bytes_per_pixel; 
}; 

struct Rect
{
  int x,y,w,h; 
}; 

struct calpha_blender
{
  __m64 valpha; 
  __m64 one; 
  byte alpha; 
  calpha_blender(byte _alpha) : alpha(_alpha) 
  {
    valpha=_mm_set1_pi16(alpha); 
    one=_mm_set1_pi16(1); 
  }
  virtual  __m64 blend(__m64 s, __m64 d)
  {
    __m64 lo=_mm_setzero_si64(); 
    __m64 hi=_mm_setzero_si64(); 
    
    lo=_mm_sub_pi16(_mm_unpacklo_pi8(s,lo),_mm_unpacklo_pi8(d,lo));
    lo=_mm_mullo_pi16(lo,valpha); 
    
    hi=_mm_sub_pi16(_mm_unpackhi_pi8(s,hi),_mm_unpackhi_pi8(d,hi));
    hi=_mm_mullo_pi16(hi,valpha); 
    
    lo=_mm_add_pi16(lo,one); 
    hi=_mm_add_pi16(hi,one); 
    
    lo=_mm_add_pi16(lo,_mm_srli_pi16(lo,8));
    hi=_mm_add_pi16(hi,_mm_srli_pi16(hi,8));
    lo=_mm_srli_pi16(lo,8); 
    hi=_mm_srli_pi16(hi,8); 
    
    hi=_mm_packs_pu16(lo,hi);
    hi=_mm_add_pi8(d,hi); 
    return hi;     
  }
  inline byte blend(byte s, byte d) 
  {
    unsigned int c=alpha*(s-d)+1; 
    return (c+(c>>8)>>8)+d; 
  }

}; 

void blt(Surface &s,const Rect &sr,Surface &d,const Rect &dr, byte alpha)
{
  byte *sp=s.pixels; 
  byte *dp=d.pixels; 
  int bpp=s.bytes_per_pixel; 
  sp=sp+(sr.x*bpp+sr.y*s.pitch); 
  dp=dp+(dr.x*bpp+dr.y*d.pitch); 
  int lw=sr.w*bpp; 

  calpha_blender blender(alpha); 

  for (int y=0;y<sr.h;y++) 
    {
      int x=0;

      for (;x<lw;x+=8) 
	{
	  *(__m64*)(dp+x)=blender.blend(*(__m64*)(sp+x),*(__m64*)(dp+x));
	}
      for (;x<lw;x++) 
	{
	  dp[x]=blender.blend(sp[x],dp[x]); 
	}
      sp+=s.pitch; 
      dp+=d.pitch; 
    }
  _mm_empty(); 
}

int main() 
{
  const int W=800; 
  const int H=600;
  const int bpp=3; 
  byte data[W*H*bpp]; 
  Surface x={data,W*bpp,W,H,bpp}; 
  Rect r={0,0,W,H}; 
  blt(x,r,x,r,128); 
}

/*--- end fail_alpha.cc ---*/
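Reader's note: the scalar blend(byte, byte) in the testcase approximates
d + alpha*(s-d)/255 without a division, using the usual
(c + (c >> 8)) >> 8 trick on c = alpha*(s-d) + 1. A minimal standalone sketch
of that per-byte formula follows. The free function blend_byte is a
hypothetical name introduced here for illustration, and the sketch is only
checked for inputs with s >= d or alpha == 0.

```cpp
typedef unsigned char byte;

// Hypothetical free-function version of calpha_blender::blend(byte, byte).
// Same arithmetic as the testcase, with the operator precedence made explicit.
inline byte blend_byte(byte alpha, byte s, byte d)
{
    // alpha * (s - d) is computed in int, then converted to unsigned int.
    unsigned int c = alpha * (s - d) + 1;
    // (c + (c >> 8)) >> 8 approximates c / 255; the result is truncated to a byte.
    return static_cast<byte>(((c + (c >> 8)) >> 8) + d);
}
```

For example, blend_byte(128, 255, 0) gives 128, and alpha == 0 returns d
unchanged.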
Comment 1 Serge Belyshev 2004-06-07 12:39:57 UTC
Created attachment 6486
small testcase (349 bytes)
Comment 2 Serge Belyshev 2004-06-07 12:48:37 UTC
Confirmed.
Comment 3 Volker Reichelt 2004-06-07 13:07:54 UTC
Hi Mark,
do you want to retarget this PR to 3.4.1 since it's a wrong-code bug?
Comment 4 Volker Reichelt 2004-08-26 08:39:12 UTC
This is fixed by Jason's patch for PR 15461 and PR 16851:
http://gcc.gnu.org/ml/gcc-cvs/2004-08/msg01287.html

I suspect it's the part for PR 15461.
I'll check whether this also fixes the problem on the 3.3 branch.
Comment 5 Volker Reichelt 2004-08-26 08:48:43 UTC
Yup, the patch to semantics.c indeed fixes the problem on the 3.3 branch.
I didn't do any regression tests though.
Comment 6 Gabriel Dos Reis 2004-08-26 13:28:57 UTC
Subject: Re:  [3.3 Regression] Wrong code with optimization >= -O1

"reichelt at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

| Yup, the patch to semantics.c indeed fixes the problem on the 3.3 branch.
| I didn't do any regression tests though.

Thanks, I'll check that very late today.

-- Gaby
Comment 7 CVS Commits 2004-09-13 08:39:34 UTC
Subject: Bug 15857

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_3-branch
Changes by:	gdr@gcc.gnu.org	2004-09-13 08:39:17

Modified files:
	gcc/cp         : ChangeLog semantics.c 
Added files:
	gcc/testsuite/g++.dg/opt: nrv7.C 

Log message:
	PR c++/15857
	Backport from gcc-3_4-branch
	2004-08-24  Jason Merrill  <jason@redhat.com>
	PR c++/15461
	* semantics.c (nullify_returns_r): Replace a DECL_STMT
	for the NRV with an INIT_EXPR.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.3076.2.273&r2=1.3076.2.274
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/semantics.c.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.282.4.6&r2=1.282.4.7
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/opt/nrv7.C.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=NONE&r2=1.1.10.1

Comment 8 Gabriel Dos Reis 2004-09-13 08:44:43 UTC
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01220.html
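
Reader's note on the fix in comment 7: the commit log names nullify_returns_r
and the NRV (named return value). This is the local object that a function
returns by name, which the C++ front end may construct directly in the caller's
return slot. In the testcase, `hi` in calpha_blender::blend(__m64, __m64) is
such a local. The thread does not spell out the failure mechanism, so the
following is only a minimal sketch of the code shape involved, assuming the
relevant property is that the local's initialization must be redone on every
call, including when the function is inlined into a loop. Acc, make_acc and
sum_calls are hypothetical names, not taken from nrv7.C.

```cpp
// Hypothetical illustration; not the contents of g++.dg/opt/nrv7.C.
struct Acc {
    int v;
    Acc() : v(0) {}
};

// `r` is a candidate for the named return value optimization: it is the
// only object returned, so it may be built in the caller's return slot.
inline Acc make_acc(int step)
{
    Acc r;          // must be freshly initialized on every call
    r.v += step;
    return r;
}

// Calls make_acc repeatedly; each call must start from v == 0.
int sum_calls(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        Acc a = make_acc(1);
        total += a.v;
    }
    return total;
}
```

A correct compiler gives sum_calls(5) == 5 at every optimization level. A
stale value carried over from a previous iteration would make the sum larger.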