47617 – SSE rounding mode works -g, not -O3

Status:	RESOLVED DUPLICATE of bug 34678

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.3.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-02-05 22:27 UTC by cck0011
Modified:	2011-02-12 18:20 UTC
CC List:	0 users

See Also:
Host:
Target:	i?86-*-linux
Build:
Known to work:
Known to fail:
Last reconfirmed:	2011-02-07 11:52:28

Attachments
generated .i file (12.70 KB, application/octet-stream) 2011-02-05 22:27 UTC, cck0011
source file (2.52 KB, text/x-csrc) 2011-02-08 01:37 UTC, cck0011

Description cck0011 2011-02-05 22:27:34 UTC

Created attachment 23252
generated .i file

Hi folks,

  I'm working with SSE intrinsics and think I have a rounding problem. When I change modes with _MM_SET_ROUNDING_MODE, the results change as expected when compiled with "-g", but not when compiled with "-O3".

  What am I missing?

thanks!

Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.2/cc1 -E -quiet -v round.c -msse -mmmx -mtune=generic -Wall -O3 -fpch-preprocess -o round.i
ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.3.2/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/4.3.2/include
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.2/cc1 -fpreprocessed round.i -quiet -dumpbase round.c -msse -mmmx -mtune=generic -auxbase round -O3 -Wall -version -o round.s
GNU C (GCC) version 4.3.2 20081105 (Red Hat 4.3.2-7) (i386-redhat-linux)
        compiled by GNU C version 4.3.2 20081105 (Red Hat 4.3.2-7), GMP version 4.2.2, MPFR version 2.3.2.
GGC heuristics: --param ggc-min-expand=55 --param ggc-min-heapsize=48000
Compiler executable checksum: 3bee52601079f736b7b63b762646f4ba
round.c: In function ‘test_sse1_feature’:
round.c:150: warning: unused variable ‘sig’
round.c:150: warning: unused variable ‘extensions’
round.c:149: warning: ‘edx’ may be used uninitialized in this function
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 as -V -Qy -o round.o round.s
GNU assembler version 2.18.50.0.9 (i386-redhat-linux) using BFD version version 2.18.50.0.9-8.fc10 20080822
COMPILER_PATH=/usr/libexec/gcc/i386-redhat-linux/4.3.2/:/usr/libexec/gcc/i386-redhat-linux/4.3.2/:/usr/libexec/gcc/i386-redhat-linux/:/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/:/usr/libexec/gcc/i386-redhat-linux/4.3.2/:/usr/libexec/gcc/i386-redhat-linux/:/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/
LIBRARY_PATH=/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/4.3.2/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.2/collect2 --eh-frame-hdr --build-id -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -o round /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.3.2/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.3.2 -L/usr/lib/gcc/i386-redhat-linux/4.3.2 -L/usr/lib/gcc/i386-redhat-linux/4.3.2/../../.. round.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i386-redhat-linux/4.3.2/crtend.o /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../crtn.o

Comment 1 Andrew Pinski 2011-02-06 02:13:37 UTC

I think you need to use -frounding-math.  GCC assumes by default the rounding mode is round-to-nearest.  See http://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/Optimize-Options.html#index-frounding_002dmath-819 .

Comment 2 cck0011 2011-02-06 16:25:55 UTC

(In reply to comment #1)
> I think you need to use -frounding-math.  GCC assumes by default the rounding
> mode is round-to-nearest.  See
> http://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/Optimize-Options.html#index-frounding_002dmath-819
> .

Hi Andrew, 

  thanks for writing. I tried -frounding-math and the result is still the same. Adding/removing -mfpmath=sse doesn't change it either. Is there any additional information I can provide?

Thanks!

Comment 3 Richard Biener 2011-02-07 11:52:28 UTC

Can you provide the non-preprocessed source?  I have difficulty compiling the
preprocessed file with newer releases.

Comment 4 cck0011 2011-02-08 01:37:58 UTC

Created attachment 23273
source file

Here's the source code. Rename to round.c.

Comment 5 cck0011 2011-02-08 01:46:18 UTC

(In reply to comment #4)
> Created attachment 23273 [details]
> source file
> 
> Here's the source code. Rename to round.c.

Hi Richard,

  here's the source code. Rename to round.c. 

  I think I must be doing something wrong here; otherwise someone would already have noticed that the results from _mm_cvtps_pi16 don't change when _MM_SET_ROUNDING_MODE() is called. :-) I'm puzzled that it works with -g but not with -O3.

  Any additional information I can provide?

  Thanks!

Comment 6 Andrew Pinski 2011-02-08 01:58:40 UTC

The problem is the same one recorded in PR 34678.  We optimize all the _mm_cvtps_pi16 calls down to a single one because we don't see that the rounding mode has changed.  To get the correct values each time, do the following:

void test_rounding(void)
{
 __m128 source = {-1.1, 0.0, 1.1, 1.5};
 __m64 dest;
 unsigned int initial_mode;

 initial_mode = _MM_GET_ROUNDING_MODE();
 print_rounding_mode("initial rounding mode", initial_mode);
 
 /* now set the rounding mode to each value to see the result  */
 
 asm("":"+X"(source)); // make the compiler treat source as modified; its value is unchanged
 _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);

 dest = _mm_cvtps_pi16(source);
 _mm_empty();
 print_round_results("with _MM_ROUND_NEAREST ", source, dest);

 asm("":"+X"(source)); // make the compiler treat source as modified; its value is unchanged
 _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);

 dest = _mm_cvtps_pi16(source);
 _mm_empty();
 print_round_results("with _MM_ROUND_DOWN ", source, dest);
 
 asm("":"+X"(source)); // make the compiler treat source as modified; its value is unchanged
 _MM_SET_ROUNDING_MODE(_MM_ROUND_UP);

 dest = _mm_cvtps_pi16(source);
 _mm_empty();
 print_round_results("with _MM_ROUND_UP ", source, dest);

 asm("":"+X"(source)); // make the compiler treat source as modified; its value is unchanged
 _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);

 dest = _mm_cvtps_pi16(source);
 _mm_empty();
 print_round_results("with _MM_ROUND_TOWARD_ZERO ", source, dest);

 /* restore initial rounding mode  */
 _MM_SET_ROUNDING_MODE(initial_mode);
}

*** This bug has been marked as a duplicate of bug 34678 ***

Comment 7 Richard Biener 2011-02-08 11:35:32 UTC

Well, this case is slightly different, as we simply have const/pure builtins
that depend not only on their arguments but also on the FP state.  Thus we'd
need to drop the attributes from these functions for -frounding-math.  Not that
it would help a lot, given PR34678 ...
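
To illustrate the const/pure point, here is a minimal sketch of the pattern under discussion. It is not taken from the attached round.c: the helper convert_twice and struct rounding_pair are hypothetical names, and it converts a single float with _mm_cvtss_si32 instead of using _mm_cvtps_pi16. Both conversions take the same argument, so an optimizing build may merge them, and the second result can then match the first even though the rounding mode changed in between.

```c
#include <xmmintrin.h>

/* Hypothetical illustration only; these names are not from round.c. */
struct rounding_pair {
    int nearest;  /* conversion requested under _MM_ROUND_NEAREST */
    int down;     /* conversion requested under _MM_ROUND_DOWN */
};

struct rounding_pair convert_twice(float value)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    __m128 v = _mm_set_ss(value);
    struct rounding_pair out;

    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    out.nearest = _mm_cvtss_si32(v);

    /* Same builtin, same argument: the compiler sees nothing that
       changes v, so it may reuse the result computed above. */
    _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);
    out.down = _mm_cvtss_si32(v);

    _MM_SET_ROUNDING_MODE(saved);  /* restore the caller's mode */
    return out;
}
```

For an input of 1.5f, the second field is 1 when the conversion really runs under round-down, and 2 if the compiler reused the round-to-nearest result. Which one you get depends on the compiler version and optimization level.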

Comment 8 cck0011 2011-02-09 02:08:23 UTC

Hi folks,

  First, thanks for working on this.

  Second, I read the link and I _think_ I understand it. Let me paraphrase it back to you and you can tell me if I've got the point:

  There is an optimizer that extracts common expressions and evaluates them once instead of every time they occur. (What's the name of that so I can call it by the right name?) In my code it finds the expression:

  dest = _mm_cvtps_pi16(source);

  several times. Since it doesn't see source changing, the expression is evaluated only once. The rounding-mode change made by _MM_SET_ROUNDING_MODE(...) isn't detected as something that would change the value of the _mm_cvtps_pi16(...) expression, so the repeated evaluations are still merged into one. Recognizing that change to the rounding mode and reacting to it is what's at the heart of bug 34678, and that's why this is a duplicate.

  The work-arounds are:

1) insert 'asm("":"+X"(source));' before changing rounding mode to make the compiler re-evaluate expressions that use source.

2) do _MM_SET_ROUNDING_MODE(...) before any divisions or integer conversions that might get optimized out. The scope of the optimization is a function body and any inlined code. So do _MM_SET_ROUNDING_MODE early within that scope. 

  Is my understanding correct? 

  A few more questions:

  Will this bug exist on non-X86 processors?

  What does the 'asm("":"+X"(source));' expression do ?

  Will this syntax work for non-X86 processors?

  To be correct, should I compile with -frounding-math ?


Thanks!

Comment 9 cck0011 2011-02-12 18:20:03 UTC

Hi folks,

  I tried the asm("":"+X"(source));  as shown. I get an error: inconsistent operand constraints in an ‘asm’.

  The info pages make it look like this should work, but the Inline Assembly Howto doesn't mention the X constraint. If the compiler should agree with the info pages, I'm doing something wrong. What am I missing?
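
One possible variant, offered as an assumption and not something confirmed in this thread: among GCC's x86 machine constraints, "x" names an SSE register, so "+x" may be accepted where "+X" is rejected when the operand is an __m128. The sketch below uses a hypothetical helper, convert_with_mode, and converts a single float with _mm_cvtss_si32 instead of _mm_cvtps_pi16. The "x" constraint is x86-specific, so this does not answer the question about other processors.

```c
#include <xmmintrin.h>

/* Hypothetical helper, not from round.c: convert value to int while the
   MXCSR rounding mode is temporarily set to mode. */
int convert_with_mode(float value, unsigned int mode)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    __m128 v = _mm_set_ss(value);
    int result;

    _MM_SET_ROUNDING_MODE(mode);

    /* Empty asm with a read-write SSE register operand ("+x", assumed
       x86-only): the compiler must treat v as possibly modified here,
       so the conversion below cannot be merged with an earlier one. */
    __asm__ volatile("" : "+x"(v));
    result = _mm_cvtss_si32(v);

    /* Keep the conversion ahead of the mode restore that follows. */
    __asm__ volatile("" : "+r"(result));

    _MM_SET_ROUNDING_MODE(saved);
    return result;
}
```

Called as convert_with_mode(1.5f, _MM_ROUND_DOWN), this should return 1, and with _MM_ROUND_UP it should return 2. The empty asm statements emit no instructions; they only limit what the optimizer may assume about v and result.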

thanks