User account creation filtered due to spam.

Bug 71753 - Clamp function does not work with O3 optimization
Summary: Clamp function does not work with O3 optimization
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 7.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-04 09:51 UTC by Łukasz Spintzyk
Modified: 2016-07-04 14:53 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Code that reproduces the issue (11.30 KB, text/plain)
2016-07-04 09:51 UTC, Łukasz Spintzyk
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Łukasz Spintzyk 2016-07-04 09:51:18 UTC
Created attachment 38830 [details]
Code that reproduces the issue

Hi,

The issue is that with O3 flag the fast_clamp function returns wrong value.
I have read your warning about compiler flags but i decided to submit that. Sorry for that it if this is invalid.

I have tested the code with fwrapv flag and indeed it fix the problem so probably the issue is related to signed arithmetic overflow optimizations.
I am submitting this as it can point real issue, on clang it is working and it is producing very similar output(see below).


I have tested this against various gcc versions and problem seems to be introduced between 4.9.3 and 5.1. Also reproducible on gcc 7.0


How to compile:
g++ -std=c++11 -O3 testClamp.cpp

When error happens running app should print following error:
> ./a.out
Fatal error, clamp of 260 gives 4 instead of 255



This issue is not reproducible on clang with the same optimizations enabled.
I was comparing its assembly output and functions differs only with one instruction (movzbl %dil,%eax). Maybe this issue can be fixed without disabling some of the optimizations with fwrapv flag.

gcc assebly:
0000000000000020 <_Z10fast_clampi>:
  20:	8d 87 00 ff ff 7f    	lea    0x7fffff00(%rdi),%eax
  26:	c1 f8 1f             	sar    $0x1f,%eax
  29:	09 f8                	or     %edi,%eax
  2b:	c1 ff 1f             	sar    $0x1f,%edi
  2e:	f7 d7                	not    %edi
  30:	21 f8                	and    %edi,%eax
  32:	c3                   	retq   


0000000000000020 <_Z10fast_clampi>:
  20:	8d 87 00 ff ff 7f    	lea    0x7fffff00(%rdi),%eax
  26:	c1 f8 1f             	sar    $0x1f,%eax
  29:	09 f8                	or     %edi,%eax
  2b:	c1 ff 1f             	sar    $0x1f,%edi
  2e:	f7 d7                	not    %edi
  30:	21 c7                	and    %eax,%edi
  32:	40 0f b6 c7          	movzbl %dil,%eax
  36:	c3                   	retq
Comment 1 Andrew Pinski 2016-07-04 10:04:01 UTC
Comment on attachment 38830 [details]
Code that reproduces the issue

value + 0x7FFFFF00

Will overflow most of the time.
Comment 2 Marc Glisse 2016-07-04 10:05:33 UTC
-fsanitize=undefined
a.c:11:21: runtime error: signed integer overflow: 260 + 2147483392 cannot be represented in type 'int'

You need to use an unsigned type for this type of computation.
Comment 3 Andrew Pinski 2016-07-04 10:05:45 UTC
Obvious integer overflow. Can use -fsantize=undefined to catch it with both gcc and clang/llvm. Closing as invalid.
Comment 4 Łukasz Spintzyk 2016-07-04 10:12:24 UTC
Yes, this code is utilizing overflow, but it is there for a reason to optimize the code and get rid of branches as they can slow down program execution.

You can refer to http://locklessinc.com/articles/sat_arithmetic/

Looking from this point of view is this really invalid?
Comment 5 Marc Glisse 2016-07-04 10:30:44 UTC
(In reply to Łukasz Spintzyk from comment #4)
> Looking from this point of view is this really invalid?

There is a perfectly valid way to write such code:

"You need to use an unsigned type for this type of computation."
Comment 6 Łukasz Spintzyk 2016-07-04 14:53:54 UTC
I confirm changing the code to use unsigned int fixed the problem.

Also there is no signed overflow errors. 

Thanks a lot.