Bug 15837 - Optimization (of templatized functions?)
Summary: Optimization (of templatized functions?)
Status: RESOLVED DUPLICATE of bug 21920
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 3.3.2
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-06-05 20:11 UTC by Adrian
Modified: 2005-07-23 22:49 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
The preprocessed file for compilation bzipped (89.77 KB, application/x-bzip)
2004-06-05 20:13 UTC, Adrian
Details
smaller testcase (15 KB) (2.97 KB, text/plain)
2004-06-05 22:27 UTC, Serge Belyshev
Details
Further trimmed, and illustrated hack to fix optimization error (2.55 KB, text/plain)
2004-06-06 06:12 UTC, Adrian
Details
The code written correctly (a bit ugly though...) (2.54 KB, text/plain)
2004-06-06 09:16 UTC, Adrian
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Adrian 2004-06-05 20:11:52 UTC
Seems to be an optimization bug for some templatized functions that I have
(specifically vbox_blur in the precompiled output pen.ii, which is bzipped as
requested).  It's pretty severe.  

In this case, it crashes in release mode (I think because it is forgetting to
decrement the iterator outpen as specified after the inner loop in vbox_blur
(should be right above all of the test functions at the bottom of the file).  

Another bizarre thing, is if function calls are inserted into the loop (say
print functions for debugging, though it has similar effects for some dummy
functions I've tried), it can change whether or not it crashes and such.

In the end it looks like an optimization bug, where it optimizes its way around
some pretty important operations (pointer operations mostly it seems...).

The program outputs patterns representing the result of the operations (the box
blur is the one that's freaking out). The pattern printed out below the CORRECT
OUTPUT line is what the AFTER BOX BLUR output should look like.  You can also
see the correct output for all of the substeps by compiling it in full debug
mode with no optimizations.  If you need some examples for the function call
thing, just ask, I have plenty (I can send the normal source file with all the
commented out print functions for debugging if need be).

Here's GCC's compiler output:

source='pen.cpp' object='pen.o' libtool=no \
depfile='.deps/pen.Po' tmpdepfile='.deps/pen.TPo' \
depmode=gcc3 /bin/sh ../config/depcomp \
g++ -DHAVE_CONFIG_H -I.. -I../ETL -I.. -I..    -v -save-temps -g -O2 -DNDEBUG
-Wall -c -o pen.o `test -f 'pen.cpp' || echo './'`pen.cpp
Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: /var/tmp/portage/gcc-3.3.2-r5/work/gcc-3.3.2/configure
--prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.3
--includedir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3/info --enable-shared
--host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --with-system-zlib
--enable-languages=c,c++,f77,objc --enable-threads=posix --enable-long-long
--disable-checking --enable-cstdio=stdio --enable-clocale=generic
--enable-__cxa_atexit --enable-version-specific-runtime-libs
--with-gxx-include-dir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3
--with-local-prefix=/usr/local --enable-shared --enable-nls
--without-included-gettext --disable-multilib
Thread model: posix
gcc version 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/cc1plus -E -D__GNUG__=3 -quiet -v -I..
-I../ETL -I.. -I.. -MD pen.d -MF .deps/pen.TPo -MP -MT pen.o -MQ pen.o
-D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=2 -D_GNU_SOURCE
-DHAVE_CONFIG_H -DNDEBUG pen.cpp -Wall -O2 pen.ii
ignoring nonexistent directory
"/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../../../i686-pc-linux-gnu/include"
ignoring duplicate directory ".."
ignoring duplicate directory ".."
#include "..." search starts here:
#include <...> search starts here:
 ..
 ../ETL
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3/i686-pc-linux-gnu
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3/backward
 /usr/local/include
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include
 /usr/include
End of search list.
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/cc1plus -fpreprocessed pen.ii -quiet
-dumpbase pen.cpp -auxbase-strip pen.o -g -O2 -Wall -version -o pen.s
GNU C++ version 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)
(i686-pc-linux-gnu)
	compiled by GNU C version 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7).
GGC heuristics: --param ggc-min-expand=51 --param ggc-min-heapsize=40960
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../../../i686-pc-linux-gnu/bin/as
-V -Qy -o pen.o pen.s
GNU assembler version 2.14.90.0.8 (i686-pc-linux-gnu) using BFD version
2.14.90.0.8 20040114
g++  -v -save-temps -g -O2 -DNDEBUG -Wall   -o pen  pen.o  -lpthread 
Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: /var/tmp/portage/gcc-3.3.2-r5/work/gcc-3.3.2/configure
--prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.3
--includedir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3/info --enable-shared
--host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --with-system-zlib
--enable-languages=c,c++,f77,objc --enable-threads=posix --enable-long-long
--disable-checking --enable-cstdio=stdio --enable-clocale=generic
--enable-__cxa_atexit --enable-version-specific-runtime-libs
--with-gxx-include-dir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3
--with-local-prefix=/usr/local --enable-shared --enable-nls
--without-included-gettext --disable-multilib
Thread model: posix
gcc version 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)
 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/collect2 --eh-frame-hdr -m elf_i386
-dynamic-linker /lib/ld-linux.so.2 -o pen
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../../crt1.o
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../../crti.o
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/crtbegin.o
-L/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2
-L/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../../../i686-pc-linux-gnu/lib
-L/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../.. pen.o -lpthread -lstdc++ -lm
-lgcc_s -lgcc -lc -lgcc_s -lgcc
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/crtend.o
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/../../../crtn.o
Comment 1 Adrian 2004-06-05 20:13:24 UTC
Created attachment 6477 [details]
The preprocessed file for compilation bzipped
Comment 2 Adrian 2004-06-05 20:18:45 UTC
My bad for putting the gcc output in the post.  I also forgot to post my system
specs (Athlon Tbird 1.2 Ghz, 512 M RAM, A7S333 Mobo running Gentoo linux, kernel
2.6.0).  First bug ever, so please excuse the oops :).
Comment 3 Adrian 2004-06-05 20:41:10 UTC
Ok, I've narrowed it down farther.  In both hbox_blur and vbox_blur if you call
a function inside the inner loop and pass it x for hbox and y for vbox (printf
for example), it the faulty optimization is circumvented.

ex. line 32951 of pen.ii
insert: printf("%d",y);

That loop will then work fine. Hope this helps.

Comment 4 Andrew Pinski 2004-06-05 20:50:37 UTC
Can you try 3.4.0 and fix your code at the same time because there is some invalid C++ in there which 
will not compile for 3.4.0?
Comment 5 Adrian 2004-06-05 21:49:25 UTC
Sure I'll try, it isn't tested at all in gentoo, so it could have some issues,
but it never hurts to try :).
Comment 6 Serge Belyshev 2004-06-05 22:27:44 UTC
Created attachment 6478 [details]
smaller testcase (15 KB)
Comment 7 Andrew Pinski 2004-06-05 22:42:25 UTC
You are violating aliasing rules in the deferencing of the value returned here:

        operator const generic_pen_row_iterator<const value_type>&()const
        {
                return *reinterpret_cast<generic_pen_row_iterator<const value_type>*>(this);
        }

So this is invalid, please fix your code.
Comment 8 Adrian 2004-06-05 22:56:52 UTC
Sorry about that.  GCC 3.4 is taking forever to rebuild on my machine.

You should be able to just get rid of that function.  I'm just upkeeping this
code, but I'm pretty sure it's not used.
Comment 9 Andrew Pinski 2004-06-06 01:07:32 UTC
You have other code which violates C++ aliasing rules, either use -fno-strict-aliasing as a work around 
or fix your code to avoid the aliasing rules problems.  So this is not a bug in gcc still.
Comment 10 Adrian 2004-06-06 04:39:13 UTC
I was not speaking about compiler errors, the bugs I was referring to are
incorrect optimizations in gcc 3.3.x.  It compiles fine in gcc 3.3.2 and 3.3.3.  

I will gladly fix any non-standard C++, however, at the moment, I have no way of
detecting which individual errors you are referring to (I am working on setting
up a stable gcc 3.4 environment as we speak).

As is, I don't think gcc 3.4 is particularly relevant, since, from what I've
heard, the optimization system has been changed substantially.  If I am
incorrect please let me know.

I will try and fix the errors (you will have to wait until I can detect them). 
However, 3.3 is the target.  If I should just revert to 3.2 and wait for 3.4 to
become stable on my distro, then I will do so.  

Thank you for your patience as I am unfamiliar with the protocols of gcc bug
registration.
Comment 11 Andrew Pinski 2004-06-06 04:49:28 UTC
No what I am saying is that your code does things which are "undefined" by the C++ standard so the 
compiler is able to do anything (including erasing your hard drive, no gcc will not produce code to do 
that though).  Casting to a different type and then accessing the data inside is declared as undefined 
(there are a few exceptions to that rule, like you access the data via a char pointer and there is no 
aliasing problems, you can cast to a type which adds const or violate and there is no aliasing problems 
but casting between different pointer types and then accessing the data for both of the type is 
undefined).
Comment 12 Adrian 2004-06-06 06:12:06 UTC
Created attachment 6482 [details]
Further trimmed, and illustrated hack to fix optimization error

This is a further trimmed version of the file.	I've successfully compiled it
with gcc 3.4 using MinGW.  Otherwise, I'm not sure to which aliasing errors you
are referring, and why exactly they invalidate this as a bug with respect to
gcc.
Comment 13 Adrian 2004-06-06 06:22:17 UTC
Ok, I understand that.  However, if it is undefined why does gcc not give an
error? (sorry I missed your last post as I was adding the attachment).

Also, the pen class was originally designed by someone else in such a way to
allow it to work with non standard compiler packing structures (although that
was not accomplished as you'd also need a byte stride for the x dimension to do
that effectively).  I.E. you could refer to a member of a struct which was
defined to align on a 2 byte boundary and still be able to jump from one element
to the other.  Perhaps this is an unnecessary feature, and I suppose it could be
modified to work on a strict element basis rather than byte alignment.  However,
Is any way to accomplish this besides the method used?

The only thing I can think of to satisfy this constraint is to be more explicit
(i.e. rather than ((char *&)p)+= stride; do p = (value_type*)((char*)p +
stride); ).  Would that not work as well?  I'll try that and see....

Thanks for the help though.  Please excuse me if I seem a bit irritable, this is
the weirdest thing I have ever run into :).
Comment 14 Adrian 2004-06-06 06:38:33 UTC
Hmmmmm, ok, so I changed a bunch of the functions to use p = (pointer)((char*)p
+ stride); and it seems to have fixed the mis-optimization.  (it could just be
because the current function has to recurse another level, but I'm willing to
bet that it just doesn't get confused).

So my question now is why does gcc even allow the cast to (char*&)?  If it will
get so confused shouldn't it tell me?

It's worth noting that the & part of the cast was a quick fix to get around
compiler errors that I was getting about aliasing... it seemed to work at the
time :P.

So I'll certainly keep that in mind when I'm designing future stuff (I didn't
write this really in the first place... so you know :P).
Comment 15 Andrew Pinski 2004-06-06 06:39:42 UTC
Because undefined means just that, it can be defined any which way the compiler wants it to be defined 
and differently at different times too.
There can be diagnostic for undefined code but there are other bug reports about adding warnings for 
code like this.

I will note that one way to work around this is to use -fno-strict-aliasing.  Also the reason why it 
compiles fine with 3.3.3 and not 3.4.0 is because 3.4.0 is a stricter in following the C++ standard.
Comment 16 Gabriel Dos Reis 2004-06-06 08:08:05 UTC
Subject: Re:  Optimization (of templatized functions?)

"adruab at voria dot com" <gcc-bugzilla@gcc.gnu.org> writes:

| So my question now is why does gcc even allow the cast to (char*&)?
| If it will get so confused shouldn't it tell me?

GCC did not get confused, and it did not do any mis-compilation.

-- Gaby
Comment 17 Adrian 2004-06-06 09:16:42 UTC
Created attachment 6483 [details]
The code written correctly (a bit ugly though...)

I was referring to gcc 3.4 (under mingw) when I was saying it wasn't producing
any errors.  It worked fine when I tried it.  Yes, it probably was non-standard
C++ code, but that doesn't change the fact it didn't give me an error (I would
certainly prefer it to if it's going to cause a crash).  Does compiling the
last attachment give anyone else an error (bug15837-1.cc)?

And pardon me for saying so, but if gcc does something different when using
optimizations, I'd certainly say that's a mistake when optimizing.  As it
happens, from what you guys have been saying it probably should have given me
an error saying it didn't know how to deal with it.  The fact remains, that it
didn't.  So where the problem lies I don't know, but there IS a problem.  I'm
just trying to help out here, so please don't get all angry at me because I
mis-interpreted what the exact problem was.
Comment 18 Wolfgang Bangerth 2004-06-06 18:35:33 UTC
In some cases, gcc has to make assumptions (base on what the standard 
allows it to) for aggressive optimizations. Some of these assumptions 
can be checked, in which case the compiler will give you an error or 
warning message. In some other cases, checking whether your code allows 
to make such assumptions is just out of the picture, for example if 
the compiler would have to check where exactly each pointer was initialized 
and how you casted it around -- aliasing refers to the question whether you 
access each object only through pointers to this object's type, or if, for 
example, you access a double via pointers to integers. The latter is not 
allowed by the standard, but the compiler can't prove in some cases that 
you don't.  
 
If your code violates aliasing assumptions, there is a -Wstrict-aliasing 
switch (or similar) that can warn about some, though not all, invalid 
constructs in codes. 
 
W. 
Comment 19 Adrian 2004-06-06 19:25:42 UTC
Heh, thanks, that makes sense.  I've never actually run into any issue like that
before, where problems are undiagnosable by the compiler :).

So it working correctly using a p = (type*)((char*)p + stride) method is purely
coincidental?  Does it violate the same aliasing rules?  I assume at this point
that you are talking about aliasing as in relation to memory boundaries (in
which case the other would still break the code).  Can you point me to spot in
the spec that discusses aliasing rules?

Thanks for the help,
I hope that this bug can help diagnose problems like this in the future.

Adruab
Comment 20 Andrew Pinski 2005-06-05 08:46:51 UTC
Reopening to ...
Comment 21 Andrew Pinski 2005-06-05 08:47:12 UTC
Mark as a dup of bug 21920.

*** This bug has been marked as a duplicate of 21920 ***