There are cases where inline doesn't cause inlining, but __attribute__((always_inline)) does, and it improves the performance of the code. Having to resort to __attribute__((always_inline)) is rather ugly, especially in code that also has to compile with other compilers.

This is filed as suggested in bug 11680 comment 4. A testcase is attachment 4487 (from bug 11680).

Steps to reproduce:
1. Compile attachment 4487 using -O2.
2. time "<executable> original" and "<executable> slow".
3. Modify the attachment to comment out the __attribute__((always_inline)) in the class ConvertUTF16toUTF8_slow. (If you want, also add |inline|, although C++ says it's implied for methods defined inside class definitions, and it doesn't make a difference.)
4. Recompile.

Expected results:
2. The two take the same amount of time.
4. There is no change in the size of the binary.

Actual results:
2. "slow" is faster than "original".
4. The binary is bigger without the __attribute__((always_inline)).
(I'm seeing this both with gcc 3.3.1 and with the trunk as of 20030908.)
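For readers hitting the portability concern raised in the report, here is a minimal sketch of one common way to hide the attribute behind a macro so the same source still builds with other compilers. The macro name FORCE_INLINE, the class ByteCounter, and the function utf8_length are hypothetical illustrations and are not taken from attachment 4487. The length computation is deliberately simplified and does not handle surrogate pairs.

```cpp
#include <cstddef>
#include <cstdint>

// FORCE_INLINE is a hypothetical name chosen for this sketch.
// GCC: request inlining regardless of the inliner's heuristics.
// MSVC: __forceinline is the closest equivalent.
// Anything else: fall back to a plain inline hint.
#if defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline
#endif

// Hypothetical stand-in for a small helper whose method is called
// from a hot path; not the class from the attached testcase.
class ByteCounter {
public:
    ByteCounter() : mSize(0) {}

    // Adds the number of UTF-8 bytes needed for each UTF-16 code unit.
    // Simplification: surrogate pairs are not combined.
    FORCE_INLINE void write(const std::uint16_t* start, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            std::uint16_t c = start[i];
            if (c < 0x80)
                mSize += 1;
            else if (c < 0x800)
                mSize += 2;
            else
                mSize += 3;
        }
    }

    std::size_t size() const { return mSize; }

private:
    std::size_t mSize;
};

// Caller that benefits if write() is inlined into it.
std::size_t utf8_length(const std::uint16_t* s, std::size_t n) {
    ByteCounter counter;
    counter.write(s, n);
    return counter.size();
}
```

This only works around the symptom. The point of the report stands: ideally plain inline (or the implicit inline of an in-class definition) would be enough at -O2.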
Please post results for -O2 -fno-unit-at-a-time and -O2 -funit-at-a-time. I am not sure what the current default is for that switch, but this is the sort of thing that unit-at-a-time mode was intended to address.
I thought I saw this earlier today using the trunk, but I must have been using 3.3. I don't see the bug using the trunk anymore, which is consistent with bug 11680 comment 2.