This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
Re: double argument casting
- From: laurent <laurent dot poche at gmail dot com>
- To: Ian Lance Taylor <iant at google dot com>
- Cc: Jorge PEREZ <jorge dot perez at invia dot fr>, gcc-help at gcc dot gnu dot org, jiri at gaisler dot com, konrad at gaisler dot com, ebotcazou at adacore dot com, lvcargnini at gmail dot com
- Date: Mon, 25 Oct 2010 18:09:02 +0200
- Subject: Re: double argument casting
- References: <4CC0491E.1040107@gmail.com> <mcrr5fhvki0.fsf@google.com> <4CC57F48.1020302@invia.fr>
Hello Ian,
If the casting is done only in the callee function (and not in the caller
function), the size of my program is reduced by 5%.
I think that all users compiling with the -Os option would be pleased to
have the casting done inside the callee function.
What is your feeling?
Laurent
On 25/10/2010 14:59, Jorge PEREZ wrote:
> Hello Laurent & Ian
>
> I'm actually very interested in the point you're making here.
>
> I checked the example in Laurent's first email and I agree that for
> embedded systems, where code size is critical, the cast on the caller's
> side (rather than the callee's) does not seem very helpful.
>
> The compilation of the test using LLVM (not sure it's a good enough
> reference but that's all I got) gives the following results
>
> ------------------------> C code:
>
> short somme(char a, short b){
>     int i;
>     for (i=0; i<b; i++){
>         a+=a;
>     }
>     return a+b;
> }
>
> short somme2(char a, short b){
>     int i;
>     for (i=0; i<b; i++){
>         a+=b;
>     }
>     return a+2*b;
> }
>
> int main(){
>     volatile short b=1;
>     volatile char a=1, c=1;
>     b=somme(a,b);
>     b=somme(c,b);
>     b=somme(a,c);
>     b=somme2(a,b);
>     b=somme2(c,b);
>     b=somme2(a,c);
>     b=somme2(a,c*2);
>     b=somme2(a,c*3);
>     b=somme2(a,c*4);
>     return 0;
> }
>
>
> ------------------------> GCC 3.4.4 partial disassembly:
>
> 40001168 <somme>:
> 40001168: 83 2a 60 10 sll %o1, 0x10, %g1
> 4000116c: 83 38 60 10 sra %g1, 0x10, %g1
> 40001170: 80 a0 60 00 cmp %g1, 0
> 40001174: 24 80 00 06 ble,a 4000118c <somme+0x24>
> 40001178: 91 2a 20 18 sll %o0, 0x18, %o0
> 4000117c: 82 80 7f ff addcc %g1, -1, %g1
> 40001180: 12 bf ff ff bne 4000117c <somme+0x14>
> 40001184: 90 02 00 08 add %o0, %o0, %o0
> 40001188: 91 2a 20 18 sll %o0, 0x18, %o0
> 4000118c: 91 3a 20 18 sra %o0, 0x18, %o0
> 40001190: 90 02 00 09 add %o0, %o1, %o0
> 40001194: 91 2a 20 10 sll %o0, 0x10, %o0
> 40001198: 81 c3 e0 08 retl
> 4000119c: 91 3a 20 10 sra %o0, 0x10, %o0
>
> 400011e4 <main>:
> 400011e4: 9d e3 bf 90 save %sp, -112, %sp
> 400011e8: 82 10 20 01 mov 1, %g1
> 400011ec: c2 37 bf f6 sth %g1, [ %fp + -10 ]
> 400011f0: c2 2f bf f5 stb %g1, [ %fp + -11 ]
> 400011f4: c2 2f bf f4 stb %g1, [ %fp + -12 ]
> 400011f8: d0 0f bf f5 ldub [ %fp + -11 ], %o0 ******************** 1
> 400011fc: d2 17 bf f6 lduh [ %fp + -10 ], %o1 ******************** 2
> 40001200: 91 2a 20 18 sll %o0, 0x18, %o0 ******************** 3
> 40001204: 93 2a 60 10 sll %o1, 0x10, %o1 ******************** 4
> 40001208: 93 3a 60 10 sra %o1, 0x10, %o1 ******************** 5
> 4000120c: 7f ff ff d7 call 40001168 <somme> ******************** 6
> 40001210: 91 3a 20 18 sra %o0, 0x18, %o0 ******************** 7
> 40001214: d0 37 bf f6 sth %o0, [ %fp + -10 ] ******************** 8
> 40001218: d0 0f bf f4 ldub [ %fp + -12 ], %o0
> 4000121c: d2 17 bf f6 lduh [ %fp + -10 ], %o1
> 40001220: 91 2a 20 18 sll %o0, 0x18, %o0
> 40001224: 93 2a 60 10 sll %o1, 0x10, %o1
> 40001228: 93 3a 60 10 sra %o1, 0x10, %o1
> 4000122c: 7f ff ff cf call 40001168 <somme>
> 40001230: 91 3a 20 18 sra %o0, 0x18, %o0
> 40001234: d0 37 bf f6 sth %o0, [ %fp + -10 ]
> 40001238: d0 0f bf f5 ldub [ %fp + -11 ], %o0
> 4000123c: d2 0f bf f4 ldub [ %fp + -12 ], %o1
> ...
>
> ------------------------> LLVM partial disassembly:
>
> Disassembly of section .text:
>
> 00000000 : --> This would be the "somme" function
> 0: 9d e3 bf a0 save %sp, -96, %sp
> 4: a0 a6 60 01 subcc %i1, 1, %l0
> 8: 06 80 00 10 bl 48
> c: 01 00 00 00 nop
> 10: 10 80 00 02 b 18
> 14: 01 00 00 00 nop
> 18: a0 10 00 19 mov %i1, %l0
> 1c: a2 10 00 18 mov %i0, %l1
> 20: a2 0c 60 ff and %l1, 0xff, %l1
> 24: a2 04 40 18 add %l1, %i0, %l1
> 28: a5 2c 60 18 sll %l1, 0x18, %l2
> 2c: b1 3c a0 18 sra %l2, 0x18, %i0
> 30: a0 04 3f ff add %l0, -1, %l0
> 34: a4 a4 20 00 subcc %l0, 0, %l2
> 38: 12 bf ff fa bne 20
> 3c: 01 00 00 00 nop
> 40: 10 80 00 02 b 48
> 44: 01 00 00 00 nop
> 48: a0 06 00 19 add %i0, %i1, %l0
> 4c: a1 2c 20 10 sll %l0, 0x10, %l0
> 50: b1 3c 20 10 sra %l0, 0x10, %i0
> 54: 81 e8 00 00 restore
> 58: 81 c3 e0 08 retl
> 5c: 01 00 00 00 nop
>
> 000000cc :
> cc: 9d e3 bf 98 save %sp, -104, %sp
> d0: a0 10 20 01 mov 1, %l0
> d4: e0 37 bf fe sth %l0, [ %fp + -2 ]
> d8: e0 2f bf fd stb %l0, [ %fp + -3 ]
> dc: e0 2f bf fc stb %l0, [ %fp + -4 ]
> e0: d0 4f bf fd ldsb [ %fp + -3 ], %o0 ******************** 1
> e4: d2 57 bf fe ldsh [ %fp + -2 ], %o1 ******************** 2
> e8: 40 00 00 00 call e8 ******************** 3
> ec: 01 00 00 00 nop ******************** 4
> f0: d0 37 bf fe sth %o0, [ %fp + -2 ]
> f4: d0 4f bf fc ldsb [ %fp + -4 ], %o0
> f8: d2 57 bf fe ldsh [ %fp + -2 ], %o1
> fc: 40 00 00 00 call fc
> 100: 01 00 00 00 nop
> 104: d0 37 bf fe sth %o0, [ %fp + -2 ]
> 108: d0 4f bf fd ldsb [ %fp + -3 ], %o0
> 10c: d2 4f bf fc ldsb [ %fp + -4 ], %o1
> 110: 40 00 00 00 call 110
> 114: 01 00 00 00 nop
> 118: d0 37 bf fe sth %o0, [ %fp + -2 ]
> ...
>
>
> A couple of observations from this:
>
> - in the case of LLVM only 4 instructions (the NOP is a waste since the
> delay slot is not correctly implemented) are required per call (since
> the cast is on the callee side).
> - in the case of GCC there are 8 instructions required due to the
> duplication of the cast in the caller and the callee.
>
> Based on this, it seems worthwhile to KEEP the cast only on the
> CALLEE's side rather than the caller's. Since there are 9 calls in
> main, this requires 9*8=72 instructions with GCC, whereas it only
> requires 9*4=36 instructions using LLVM. This is a huge difference when
> code size matters. I guess we can assume that on the callee's side the
> code size is similar in both cases since a cast is always performed.
>
> In conclusion, the SPARC code size could be reduced by approx. (4
> instructions) x (number of calls) if the cast is done exclusively on the
> CALLEE's side. So, is it really necessary to keep it on the caller's
> side or can we try to do it only on the callee's side?
>
>
> PS: since I work with LEON, I took the liberty of CC'ing the people
> involved in this thread: http://gcc.gnu.org/ml/gcc/2010-09/msg00014.html
> I hope it doesn't bother anyone.
>
> Have a good day,
>
>
> George
>
>
>
> Ian Lance Taylor wrote:
>
>> laurent <laurent.poche@gmail.com> writes:
>>
>>
>>
>>> When a caller function calls a callee function with short or char
>>> arguments, the arguments are cast twice: inside the caller function
>>> and inside the callee function, see the example. This wastes both
>>> code density and speed!
>>>
>>> I don't understand why there is a double casting. Is there any
>>> optimization I could activate in GCC to remove it?
>>>
>>>
>> It's basically a bug. gcc should only do it on the caller side. Doing
>> it on the callee side is a holdover from the good old pre-C90 days, when
>> code like
>>
>> int f(i)
>>     char i;
>> {
>>     ...
>> }
>>
>> had to be treated as equivalent to
>>
>> int f(int passed_i)
>> {
>>     char i = (char) passed_i;
>>     ...
>> }
>>
>> These days I think we can just drop the cast on the callee side. As I
>> recall that was done for x86 a while back, somebody just needs to do it
>> for SPARC.
>>
>> Please file a bug report according to the instructions at
>> http://gcc.gnu.org/bugs/ (unless there is already a bug report for
>> this). Thanks.
>>
>> Ian
>>
>>
>>
>