This is the mail archive of the gcc at gcc dot gnu dot org
mailing list for the GCC project.
Re: Split Stack performance, signals
- From: Anders Oleson <anders at openpuma dot org>
- To: Ian Lance Taylor <iant at google dot com>, keithr at alum dot mit dot edu, gcc at gcc dot gnu dot org
- Date: Wed, 16 Sep 2015 01:18:14 -0700
- Subject: Re: Split Stack performance, signals
- References: <CAKTo2EkTGrb9_ucRVXOVB3FBiVBcqW6qOkHKuxUhZfdFPrjQBA at mail dot gmail dot com> <CAKOQZ8yKXP-75PkEgxqrDMjOqXxUVfe-MRG3o3byUR4AAPSt2g at mail dot gmail dot com>
>> prolog overhead, no call to __morestack : < 1 clock
>> stock call to __morestack (hot): > 4000 clocks
>> without signal blocking: < 60 clocks
>> potential best case: < 6 clocks
> This sounds great.
The data structure I was experimenting with ended up being not very
different from struct stack_segment, so I am adapting my standalone
test to morestack.S in libgcc. It may not quite reach 6 clock cycles
within the existing framework, but it should be pretty close. Either
way, it will be useful enough as a larger-scale test to be worth a
little effort.
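For reference, a simplified C sketch of that kind of segment list.
The struct and the helpers seg_grow/seg_shrink are hypothetical
illustrations, not libgcc's actual struct stack_segment or __morestack
code, and malloc stands in for whatever allocator the real code uses.
The point it shows: once a segment is allocated it stays linked, so a
"hot" call at the same depth only has to follow a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical segment record in the spirit of libgcc's
   struct stack_segment (the real one has more fields). */
struct seg {
    struct seg *prev;   /* older segment we switch back to on return */
    struct seg *next;   /* cached newer segment, kept for reuse */
    size_t size;        /* usable bytes in this segment */
    char *base;         /* lowest address of the usable area */
};

#define MIN_SEG 4096u   /* assumed minimum segment size */

/* Return a segment of at least `need` bytes to follow `cur` (NULL for
   the first one).  Hot path: reuse the cached next segment.  Cold
   path: allocate a new one and splice it in ahead of the cache. */
static struct seg *seg_grow(struct seg *cur, size_t need)
{
    assert(need > 0);
    struct seg *cached = cur ? cur->next : NULL;
    if (cached && cached->size >= need)
        return cached;                  /* hot path: just follow a pointer */

    size_t size = need < MIN_SEG ? MIN_SEG : need;
    struct seg *fresh = malloc(sizeof *fresh + size);  /* stand-in allocator */
    if (!fresh)
        return NULL;
    fresh->prev = cur;
    fresh->next = cached;               /* keep the too-small one cached */
    fresh->size = size;
    fresh->base = (char *)(fresh + 1);  /* usable area follows the header */
    if (cached)
        cached->prev = fresh;
    if (cur)
        cur->next = fresh;
    return fresh;
}

/* Leaving a segment: go back to the previous one, but leave this
   segment linked as prev->next so the next call can reuse it. */
static struct seg *seg_shrink(struct seg *cur)
{
    return cur->prev;
}
```

Calling seg_grow, seg_shrink, then seg_grow again at the same depth
returns the same segment without allocating, which is the case the
"hot" measurements above are about.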
I also played with using the modulo page-size lower boundary (option
#5) instead. It would have solved one of the problems with atomic
updates, but not all of them, and it would require very finicky
book-keeping. FWIW, it made the prolog slightly slower, but around
50% shorter. So using fs:0x70 still appears to give the best
performance and a good balance overall.
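To make the trade-off concrete, a C model of the two kinds of limit
check. Variant B is only one possible reading of option #5: segments
aligned to a hypothetical SEG_ALIGN, with the lower boundary recovered
by masking the stack pointer. The names, constants and the `reserve`
parameter are illustrative assumptions, not existing libgcc code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical segment alignment for variant B: every segment would
   have to start on a multiple of SEG_ALIGN. */
#define SEG_ALIGN ((uintptr_t)1 << 16)

/* Variant A: the limit lives in per-thread storage (on x86-64 the
   prolog reads it from %fs:0x70); here it is simply a parameter.
   Every segment switch has to update it. */
static int fits_stored(uintptr_t sp, uintptr_t frame, uintptr_t limit)
{
    assert(sp >= frame);            /* model ignores address wraparound */
    return sp - frame >= limit;
}

/* Variant B: recover the segment's lower boundary from sp itself by
   rounding down to SEG_ALIGN, so there is no per-thread limit to keep
   in sync; `reserve` is space kept free at the bottom.  The initial sp
   must sit below the segment's top boundary, or the mask would pick
   the next segment: part of the book-keeping this variant needs. */
static int fits_masked(uintptr_t sp, uintptr_t frame, uintptr_t reserve)
{
    assert(sp >= frame);            /* model ignores address wraparound */
    uintptr_t base = sp & ~(SEG_ALIGN - 1);  /* round down to segment start */
    return sp - frame >= base + reserve;
}
```

Variant B trades the load from thread-local storage for a mask and an
alignment requirement on every segment.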
How difficult is it to modify the prologs that get generated? I think
I found the code that does that in i386.c and i386.md, but it is
pretty cryptic to me. Any pointers? I know exactly what I want the
assembler to look like. If I can change it, I can reduce the overhead
from 36 bytes to 27 for best performance, or to 21 for best size.
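Roughly, the decision the generated x86-64 prolog makes can be modeled
in C as below. This is a sketch, not GCC's code: SLACK is an assumed
stand-in for the slack the back end leaves below the limit (i386.c has
a constant along these lines, SPLIT_STACK_AVAILABLE, if memory serves),
and the comments only indicate the rough shape of each branch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed slack kept free below the stored limit, so that small
   frames can skip computing sp - frame.  The value is an assumption. */
#define SLACK 256u

/* Hypothetical C model of the split-stack prolog's check: returns 1 if
   the function must call __morestack before running its body. */
static int prolog_needs_morestack(uintptr_t sp, uintptr_t frame,
                                  uintptr_t limit)
{
    if (frame < SLACK)
        return sp < limit;          /* short form: compare sp directly */
    assert(sp >= frame);            /* model ignores address wraparound */
    return sp - frame < limit;      /* long form: compute sp - frame first */
}
```

The two branches correspond to two different instruction sequences,
which is one reason the prolog size depends on the frame size.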
I have not yet played with Go. Keith mentioned having seen issues with
performance variations; is there a representative Go project that I
could build with gccgo as a good full-scale test/benchmark? I tried
compiling GCC itself with the _stock_ -fsplit-stack by adding it to
BOOT_CFLAGS. It did not go well: one of the code generator programs
bombed, but I didn't expect it to work easily. Maybe a slightly less
*full*-scale test than that ;)
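As a smaller-scale test in the meantime, a recursion with large frames
exercises __morestack when the recursion crosses segment boundaries
and the switch back on each return. The function deep() is a
hypothetical test, not an existing GCC testcase; build it with and
without -fsplit-stack and time a call such as deep(500) from main.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stress test: each call keeps a 2 KiB local buffer live
   across the recursive call, so a -fsplit-stack build has to switch
   segments repeatedly.  Returns the sum of (k & 0xff) for k = 1..n. */
static unsigned deep(unsigned n)
{
    volatile unsigned char buf[2048];   /* volatile keeps the frame large */
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)n;
    if (n == 0)
        return buf[0];
    unsigned below = deep(n - 1);
    /* The caller's frame must survive the callee's stack switch. */
    assert(buf[0] == (unsigned char)n);
    return buf[sizeof buf - 1] + below;
}
```

Without -fsplit-stack, deep(500) needs roughly 1 MiB of ordinary stack,
so it should also run under a default stack limit for comparison.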