This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
In 2004, I worked on optimized software floating point for the SH4:
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg03062.html

Rakesh Kumar posted an SH assembly software floating point implementation that also supported the SH2 and was closer to completion, but with lower performance:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00624.html

After a longer hiatus, I've now combined these code bases.

For SH1/SH2 support, I've used my own code for comparisons and single / double precision conversions, and Rakesh Kumar's and Aanchal Khanna's code as a basis for the arithmetic and integer / floating point conversions. After checking with the Copyright Clerk that a suitable assignment was on file, I've added Copyright headers, changed generated denormals to be consistent with what the comparison code expects, and added support for SH1 (which doesn't have delayed branches nor 32*32 bit multiply).

For SH3 and SH4, I've used my own code, some from 2004, and some which I've written now. The code I wrote in 2004 is scheduled primarily for the SH4-200, with some consideration for earlier processors where it didn't hurt the SH4. The newer code is scheduled primarily for the ST40-300, with some concessions made for the SH4-200 (e.g. using extra pc-relative constants to reduce EX group pressure).

Speed is generally favoured over size, particularly for normalized number handling, but to some extent also for denormalized numbers. That is, there are very few loops, no inter-module calls (which couldn't be guaranteed to be in bsr range), and I've added alignment instructions to help scheduling. The downside is that the code is somewhat larger than it would be otherwise, and it is extremely hard to get path coverage, or even code coverage, for all the code when testing it.
divsf3 uses the div1 instruction for the fraction computation; extracting quotient bytes when they are ready while feeding in new dividend bytes obviates the need for an extra shift register, and the other pipeline is kept busy working on the exponent and flagging cases where the input is not finite or the output is not normalized - these flags are then checked simultaneously at the end using cmp/str.

divdf3, on the other hand, uses a numerical algorithm. Each step relies on the previous steps to contain the error in a certain interval, not only to keep the output error interval in check, but also so that the topmost bits of the calculated defect are known to be only sign extensions. The implementation that is actually used in divdf3.S can have two different run times for normalized numbers, depending on whether the result from the penultimate step is found to be sufficient to calculate a correctly rounded result; this helps keep the average computing time down, to about 66 cycles for the ST40-300. If you are more interested in keeping worst-case times down, you might consider finishing divdf-rt.S; this should be able to operate on normalized numbers in something like 71 or 72 cycles.

One known issue triggered by the sh.c / sh.md code that generates the calls to the library is PR rtl-optimization/28618; I have no proper solution for this yet, and the patch I posted earlier caused other regressions. A possible workaround is to compile with -fno-schedule-insns. (N.B., running sched2 should be fine; the problem is the scheduling pass before register allocation, which overcommits r0.)

I've also attached a small patch to fix a problem with the m2e / m3e / m4-nofpu multilib variants of the libgcc2 DImode <-> DFmode conversion functions.
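For readers unfamiliar with step-wise division, the fraction computation described for divsf3 above can be illustrated in C. This is only a sketch and is not taken from the patch: the function name frac_div_bitwise and its interface are hypothetical, it uses a plain restoring shift-and-subtract loop rather than the SH div1 instruction, and it does not model the shared dividend / quotient register or the pipeline scheduling. It assumes both inputs are normalized 24-bit single precision significands with the hidden bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.
   Divides two normalized 24-bit significands (hidden bit set, so each
   lies in [2^23, 2^24)) and produces one quotient bit per step.
   Returns 26 bits: 24 significand bits, a guard bit, and a sticky bit.
   *exp_adjust receives the correction to apply to the result exponent. */
uint32_t frac_div_bitwise(uint32_t num, uint32_t den, int *exp_adjust)
{
    assert(num >= (1u << 23) && num < (1u << 24));
    assert(den >= (1u << 23) && den < (1u << 24));

    /* The ratio of two values in [1,2) lies in (0.5,2).  Pre-shift the
       numerator so that the quotient always starts with a 1 bit. */
    *exp_adjust = 0;
    if (num < den) {
        num <<= 1;
        *exp_adjust = -1;
    }

    uint32_t rem = num;   /* invariant: rem < 2 * den at each step */
    uint32_t quot = 0;

    /* 25 steps: the integer bit plus 24 fraction bits (23 + guard). */
    for (int i = 0; i < 25; i++) {
        quot <<= 1;
        if (rem >= den) {     /* trial subtraction succeeds */
            rem -= den;
            quot |= 1;
        }
        rem <<= 1;            /* bring the next bit position into play */
    }

    /* Any remainder left over means the exact quotient has more bits;
       record that in a sticky bit for later rounding. */
    return (quot << 1) | (rem != 0);
}
```

For example, dividing the significand of 1.5 (0xC00000) by that of 1.0 (0x800000) yields 0x3000000 with no exponent adjustment, while dividing 1.0 by 1.5 sets the exponent adjustment to -1 and leaves the sticky bit set because the quotient is inexact.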
Attachment:
softfp-20060902-1927.gz
Description: GNU Zip compressed data
Attachment:
sh-2e-doublefix
Description: Binary data