This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][libgcc-math] Vectorized intrinsics for x86_64
- From: Richard Guenther <rguenther at suse dot de>
- To: Richard Henderson <rth at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Tue, 4 Apr 2006 17:40:07 +0200 (CEST)
- Subject: Re: [PATCH][libgcc-math] Vectorized intrinsics for x86_64
- References: <Pine.LNX.4.64.0603281022000.3982@t148.fhfr.qr> <20060331155757.GB15017@redhat.com>
On Fri, 31 Mar 2006, Richard Henderson wrote:
> On Tue, Mar 28, 2006 at 10:23:47AM +0200, Richard Guenther wrote:
> > The intrinsics implementation was contributed by AMD to be licensed
> > as GPL + libgcc exception.
>
> It's a shame they were written in hand-coded assembly; otherwise
> we could use them for 32-bit as well. I don't have any trouble
> adding these routines, but I think the data sections need some work.
>
> First, none of the data put in .data is writable. At bare minimum
> this should be going into .rodata. But I also see that there is
> quite a bit of overlap between the various routines, so it would be
> Much Better if we could make use of the constant pooling features of
> the linker. So the tables should go into .rodata, but the individual
> double-precision values should go into
>
> .section .rodata.cst8, "M", @progbits, 8
> .align 8
Here's an updated patch with your suggestions applied (but with .align 16,
to have it cacheline aligned), like so:
.section .rodata.cst16, "M", @progbits, 16
.align 16
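For illustration only, here is a minimal sketch of how one pooled constant and one table might be laid out under this scheme. It is not taken from the attached patch: the labels and the values are hypothetical. The sketch also assumes the "a" (allocatable) flag next to "M" (mergeable), so that the section is loaded at run time.

```asm
	# Hypothetical example, not from the patch: labels and values are made up.

	# Each entry in a mergeable section must be exactly the entry size
	# given as the last argument (16 bytes here), so the linker can
	# fold identical entries from different object files into one.
	.section .rodata.cst16, "aM", @progbits, 16
	.align 16
.L__real_half:				# 0.5 in both double-precision lanes
	.quad 0x3FE0000000000000
	.quad 0x3FE0000000000000

	# Larger lookup tables are not fixed-size constants, so they go
	# into plain read-only data instead.
	.section .rodata
	.align 16
.L__example_table:
	.quad 0x0000000000000000	# 0.0
	.quad 0x3FF0000000000000	# 1.0
	.quad 0x4000000000000000	# 2.0
	.quad 0x4008000000000000	# 3.0
```

With this layout, two routines that both define the same 16-byte constant should end up sharing a single copy after linking, while the tables stay private to each object file.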
Bootstrapped on x86_64-unknown-linux-gnu.
Ok for mainline? (I'll wait until someone looks at / approves Zdenek's
patch to utilize these functions)
Thanks,
Richard.
(patch attached due to size)
2006-04-04 Richard Guenther <rguenther@suse.de>
* configure.ac: Handle x86_64 subdir.
* configure: Regenerate.
* Makefile.in: Regenerate.
* x86_64/Makefile.am: New file.
* x86_64/Makefile.in: Regenerate.
* x86_64/libm_util_amd.h: New file.
* x86_64/remainder_piby2d2f.c: Likewise.
* x86_64/remainder_piby2.c: Likewise.
* x86_64/vrd2log.s: Likewise.
* x86_64/vrd2log10.s: Likewise.
* x86_64/vrd4log.s: Likewise.
* x86_64/vrs4powxf.s: Likewise.
* x86_64/vrd4log10.s: Likewise.
* x86_64/vrd2cos.s: Likewise.
* x86_64/vrs4sincosf.s: Likewise.
* x86_64/vrd4cos.s: Likewise.
* x86_64/vrd2sin.s: Likewise.
* x86_64/vrd4sin.s: Likewise.
* x86_64/vrs4logf.s: Likewise.
* x86_64/vrs8logf.s: Likewise.
* x86_64/vrs4sinf.s: Likewise.
* x86_64/vrs4expf.s: Likewise.
* x86_64/vrs8expf.s: Likewise.
* x86_64/vrs4log2f.s: Likewise.
* x86_64/vrd2exp.s: Likewise.
* x86_64/vrs4powf.s: Likewise.
* x86_64/vrs8log2f.s: Likewise.
* x86_64/vrd2sincos.s: Likewise.
* x86_64/vrd4exp.s: Likewise.
* x86_64/vrd2log2.s: Likewise.
* x86_64/vrd4log2.s: Likewise.
* x86_64/vrs4log10f.s: Likewise.
* x86_64/vrs4cosf.s: Likewise.
* x86_64/vrs8log10f.s: Likewise.
* x86_64/mv.map: New version map.