This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
Re: Utilizing GCC Prefetch Analysis -- Instructions not being generated
- From: Malek Musleh <malek dot musleh at gmail dot com>
- To: gcc-help at gcc dot gnu dot org
- Date: Thu, 21 Aug 2014 02:17:08 -0400
- Subject: Re: Utilizing GCC Prefetch Analysis -- Instructions not being generated
- References: <CAPOxfJUF=F5-HysAYvPjS8SWits1SJ9oRbWm+DR-JH-ufEtrjQ at mail dot gmail dot com>
To clarify the second part: the objdump output I am showing is the
expansion of the __builtin_prefetch intrinsics. I used
objdump --source -d ./my_program. Hence, I was expecting
__builtin_prefetch to expand to ldl, lds, or ldq rather than lda.
Thanks,
Malek
On Thu, Aug 21, 2014 at 12:33 AM, Malek Musleh <malek.musleh@gmail.com> wrote:
> Hi,
>
> I am trying to determine the performance impact of gcc's internal
> software prefetching analysis. I have compiled my benchmarks with the
> following flags:
>
> CFLAGS=-O3 -ffast-math -funroll-loops -fprefetch-loop-arrays
>
> However, after compiling and examining the objdump output of the
> binary, I do not see any inserted prefetch instructions.
> Specifically, I am using an Alpha cross compiler (gcc version 4.2,
> so I know it has prefetching support), and the prefetch instructions
> that should be generated are lds, ldl, or ldq.
>
> http://www.eecg.toronto.edu/~moshovos/ACA05/read/Performance%20tips%20for%20Alpha%20Linux%20C%20programmers.htm
>
>
> My example program code snippet is:
>
> int main (int argc, char *argv[])
> {
>
>   for (i = 0; i < 10000; i++) {
>     for (j = 0; j < 10000; j++) {
>       a[i][j] = b[j][0] + b[j+1][0];
>     }
>   }
> }
>
> The loops are large and regular enough that the analysis pass should
> determine that prefetching is possible. Would anyone know why the
> instructions are not being generated, or whether objdump is simply
> not showing those prefetch instructions?
>
> As a separate note, I did try to use the gcc prefetch intrinsics, and
> examined the objdump:
>
> __builtin_prefetch (&a[i+j], 1, 1);
> 12000060c: 20 00 4f a0 .long 0xa04f0020
> 120000610: 1c 00 2f a0 .long 0xa02f001c
> 120000614: 01 00 41 40 .long 0x40410001
> 120000618: 01 00 e1 43 .long 0x43e10001
> 12000061c: 42 16 20 40 .long 0x40201642
> 120000620: 30 00 2f 20 lda t0,48(fp)
> 120000624: 01 04 22 40 .long 0x40220401
> 120000628: 00 00 e1 8b .long 0x8be10000
> __builtin_prefetch (&b[i+j], 0, 1);
> 12000062c: 20 00 4f a0 .long 0xa04f0020
> 120000630: 1c 00 2f a0 .long 0xa02f001c
> 120000634: 01 00 41 40 .long 0x40410001
> 120000638: 01 00 e1 43 .long 0x43e10001
> 12000063c: 42 16 20 40 .long 0x40201642
> 120000640: 70 1f 2f 20 lda t0,8048(fp)
> 120000644: 01 04 22 40 .long 0x40220401
> 120000648: 00 00 e1 a3 .long 0xa3e10000
>
> In this case, it seems that the compiler is generating a different set
> of instructions for the prefetch intrinsic, and not using what the
> Alpha manual says.
>
> Thanks,
>
> Malek
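
One way to check what the .long words in the listing above actually
encode is to decode them by hand. The sketch below is not from the
original post. It assumes the standard Alpha memory-instruction layout
(opcode in bits 31..26, Ra in bits 25..21, Rb in bits 20..16, signed
16-bit displacement in bits 15..0) and a small opcode table (lda 0x08,
lds 0x22, ldt 0x23, ldl 0x28, ldq 0x29). The helper names decode_mem,
load_mnemonic, looks_like_prefetch, and describe are made up for
illustration. Under those assumptions, a load whose destination is
register 31 ($31 or $f31) discards its result, which is the form Alpha
uses for prefetch hints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields of an Alpha memory-format instruction word (assumed layout). */
struct alpha_mem_insn {
    unsigned opcode;  /* bits 31..26 */
    unsigned ra;      /* bits 25..21 */
    unsigned rb;      /* bits 20..16 */
    int disp;         /* bits 15..0, sign-extended */
};

static struct alpha_mem_insn decode_mem(uint32_t word)
{
    struct alpha_mem_insn insn;
    insn.opcode = (word >> 26) & 0x3f;
    insn.ra = (word >> 21) & 0x1f;
    insn.rb = (word >> 16) & 0x1f;
    insn.disp = (int)(word & 0xffff);
    if (insn.disp & 0x8000)
        insn.disp -= 0x10000;  /* sign-extend the 16-bit displacement */
    return insn;
}

/* Mnemonic for the few opcodes of interest here; NULL for anything else. */
static const char *load_mnemonic(unsigned opcode)
{
    switch (opcode) {
    case 0x08: return "lda";
    case 0x22: return "lds";
    case 0x23: return "ldt";
    case 0x28: return "ldl";
    case 0x29: return "ldq";
    default:   return NULL;
    }
}

/* Nonzero if the word is a real memory load (not lda) into register 31. */
static int looks_like_prefetch(uint32_t word)
{
    struct alpha_mem_insn insn = decode_mem(word);
    const char *name = load_mnemonic(insn.opcode);
    return name != NULL && insn.opcode != 0x08 && insn.ra == 31;
}

/* Write a textual form of the word into out, e.g. "ldl $31,0($1)". */
static void describe(uint32_t word, char *out, size_t outlen)
{
    struct alpha_mem_insn insn = decode_mem(word);
    const char *name = load_mnemonic(insn.opcode);
    /* lds and ldt target the floating-point register file. */
    const char *ra_prefix =
        (insn.opcode == 0x22 || insn.opcode == 0x23) ? "$f" : "$";

    assert(out != NULL && outlen > 0);
    if (name == NULL) {
        /* Not in the small table above: leave it undecoded. */
        snprintf(out, outlen, ".long 0x%08x", (unsigned)word);
        return;
    }
    snprintf(out, outlen, "%s %s%u,%d($%u)",
             name, ra_prefix, insn.ra, insn.disp, insn.rb);
}
```

Under these assumptions, describe() renders 0x8be10000 (the last word
after the first intrinsic) as "lds $f31,0($1)" and 0xa3e10000 (the last
word after the second) as "ldl $31,0($1)", while 0x202f0030 comes out
as "lda $1,48($15)", matching the lda t0,48(fp) line that objdump did
decode. That would suggest the prefetch loads are present and objdump
is simply not disassembling them, with the lda only computing the array
address. This is worth confirming against the Alpha architecture
manual.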