This is the mail archive of the mailing list for the GCC project.
Re: SPARC code inefficiency
- From: "David S. Miller" <davem at redhat dot com>
- To: dann at godzilla dot ICS dot UCI dot EDU
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 21 May 2002 23:32:34 -0700 (PDT)
- Subject: Re: SPARC code inefficiency
- References: <firstname.lastname@example.org>
> From: Dan Nicolaescu <dann@godzilla.ICS.UCI.EDU>
> Date: Tue, 21 May 2002 23:13:54 -0700
> Look at all the uses for the %o2 register:
> 32 lines matching "o2" in buffer md5.s.
> 11: add %fp, -80, %o2
> 21: mov %o2, %i5
> 40: ld [%o2], %i0
> 53: add %o2, 4, %o2
> All the "add" instructions above can be eliminated by using reg + offset addressing.
> Adding some peephole2s could solve this... Is there a better way?
> The source code and assembly are attached.
There is nothing SPARC-specific about this missed optimization.
Peepholes won't help at all, because they cannot transform things
globally, which is what needs to happen here.
I don't know whether any of the generic optimization passes are already
supposed to handle this, but a pass of that kind is what is needed to
make the transformation you are looking for.
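To make the two address shapes concrete, here is a minimal C sketch. The functions `sum_advancing` and `sum_offsets` are hypothetical and are not taken from the attached md5 source; they only mirror the pattern in the quoted md5.s excerpt. The real transformation would happen inside the compiler on its intermediate representation, not in the source, and a given compiler may well emit identical code for both functions.

```c
#include <stdint.h>

/* Hypothetical example, not from the attached md5 source.
   Shape of the emitted code: one pointer is advanced after every
   load, so each load is paired with an extra "add %o2, 4, %o2". */
uint32_t sum_advancing(const uint32_t *block)
{
    const uint32_t *p = block;  /* walking pointer, roughly the role of %o2 */
    uint32_t acc = 0;

    acc += *p; p++;             /* ld [%o2], ...  then  add %o2, 4, %o2 */
    acc += *p; p++;
    acc += *p; p++;
    acc += *p;                  /* last load: no further advance needed */
    return acc;
}

/* Shape that reg + offset addressing allows: the base register never
   changes and each load carries a constant displacement. */
uint32_t sum_offsets(const uint32_t *block)
{
    uint32_t acc = 0;

    acc += block[0];            /* ld [base+0]  */
    acc += block[1];            /* ld [base+4]  */
    acc += block[2];            /* ld [base+8]  */
    acc += block[3];            /* ld [base+12] */
    return acc;
}
```

Both functions return the same value for any four-word block (for {1, 2, 3, 4}, both return 10). Turning the first shape into the second means rewriting every later use of the pointer and proving that its advanced value is not needed afterwards, which is information a peephole window over a few adjacent instructions does not have.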
And BTW, with -mtune=ultrasparc the schedule is much better.
It may not make any difference to the %o2-advancing problem,
but it makes a HUGE difference when this code is executed on
an UltraSPARC processor.