August 25, 1998
We are pleased to announce that David Miller has donated a rewrite of the SPARC backend for GCC. This rewrite should improve performance as well as improve long term maintainability of the compiler. Details follow.
1) Improved instruction and delayed branch scheduling for operations
on "long long" and double float quantities on 32-bit SPARC targets.
Simple example:
extern void ll_test1 (long long);
void example1 (void)
{
ll_test1 (0xdeadbeef12345678);
}
Here is output from the old compiler:
example1:
...
sethi %hi(-559038737),%o0
or %o0,%lo(-559038737),%o0
sethi %hi(305419896),%o1
or %o1,%lo(305419896),%o1
call ll_test1,0
nop ! Delay slot not filled
And here is what the new compiler generates:
example1:
...
sethi %hi(-559039488), %o0
sethi %hi(305419264), %o1
or %o0, 751, %o0
call ll_test1, 0
or %o1, 632, %o1 ! Delay slot is filled
2) Address and constant formation vastly improved on 64-bit SPARC
targets. These are the two main areas where the previous 64-bit
SPARC support was totally lacking. Here are some amusing examples
of the improved constant formation support on sparc64:
unsigned long t1(void) { return 0xffc0000000000000; }
old_t1:
sethi %hi(-4194304),%o0
sllx %o0,32,%o0
retl
add %o0,%lo(0),%o0 ! Spurious instruction
new_t1:
mov -1, %o0
retl
sllx %o0, 54, %o0
unsigned long t2(void) { return 0x000001ffffffffff; }
/* Hard coded temporary reg (%g1) which prevents CSE,
plus a double instruction sequence which cannot be
scheduled into a delay slot. */
old_t2:
mov 511,%o0
sllx %o0,32,%o0
sethi %hi(-1),%g1; or %o0,%g1,%o0
retl
add %o0,%lo(-1),%o0
/* No temporary at all, all instructions schedulable
and full CSE can happen for all intermediate values. */
new_t2:
mov -1, %o0
retl
srlx %o0, 23, %o0
unsigned long t3(void) { return 0x2300000000000000; }
old_t3:
sethi %hi(587202560),%o0
sllx %o0,32,%o0
retl
add %o0,%lo(0),%o0 ! Again, spurious instruction
new_t3:
mov 35, %o0
retl
sllx %o0, 56, %o0
unsigned long t5(void) { return 0xffffffffdeadbeef; }
/* Again, hard coded temporary, unschedulable instruction
sequence, 4 instructions total. */
old_t5:
mov -1,%o0
sllx %o0,32,%o0
sethi %hi(-559038737),%g1; or %o0,%g1,%o0
retl
add %o0,%lo(-559038737),%o0
/* No temporaries, fully CSE'able, all insns schedulable,
2 instructions total. */
new_t5:
sethi %hi(559038464), %o0
retl
xor %o0, -273, %o0
unsigned long t7(void) { return 0xffcfffffffffffff; }
/* Hard coded temporary, 5 instructions. */
old_t7:
sethi %hi(-3145729),%o0
or %o0,%lo(-3145729),%o0
sllx %o0,32,%o0
sethi %hi(-1),%g1; or %o0,%g1,%o0
retl
add %o0,%lo(-1),%o0
/* No hard coded temporaries, 3 instructions. */
new_t7:
mov 3, %o0
sllx %o0, 52, %o0
retl
xnor %g0, %o0, %o0
3) Full support for nearly all features of the new 64-bit SPARC ELF V9
ABI. This includes support for all meaningful code models,
including MediumLow, MediumMiddle, MediunAny (both old and new for
backwards compatibility with older GCC versions), and 32bit.
4) Tremendously improved support for instruction level parallelism on
UltraSPARC. Using some new pieces of infrastructure added to
the Haifa scheduler recently, the SPARC backend can now achieve
higher levels of instruction issue per cycle on UltraSPARC than
has ever been possible before.
Common problems previously were in cases where interlocking and
slotting rules of the UltraSPARC could not be accurately described
to the compiler. Here are a few examples:
a) On the UltraSPARC, two integer ALU operations can issue per
cycle. Only one can be a shift, and if the other non-shift
integer instruction is not one which sets the condition codes
then the shift must come first in order to get dual issue.
For example:
add %o0, 1, %o0
srlx %o1, 1, %o1
The old scheduler would think these instructions would issue
together on UltraSPARC, even though it does not. The new
UltraSPARC scheduling support will correctly move the shift
before the add so they do in fact issue together in a single
cycle.
b) In order to get full 4-issue per cycle on UltraSPARC, the
fourth instruction in the instruction group must be either
a conditional branch or a floating point instruction.
At best we previously could only tell the Haifa scheduler that
we had 2 FPU units, and 2 Integer units, yet not special
issuing rules such as this one. So consider a case where 2
integer and 2 FPU instructions could be issued this cycle,
Haifa often would not get full 4 issue such as:
add %o0, 1, %o0
faddd %f0, %f2, %f0
fmuld %f4, %f6, %f4
srlx %o1, 1, %o1
Here Haifa has made two mistakes due to lack of information.
Firstly it missed the "shift/ialu" ordering rule mentioned
above, secondly it placed the integer instruction in the
fourth slot. Haifa would get 3 issue in this case. The new
UltraSPARC scheduling support will instead output:
srlx %o1, 1, %o1
add %o0, 1, %o0
faddd %f0, %f2, %f0
fmuld %f4, %f6, %f4
Which obtains a full 4 issue cycle on UltraSPARC.
5) More efficient switch jump table scheme for PIC code on sparc32.
Previously a single RTL pattern which generated multiple
instructions was used by the SPARC port to load label addresses
when generating position-independent code. Now, jump tables are
output after the function and the label address loading sequence
is described fully in RTL for each sub-operation involved. This
requires not only less instructions, but now all such instructions
are subject to possible instruction and branch delay scheduling.
This work was done by Richard Henderson and
similar techniques will be incorporated into the sparc64 support
code as well.
6) Nearly all operations ever generated by the SPARC backend are
described with a single RTL pattern which generates a single
SPARC instruction. This is important for delay slot and
instruction level scheduling.
7) There are no more hard coded hard registers used in the
RTL generated by the SPARC backend. The only exceptions to
this which remain are unavoidable cases such as for argument
passing semantics and for the PIC base register (the latter can
actually be eliminated, and this is a planned future enhancement).
Hard coded registers in RTL prevent many optimizations, in
particular it prevents many forms of common subexpression
elimination.
8) HIGH and LO_SUM sequences are no longer used to implement
constant formation, even on 32-bit targets. This prevented
the compiler from "seeing" many things about what the instructions
performing the constant formation actually were doing at each
step. Here is an example:
x = 0x12345678;
The SPARC has one instruction which loads the high 22 bits
of a 32-bit constant (without sign extension when considering
64-bit registers) into a register, the rest can be OR'd in
with another instruction. The old compiler would output:
sethi %hi(0x12345678), %o0
or %o0, %lo(0x12345678), %o0
The compiler can't say much about what each instruction does
while optimizing. At best it can say that:
a) The first instruction sets "some unspecified number of
high bits of 0x12345678" into register %o0
b) The second instruction adds "some unspecified number
of low bits of 0x12345678" to register %o0
The new SPARC backend will output code which tells the optimizer
exactly what is going on in each instruction, and also generate
a temporary for the sake of common subexpression detection:
sethi %hi(0x12345400), %o1
or %o1, 0x278, %o0
Now the compiler can clearly see that:
a) After the first instruction, 0x12345400 will be available
in register %o1
b) The second instruction OR's 0x278 into %o1 and leaves the
result (0x12345678) in %o0.
With this knowledge some transformations previously not possible
can and will be performed. Here is a silly example just so you
get the idea:
extern void test1(int x);
int test2(void)
{
test1(0x12345678);
return 0x12345400;
}
The old compiler will output:
test2:
save %sp,-104,%sp
sethi %hi(305419896),%o0
call test1,0
or %o0,%lo(305419896),%o0
sethi %hi(305419264),%i0
ret
restore
And the new one will produce:
test2:
save %sp, -104, %sp
sethi %hi(305419264), %i0
call test1, 0
or %i0, 632, %o0
ret
restore
9) The assembler output is more pretty :-)
Please send FSF & GNU inquiries & questions to gnu@gnu.org. There are also other ways to contact the FSF.
These pages are maintained by the GCC team.
For questions related to the use of GCC, please consult these web pages and the GCC manuals. If that fails, the gcc-help@gcc.gnu.org mailing list might help.Copyright (C) Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110, USA.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.
| Last modified 2006-06-21 |
|