Re: [openrisc] Re: OR1000 16 bit instruction set
On Wednesday 20 February 2002 07:29, Andreas Bombe wrote:
> List ate my email - this is a resend.
>
> On Thu, Feb 14, 2002 at 08:43:44PM +0100, Damjan Lampret wrote:
> > Hi Jeff,
> >
> > it is all about demand. Right now it looks like 32-bit insn
> > length is most wanted, especially because of the coming 64-bit
> > superscalar version of the OR1K. It is also true that prices of
> > Flash and RAM are dropping some 30% or more each year. But if
> > there will be enough interest for 16-bit insn length, it could
> > become a priority.
>
> "Whatever, RAM is cheap these days" is the usual answer, but
> usually cost per MB is not the issue. It is the wrong answer on
> desktop systems (because RAM is slow) and usually wrong in low cost
> embedded systems.
I have worked on several CPUs that support "duality" of opcode
lengths (i.e. supporting both 16-bit and 32-bit ISAs), both ARM and
MIPS. In theory 16-bit instructions are great; in practice they
usually turn out to be a big fat pain in the rear. To me it's
another "me too" feature that all the embedded CPUs had to support to
get customer attention.
That being said, when the datapath is constricted it does have an
advantage. Keep in mind that "constricted" means more than just
bus-width differences; it also covers wait states on 32-bit-wide
buses. The greater advantage is actually instruction cache expansion.
Statistically, using 16-bit instructions reduces the size of the code
by about 30%. That results in a large increase in "effective"
instruction cache size.
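
To put a rough number on "effective" cache size, here is a small
sketch in Python. The function name and the assumption that cache
capacity scales linearly with code density are mine, purely for
illustration; real hit rates depend on the workload.

```python
def effective_cache_factor(code_size_reduction: float) -> float:
    """Return how many times more code fits in the same cache.

    Assumes (hypothetically) that a given fractional reduction in code
    size applies uniformly to whatever is resident in the cache.
    """
    if not 0.0 <= code_size_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - code_size_reduction)


# A 30% smaller code image means the cache holds 1 / 0.7 times as much.
factor = effective_cache_factor(0.30)
print(f"{factor:.2f}x")  # prints "1.43x"
```

So under that assumption a 30% code size reduction behaves roughly
like a cache that is about 43% larger.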
However, the issues in the tools with 16-bit instructions have never
been properly addressed. The idea I had while considering this some
time ago is adding a "multi-op" instruction that has a 4-bit
preamble and can do 3 register-to-register arithmetic operations.
Clearly it couldn't access all the registers, but the way it would
work is you would use the "normal" instructions to load the
arithmetic registers (say 0-3), and then use the "multi-op"
instruction to do the heavy lifting of those calculations. This gets
rid of all the issues involved with having multiple ISAs and, based
on my calculations, gives you most of the code-size advantages of
16-bit instructions.
Consider the difference between ARM and this idea:
    lui r0, #0x0123456   @ lui has 28 bits of expressiveness in
    lui r1, #0x9123456   @ Shane's Make-Believe ISA ;-)
    { ori r0 #0x7, ori r1 #0x7, add r0, r1}
/* The add op places the result in the first-mentioned register, so
the result would be r0=0x92468ACE, r1=0x91234567 */
(A full register-to-register add with a 32-bit insn size and 32-bit
register size; total code and data: 3 * 32 bits = 96 bits.)
In ARM this would be 3 instructions and 2 32-bit data words, so
160 bits, not to mention the cache penalty normally involved with ldr.
    ldr r0, =0x01234567
    ldr r1, =0x91234567
    add r0, r0, r1
    [0x01234567]
    [0x91234567]
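
For anyone who wants to check the arithmetic, here is a minimal
Python sketch of the make-believe sequence and the size comparison.
The semantics are assumptions on my part for illustration: lui puts
its 28-bit immediate in the upper 28 bits of the register and clears
the low 4 bits, ori ORs in a small immediate, and add wraps at 32 bits.

```python
MASK32 = 0xFFFFFFFF  # registers are assumed to be 32 bits wide


def lui(imm28: int) -> int:
    """Assumed: load a 28-bit immediate into bits 31..4, low 4 bits zero."""
    return (imm28 << 4) & MASK32


def ori(value: int, imm: int) -> int:
    """OR an immediate into a register value."""
    return (value | imm) & MASK32


def add(a: int, b: int) -> int:
    """32-bit wrapping add; the result goes to the first operand."""
    return (a + b) & MASK32


# Two normal instructions load the arithmetic registers ...
r0 = lui(0x0123456)
r1 = lui(0x9123456)
# ... and one multi-op instruction performs three operations.
r0 = ori(r0, 0x7)
r1 = ori(r1, 0x7)
r0 = add(r0, r1)
print(hex(r0), hex(r1))  # prints "0x92468ace 0x91234567"

# Size comparison: 3 instructions vs. 3 instructions + 2 literal words.
multi_op_bits = 3 * 32
arm_bits = 3 * 32 + 2 * 32
print(multi_op_bits, arm_bits)  # prints "96 160"
```

That works out to 96 bits against 160 bits for this one case, i.e.
40% smaller; a single example, not a measurement over real code.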
Thanks,
Shane Nay.
(I have the entire ISA written out for both a 32-bit and a 64-bit
version of a multi-op processor; one of these days I'll finish
learning Verilog :). Using a multi-op strategy you can actually end
up with a 64-bit instruction length processor that has more compact
code than its 32-bit brothers.)