
Re: [openrisc] Re: OR1000 16 bit instruction set



> I have worked on several CPUs that support "duality" of opcode
> lengths (i.e. supporting both 16bit and 32bit ISAs), both ARM and
> MIPs.  In theory 16bit instructions are great, in practice they
> usually turn out to be a big fat pain in the rear side.  To me it's
> another "me too" feature that all the embedded CPUs had to support to
> get customer attention.
Right. I've never worked with Thumb code, but IMHO it is not worth
using unless you don't have a cache.

> However, the issues in the tools with 16bit instructions have never
> been properly addressed.  The idea I had while considering this some
> time ago is adding a "multi op" instruction that has a 4 bit
> pre-amble and can do 3 register to register arithmetic operations.
> It couldn't access all the registers clearly, but the way it would
> work is you would use the "normal" instructions to load the
> arithmetic registers (say 0-3), and then use the "multop" instruction
> to do the heavy lifting of those calculations.  This gets rid of all
> the issues involved with having multiple ISAs and gives you most of
> the advantages of 16bit instructions in terms of code size based on
> my calculations.
If you are considering a new architecture, you should also consider
the stack model and software conventions. When I studied different
models, it turned out that software is really dependent on classical
RISC/CISC architectures. If you want to make your processor
useful, you must make it easy for the software guys.

I did not quite understand your ISA, but my impression is that you
are trying to make a nearly 0-operand CPU. You should also consider
how your loops would be optimized.

The 'theoretical' code minimum is about 20% of OR32 code size, but this
would involve a real compressor (e.g. gzip) with long latencies. If you want
a solution like ARM Thumb, you end up at about 50% of OR32 code size.
It is really hard to go below this without significantly reducing
performance (e.g. a stack computer reaches 30% of OR32 code size).

BTW, in case anybody is interested, here are some statistics gathered
recently on uClinux code (for or32):

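The table below shows how long immediates need to be. The left column is
the number of bits, the middle column is the cumulative coverage in % for
that number of bits, and the right column is the number of instructions
whose immediate needs exactly that many bits. In each pair, the value
before the '/' is for arithmetic instructions and the value after it is
for jump instructions.
(All immediates are treated as signed.)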

bits    coverage (cum.)     instructions
        arith / jump        arith / jump
 0:       0% /  14%         0 / 11443
 1:       0% /  15%         7 /  1428
 2:       1% /  17%       657 /  1764
 3:       3% /  19%       838 /  1367
 4:       7% /  22%      1760 /  2568
 5:      10% /  26%      1475 /  3594
 6:      13% /  31%      1153 /  3787
 7:      15% /  34%       893 /  2777
 8:      16% /  36%       682 /  2036
 9:      17% /  37%       496 /   861
10:      19% /  38%       579 /   855
11:      22% /  39%      1329 /   380
12:      24% /  39%       959 /   136
13:      26% /  39%       817 /   334
14:      28% /  40%      1034 /    86
15:      31% /  40%      1212 /   485
16:      34% /  41%      1423 /   930
17:      37% /  42%      1633 /  1016
18:      71% /  66%     15499 / 20144
19:      72% /  67%       320 /   334
20:      72% /  67%        53 /   219
21:      75% /  67%      1362 /   407
22:      75% /  68%        71 /   157
23:      75% /  68%        61 /    29
24:      77% /  68%       728 /   136
25:      81% /  74%      1850 /  5156
26:      81% /  74%        49 /    66
27:      82% /  75%       188 /   783
28:      89% /  92%      3449 / 14466
29:      89% /  93%        76 /   376
30:      97% /  93%      3611 /   526
31:      98% /  95%       262 /  1387
32:     100% / 100%       996 /  4381
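
For anyone who wants to redo the coverage column, here is a minimal
Python sketch. It is not the script used to produce the table. The
function names are made up for illustration, and the rule that an
immediate of 0 needs 0 bits is an assumption about how the "0:" row
was counted. The counts are the arithmetic column copied from the
table above.

```python
from itertools import accumulate

# Arithmetic-instruction counts per bit width (0..32), copied from the table.
ARITH_COUNTS = [
    0, 7, 657, 838, 1760, 1475, 1153, 893, 682, 496, 579,
    1329, 959, 817, 1034, 1212, 1423, 1633, 15499, 320, 53,
    1362, 71, 61, 728, 1850, 49, 188, 3449, 76, 3611, 262, 996,
]


def signed_width(value):
    """Smallest number of bits that holds `value` as a signed immediate.

    Assumption: an immediate of 0 is counted as needing 0 bits.
    """
    if value == 0:
        return 0
    # For negative values, ~value is the non-negative number with the
    # same count of significant bits (e.g. ~(-128) == 127).
    magnitude = value if value > 0 else ~value
    return magnitude.bit_length() + 1  # +1 for the sign bit


def cumulative_coverage(counts):
    """Rounded cumulative percentage covered at each bit width."""
    total = sum(counts)
    return [round(100 * running / total) for running in accumulate(counts)]


if __name__ == "__main__":
    for bits, pct in enumerate(cumulative_coverage(ARITH_COUNTS)):
        print(f"{bits:2d}: {pct:3d}%")
```

With these counts the 16-bit entry comes out at 34% and the 18-bit
entry at 71%, matching the arithmetic coverage column in the table.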


best regards,
Marko

