[J-core] Jcore mailing list and turtle board

Fri Jul 7 07:06:04 EDT 2017

On Jul 7, 2017, at 3:28, BGB <cr88192 at gmail.com> wrote:

Hi BGB,

It seems like what you are working with is quite a redesign of the ISA.  I wouldn't do that; such approaches usually wind up as personal experiments.  Backward compatibility is the order of the day for an instruction set architecture upgrade.  Something completely different is going to run into adoption problems, and as such not be of much interest to anyone except as an exercise (unless it catches on, and then it's a different ISA).  One thing that is important for any potential 64bit extension is that the 32bit mode stays fully compatible with the existing toolchains and OS, meaning the mode switch is well defined and on-the-fly switching is as seamless as possible.

One main reason we started with SHcompact as the ISA is the memory footprint and the external bus access patterns (not to ignore all the toolchain and OS infrastructure that was mostly there already).  In embedded systems, memory access patterns and bandwidth are of critical importance.  We have, for instance, massive data flows through the DMA engines in our products and the CPUs have to co-exist usefully using the same DDR memory.

Also keep in mind that as soon as you have 32 bit instruction words, your clean RISC (like) architecture starts to 'degrade' (maybe, my opinion ;).  Much better to have a mode bit in the status register, for instance, and avoid variable length instructions... those can also double your external memory instruction bandwidth (which is important, see above).
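
As a back-of-envelope illustration of the bandwidth point, here is a minimal Python sketch.  The instruction count and the helper name fetch_bytes are made up for the example, and it assumes the worst case: every 16 bit instruction becomes a 32 bit one and every fetch goes to external memory (no cache hits).

```python
def fetch_bytes(n_instructions, bits_per_instruction):
    """Bytes of instruction fetch traffic for a straight-line run of code."""
    return n_instructions * bits_per_instruction // 8


n = 1000  # hypothetical instruction count, for illustration only

compact = fetch_bytes(n, 16)  # fixed 16 bit encoding
wide = fetch_bytes(n, 32)     # same instruction count, 32 bit encoding
ratio = wide / compact        # relative instruction fetch traffic

print(compact, wide, ratio)
```

A real mix of 16 and 32 bit forms would land somewhere between the two figures, depending on how many 16 bit instructions each 32 bit form replaces.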

So in sum:
- Keep the 32bit mode as is.  Any proposed additions to the ISA will be accepted only through hard-won battles, on the merits ;)

- Radical ideas about register files are generally a no-no.  SPARC tried register windows (fail), SH2A tried register page ideas (fail)...

- A portable implementation that is also space efficient restricts things to register files of r read ports and w=1 write port.  The area scales roughly with r*w, but also you need last written tag bits and other nastiness when w > 1.  Most modern FPGAs allow turning the basic nLUT logic element into a 1bit x 2^n async RAM primitive.  You can build a lot of good things efficiently with those.  A lot of Open RTL we find doesn't fit that model, and ends up using fabric flops.  But there are only so many of those LUTs in any case.  You don't want to use Block RAM for your reg files in FPGA if you can help it.  It's not impossible, but since they are sync read, it affects the design of your pipeline.

- A 64 bit mode absolutely needs to be designed to have a nice even flow of instructions through the memory interface and cache.

- The reason ditching MOV.W @(PC,disp) etc had an adverse effect is that it's fundamental to constant loading on the SH.  The ISA is designed around that (and other) construct(s)... it was an early design decision, likely before any instructions were defined at all.

- So the best 64bit approach is 'mundane', and predictable, meaning that,

- The fundamental principles of the ISA (e.g. constants are loaded relative to PC, just one such thing) remain the same.  Then things like LEA instructions are not a good fit... since they don't match the methodology.
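
For reference, the PC-relative constant load mentioned above can be sketched like this in Python.  The address arithmetic follows the published SH-2 description of MOV.W and MOV.L @(disp,PC),Rn outside a delay slot; the helper names, the flat bytearray memory and the big-endian byte order are assumptions for the example, not J-core code.

```python
def sign_extend16(value):
    """Sign-extend a 16 bit value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def movw_pc_addr(insn_addr, disp):
    """Effective address of MOV.W @(disp,PC),Rn: insn address + 4 + disp*2."""
    return insn_addr + 4 + ((disp & 0xFF) << 1)


def movl_pc_addr(insn_addr, disp):
    """Effective address of MOV.L @(disp,PC),Rn: longword-aligned PC + disp*4."""
    return ((insn_addr + 4) & ~3) + ((disp & 0xFF) << 2)


def load_const16(mem, insn_addr, disp):
    """Model MOV.W @(disp,PC),Rn: fetch a 16 bit literal and sign-extend it."""
    addr = movw_pc_addr(insn_addr, disp)
    raw = int.from_bytes(mem[addr:addr + 2], "big")  # byte order is an assumption
    return sign_extend16(raw)


# Hypothetical layout: instruction at 0x10, 16 bit literal 0xFF80 at 0x18.
mem = bytearray(0x20)
mem[0x18:0x1A] = (0xFF80).to_bytes(2, "big")

value = load_const16(mem, insn_addr=0x10, disp=2)
print(hex(movw_pc_addr(0x10, 2)), value)
```

The point is that an 8 bit displacement in a 16 bit instruction reaches a literal pool placed just after the code, so constants travel through the same memory path as the instructions that use them.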

And above all, it has to be efficiently implementable, in real hardware, of course :)
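
To make the register file point from the list above a bit more concrete, here is a very rough cost model in Python.  It is a sketch under stated assumptions, not a description of any real design: it assumes each 1bit x 2^n LUT RAM gives one write port and one async read port, that extra ports are obtained by replicating RAMs (r*w copies), and that w > 1 needs one "last written" tag per register.  The function name and the example sizes are hypothetical.

```python
import math


def regfile_cost(depth, width, r, w, lut_inputs=6):
    """Rough LUT RAM count and tag bits for a depth x width register file.

    Assumes replication: one RAM bank per (read port, write port) pair.
    """
    rams_per_bit = math.ceil(depth / (1 << lut_inputs))
    lut_rams = r * w * width * rams_per_bit
    # One tag per register recording which write port wrote it last.
    tag_bits = depth * (w - 1).bit_length()
    return lut_rams, tag_bits


# Hypothetical example: 16 registers of 32 bits with 2 read ports.
single_write = regfile_cost(16, 32, r=2, w=1)
dual_write = regfile_cost(16, 32, r=2, w=2)

print(single_write, dual_write)
```

Even in this simplified model the second write port doubles the RAM count and adds tag storage plus the muxing to go with it, which is the "nastiness" referred to above.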

Cheers,
J.

> On 7/6/2017 8:11 AM, D. Jeff Dionne wrote:
>> On Jul 6, 2017, at 5:58, Rob Landley <rob at landley.net> wrote:
>> 
>> David,
>> 
>> Pls see inline.
>> 
>> J.
>>> On 07/04/2017 02:30 PM, David Summers wrote:
>>>> On 04/07/17 09:35, Rob Landley wrote:
>>>>> On 07/03/2017 02:41 PM, David Summers wrote:
>>>>>> Hi Rob,
>>> ...
>>>>>> Anyway, you probably know the answer to what I was planning to ask
>>>>>> anyway. Why on the turtle boards did you fix on the Spartan FPGA? The
>>>>>> newer Artix is similar price and newer, e.g.:
>> The highest volume product Xilinx ship right now is still S6 in the low cost area, not Artix (I have it on good authority).
>> 
>> We have a bunch of products using S6 on the commercial side, we don't want A7 right now.
>> 
>> Once we make a change of tools from ISE to Vivado, we can make a compatible upgrade to either A7 or S7.
> 
> seems cool.
> 
>>>>>> https://shop.trenz-electronic.de/en/TE0725-03-100-2C-FPGA-Module-with-Xilinx-Artix-7-XC7A100T-2CSG324C-2-x-50-Pin-with-2.54-mm-pitch
>>>>>> 
>>>>>> a 100k gate FPGA for just over $100.
>> And that is another reason why.  $100 is another class of product.  Turtle design goals had a BoM cost of about $50.
> 
> I am half wondering here if the current FPGAs could handle a version of the ISA with ~5 kbits of register state (~ 160 DWORDs), and a somewhat expanded ISA.
> 
> or:
>    96 dwords for GPRs + system registers;
>        existing registers + high words.
>    64 dwords for FPRs.
>        expanded some to allow 16x128 bit SIMD, also usable as 32 double registers.
>        though, only about 1/2 this space is accessible as 32-bit floats for many operations.
>            unlike SSE, most of the space is usable as independent scalar registers.
> 
> ATM, this is about what the 64-bit version of my ISA is looking like.
> 
> I have recently been working with trying to get a "working prototype" of sorts into working order, but some things are still flexible in the design.
> 
> 
> I have generally determined that trying to make the 32-bit ISA's memory-footprint smaller by adding instructions is of fairly limited effectiveness; so (probably) not a worthwhile tradeoff in the face of expanding the complexity of the instruction set (over the existing SH ISA).
> 
> it is possible to reduce the number of logical operations (by around 30%), though most of this is by replacing multiple 16-bit I-forms with fewer 32-bit I-forms so the overall memory footprint remains pretty similar.
> 
> for the 64-bit ISA, they become a bit more useful, mostly:
>    to be able to actually do 64-bit stuff with the way I am currently implementing it;
>        most 64-bit operations will require dedicated quadword forms, ex: "ADD.Q" / etc.
>    to mostly avoid a significant expansion of the code footprint.
>        as-is, it looks like the footprint expansion from 32-bit to 64-bit will be fairly modest.
>            at-least, if most pointer arithmetic is done with dedicated LEA instructions.
> 
> some changes (of the 64-bit ISA) from the prior design:
>    most of the 16-bit instruction forms behave as they do in 32-bit mode.
>        exceptions are LDC.L/STC.L/... which expand to 64-bit forms.
>            also MOV.{B/W/L} ops expand to using 64-bit addresses.
>        determined that full-scale extension to 64-bit arithmetic would be detrimental.
>            there is far more 32-bit integer arithmetic than pointer arithmetic / etc.
>    16-bit MOV.W forms are left as-is.
>        most MOV.Q variants will require 32-bit I-forms.
>        losing "MOV.W @(PC,disp)" turned out to have more of an adverse effect than expected.
>    added 16-bit PUSH/POP ops.
>        but, dropped a few previously spec'ed ops as they became unnecessary/redundant.
>    the number of usable 32-bit GPRs is expanded to 32 (by using the low/high halves separately).
>        in addition, there are some 3-address arithmetic instructions and similar.
>        ...
> 
> the 64-bit ISA still isn't really done yet, and my early testing thus far has been in sort of a hybrid mode (some quadword stuff but still otherwise using a 32-bit address space).
> 
> also, note that 64 bit arithmetic/registers would be accessible by 32-bit code, but I will probably define doing this as "non-canonical".
> 
> but, it goes on... (and hopefully the design is viable).
> 
> <snip rest, not much to add>
> 
> _______________________________________________
> J-core mailing list
> J-core at lists.j-core.org
> http://lists.j-core.org/mailman/listinfo/j-core