[J-core] [RFC] SIMD extension for J-Core

Fri Oct 27 20:48:04 EDT 2017

On Fri, Oct 27, 2017 at 3:37 PM, Rob Landley <rob at landley.net> wrote:
> On 10/26/2017 08:00 PM, Cedric BAIL wrote:
>>     > SIMD Configuration Instructions:
>>     >
>>     > SIMD.IMODE - This configures the Integer Math mode of the Integer SIMD
>>     > Operations. The accepted modes should include the following:
>>     > * Integer Carry Mode - See ADDC and SUBC for example.
>>     > * Value UnderFlow and OverFlow Mode - See ADDV and SUBV  for examples.
>>     > * Integer Type: Signed and Unsigned values with sizes of 8-bit,
>>     > 16-bit, 32-bit, and 64-bits.
>>
>>     We're really short on instruction space. We fit j64 in, but had to
>>     repurpose several existing instructions in 64 bit mode to do it.
>>
>> For the needed additions, as they will not be used very often, having a
>> prefix that temporarily enables a new instruction set would limit the
>> problem. Even then, the instruction space is quite limited and adding an
>> instruction requires careful thinking and testing of different scenarios.
>
> Currently there are no prefixes. All the instructions are self-contained
> and fixed length. That's one of the main advantages of the instruction
> set. The way we do mode shifts is with control register bits.

Well, a prefix can be of fixed length too. Like one 16-bit instruction
that switches the CPU into a specific instruction space (either for just
one instruction or for several). I guess it would work the same if there
was a control register to go into that specific instruction space. It is
just a matter of how costly it is to go in and out, and whether you can
pipeline it nicely.
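
To make the fixed-length prefix idea concrete, here is a rough sketch in C
of how a decoder model could track it. Everything named here is an
assumption for illustration: the prefix pattern 0xFFFn is an arbitrary
placeholder that has not been checked against the real opcode map, and
the low nibble holding "number of following instructions minus one" is
just one possible way to cover the "one instruction or several" cases.

```c
#include <stdint.h>

/* Hypothetical prefix encoding: 0xFFFn, where n + 1 is the number of
 * following 16-bit words to decode in the vector instruction space.
 * The bit pattern is a placeholder, not a real or reserved opcode. */
#define PREFIX_MASK    0xFFF0u
#define PREFIX_PATTERN 0xFFF0u

enum insn_space { SPACE_BASE, SPACE_VECTOR, SPACE_PREFIX };

struct decoder {
    unsigned vector_left;  /* words still to decode in vector space */
};

/* Classify one fetched 16-bit word. Every word, prefix included, has
 * the same fixed length, so fetch never needs to look ahead. */
enum insn_space classify(struct decoder *d, uint16_t word)
{
    if (d->vector_left > 0) {
        d->vector_left--;
        return SPACE_VECTOR;
    }
    if ((word & PREFIX_MASK) == PREFIX_PATTERN) {
        d->vector_left = (word & 0xFu) + 1u;
        return SPACE_PREFIX;  /* the prefix itself does no work */
    }
    return SPACE_BASE;
}
```

The cost question shows up directly in this model: the prefix word takes
a fetch slot without doing any work, so a run of several vector
instructions amortizes it better than a single one.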

> In theory we could have a transient control register mask that gets
> xored with the persistent control register but all its bits are zeroed
> after some event (one instruction, next jump, etc). So you could have
> something prefix-like, although it would probably make multi-issue more
> complicated. But that's something that would have to be designed and
> justified and isn't part of the current stuff so there's backwards
> compatibility issues...
>
> I.E. that's j64 territory. Right now j3 and j4 are _mostly_ specced as
> following the path laid down by Hitachi 20 years ago. Two of the 3 "new"
> j-core instructions are backported from j3. We regression test against
> old sh2 binaries. We care that this is part of a _family_. (The "born
> out of wedlock" jokes are left as an exercise for the reader.)
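
To check my reading of the transient mask idea quoted above, here is a
minimal sketch in C. The structure, the function name and the
MODE_VECTOR bit are all made up for illustration, and the clearing event
is assumed to be "after one instruction" (a jump-based event would need
a different trigger).

```c
#include <stdint.h>

#define MODE_VECTOR 0x4u  /* hypothetical mode bit, for illustration only */

struct ctrl {
    uint32_t persistent;  /* mode bits set by ordinary control register writes */
    uint32_t transient;   /* one-shot mask, XORed in and then zeroed */
};

/* Model one instruction. Returns the mode bits it executes under.
 * `arm` is the mask this instruction loads into the transient register
 * (0 for ordinary instructions); it only affects the NEXT instruction. */
uint32_t step(struct ctrl *c, uint32_t arm)
{
    uint32_t mode = c->persistent ^ c->transient;
    c->transient = arm;  /* old mask expires, a new one may be armed */
    return mode;
}
```

With persistent = 0x1, an instruction that arms MODE_VECTOR still runs
under 0x1, the following one runs under 0x5, and the one after that is
back to 0x1 without any explicit "leave" instruction.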

From what you write, it seems that j3 and j4 are more specced than I had
imagined. My assumption so far was that j3 would be j2 + MMU, with j4
being kind of a question mark. Superscalar? FPU?

> After sh4 Hitachi handed off the technology to a new company but kept
> the design engineers, and the new team did a brand new sh5 instruction
> set that nobody was interested in, and that's the point where we need to
> forge our own path. So j64 introduces new design elements, although our
> model is "what x86-64 did to x86, we're trying to do to shmobile".

So my understanding here is that you do not want to take the road of a
family with optional instructions. Instead you have clearly defined
incremental steps. j64 would be the point where you introduce SIMD. So
j64 would be j2 + MMU + FPU + SIMD + 64 bits. Not something where you
have a j-core with configurable options and the possibility to do
j2 + SIMD or j2 + 64 bits, for example. It does sound like it will
reduce the complexity of the source code and be better for long term
maintenance.

> So we _were_ discussing SIMD in context of j64. We're not sure whether
> we need to move it sooner, or just bear down and get j64 out ASAP. (We'd
> love to do the latter, we're working out how to clear engineering time
> for it...)

I am playing with SIMD ideas in the j2 case. It takes time (my spare
time) to build an assembler and after that an emulator that allows
rapid prototyping of it. Still, the reason why I am playing in that
land and think there is potential for it is that I think it would be
able to drive eink devices for a large variety of uses. This would
replace the usually terrible chips that come with them these days.
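
As an illustration of the kind of operation such an emulator could model
on a 32-bit core, here is a sketch in C of a packed add on four 8-bit
lanes held in one 32-bit register. This is a generic, well known
bit trick and a hypothetical example (the name padd8 is made up); it is
not taken from any existing prototype or from the proposed instruction
list.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed into 32-bit words. Each lane
 * wraps around on its own; no carry crosses a lane boundary. */
uint32_t padd8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the cleared top bit of each
     * lane absorbs the carry so it cannot leak into the next lane. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Fold the top bit of each lane back in with XOR, which is an
     * add without carry-out. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```

For example, padd8(0x01FF7F10, 0x01017F20) gives 0x0200FE30: the second
lane wraps from 0xFF + 0x01 to 0x00 without disturbing the top lane.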

Now, going with a j64 for that scenario would open up a lot more use
cases. Instead of just driving the screen, you could start looking at
web page rendering (which is sadly the format of a lot of ebooks). It
would clearly be overkill for simple use cases, like electronic tags,
but the question is how much of a waste is it? How much bigger would
it be compared to a j2?

> j64 has a control register bit to switch between 32 bit and 64 bit
> modes. 32 bit mode is scrupulously compatible with j4. 64 bit mode is
> designed around the idea you're probably going to drop back to 32 bit
> mode to run 32 bit code, which needs to preserve the high bits of the
> registers unchanged and so on...
>
> We're going to try to put together a better j64 proposal in November,
> with actual details. (I haven't been back to tokyo since last November,
> but my next flight there leaves tuesday morning. There's a backlog of
> sitting down with people and writing up documentation...)

That is way earlier than I had expected. Pretty big news that
you are casually dropping here :-) Looking forward to it.
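
On the 32 bit mode leaving the high bits of the registers unchanged,
this is how I picture it, as a small C model. The struct, the mode64
flag standing in for the control register bit, and the function names
are assumptions for the sketch; the real behaviour is whatever the j64
proposal ends up specifying.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpu {
    uint64_t r[16];  /* 16 general registers, widened to 64 bits */
    bool mode64;     /* stand-in for the control register mode bit */
};

/* Register write: 32 bit mode only replaces the low half, so whatever
 * 64 bit code left in the high half survives a trip through 32 bit code. */
void write_reg(struct cpu *c, unsigned n, uint64_t value)
{
    n &= 15u;
    if (c->mode64)
        c->r[n] = value;
    else
        c->r[n] = (c->r[n] & 0xFFFFFFFF00000000ull) | (value & 0xFFFFFFFFull);
}

/* Register read: 32 bit mode only ever sees the low half. */
uint64_t read_reg(const struct cpu *c, unsigned n)
{
    n &= 15u;
    return c->mode64 ? c->r[n] : (c->r[n] & 0xFFFFFFFFull);
}
```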

>> One of the things that killed the performance of early SIMD CPUs was the
>> lack of an efficient way of shuffling bits around. I think there is still
>> potentially a patent on the PowerPC shuffling instruction, but something
>> that allows rotation by word steps would go a long way (again, that can be
>> in the vector instruction space).
>
> That is totally a Jeff question. I tried to explain what I remembered of
> previous talks with him, and it turns out I don't remember enough. :)
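
For reference, what I mean by rotation by word step is roughly the
following, sketched in C. The vector width (four 32-bit words) and the
name vrot_words are assumptions for the example; nothing here is a
proposed encoding.

```c
#include <stdint.h>

#define LANES 4  /* hypothetical vector of four 32-bit words */

/* Rotate a vector by n whole words: word i of the result comes from
 * word (i + n) mod LANES of the source. dst and src must not overlap. */
void vrot_words(uint32_t dst[LANES], const uint32_t src[LANES], unsigned n)
{
    for (unsigned i = 0; i < LANES; i++)
        dst[i] = src[(i + n) % LANES];
}
```

Rotating {10, 20, 30, 40} by one word gives {20, 30, 40, 10}. It is much
less general than an arbitrary shuffle, but it is enough to line up
neighbouring elements.
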
-- 
Cedric BAIL