[J-core] [RFC] SIMD extension for J-Core

Thu Oct 26 20:10:41 EDT 2017

On Thu, Oct 26, 2017 at 6:14 PM, Rob Landley <rob at landley.net> wrote:
> On 10/25/2017 06:59 PM, Ken Phillis Jr wrote:
>> I figure I will include my Idea for an easy to use SIMD Extension for
>> the J-Core. In general this extension will be used for Three Types of
>> number formats...
>
> One of the blue sky J64 proposals Jeff mentioned (let me see if I can
> remember the details) was 2 control bits per register letting you
> specify that register contains 8/16/32/64 bit SIMD, meaning a single 32
> bit control register could control SIMD state for 16 general purpose
> registers. Then you use the normal operations to deal with
> signed/unsigned, multiply, and so on.
>
> The next question was how to deal with source/target size mismatches: in
> theory a 64 bit source iterating over 8 bit targets could apply a single
> 64 bit source value to all 8 targets, a 32 bit source could go
> 1-2-3-4-1-2-3-4, etc. In practice what that makes the ALU look like is a
> question that needs some code prototyping I think...
>
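
As a rough illustration of the "2 control bits per register" idea quoted
above, here is a small C sketch of how a single 32-bit control word could
be decoded into a lane width for each of 16 registers. The function name
and the encoding (0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit) are my
assumptions, not something Jeff's proposal specified.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical decode of a 32-bit SIMD control word: two bits per
 * register, 16 registers. Assumed encoding: 0 -> 8-bit lanes,
 * 1 -> 16-bit, 2 -> 32-bit, 3 -> 64-bit.
 */
unsigned simd_lane_bits(uint32_t ctrl, unsigned reg)
{
    assert(reg < 16);
    unsigned mode = (ctrl >> (2 * reg)) & 3u;  /* pick this register's 2-bit field */
    return 8u << mode;                         /* 8, 16, 32 or 64 */
}
```

With this layout, switching one register between modes is a single
read-modify-write of the control word.
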
>> New Registers: simd0 to simd15
>> These Registers are 128-bits in size, and are used to perform a bulk
>> of the SIMD Math.
>
> Is there a way to do that and _not_ quadruple the size of the processor?
>
Yes, by doing two tasks...

Task 1: Add a single temp register to free general purpose register
r15 from its use as a stack pointer. Programs with SIMD capabilities
will use this new register instead to store stack information.

Task 2: Implement the SIMD registers by reusing the existing registers...

For 32-bit Floats:
simd0 maps to FV0 on bank 0
simd1 maps to FV4 on bank 0
simd2 maps to FV8 on bank 0
simd3 maps to FV12 on bank 0
simd4 maps to FV0 on bank 1
simd5 maps to FV4 on bank 1
simd6 maps to FV8 on bank 1
simd7 maps to FV12 on bank 1

For 64-bit Floats:
simd0 contains DR0 and DR1
simd1 contains DR2 and DR3
simd2 contains DR4 and DR5
simd3 contains DR6 and DR7
simd4 contains DR8 and DR9
simd5 contains DR10 and DR11
simd6 contains DR12 and DR13
simd7 contains DR14 and DR15
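
Both tables above follow a simple pattern, which can be sketched in C
roughly as follows. The struct and function names are made up for
illustration only; the arithmetic just restates the two lists (four
FV vectors per bank, and two consecutive DR numbers per simd register).

```c
#include <assert.h>

/* Hypothetical descriptor: which FPU vector backs simdN for 32-bit floats. */
struct simd_fp_slot {
    unsigned bank;      /* 0 or 1 */
    unsigned first_fv;  /* 0, 4, 8 or 12, i.e. FV0, FV4, FV8, FV12 */
};

/* 32-bit floats: simd0..3 are FV0..FV12 on bank 0, simd4..7 on bank 1. */
struct simd_fp_slot simd_to_fv(unsigned n)
{
    assert(n < 8);
    struct simd_fp_slot s = { n / 4, (n % 4) * 4 };
    return s;
}

/* 64-bit floats: simdN contains DR(2N) and DR(2N+1); return the first. */
unsigned simd_to_first_dr(unsigned n)
{
    assert(n < 8);
    return 2 * n;
}
```

For example, simd_to_fv(5) gives bank 1, FV4, and simd_to_first_dr(3)
gives 6, matching "simd3 contains DR6 and DR7".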

For extended SIMD, we can gain more SIMD registers by reusing the
general purpose registers...

Extended SIMD Registers on 32-bit chips:
simd8 is r0 to r3
simd9 is r4 to r7
simd10 is r8 to r11
simd11 is r12 to r15
simd12 to simd15 are not available on 32-bit chips, but I doubt most
users will need them in 32-bit programs.

For 64-bit Capable chips where the General Purpose registers can store
64-bit values, we can make use of the whole General Purpose Register
space for use in SIMD Math.
simd8 is r0 and r1
simd9 is r2 and r3
simd10 is r4 and r5
simd11 is r6 and r7
simd12 is r8 and r9
simd13 is r10 and r11
simd14 is r12 and r13
simd15 is r14 and r15.
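
The two general purpose register mappings differ only in how many
registers make up one 128-bit simd register: four 32-bit registers or
two 64-bit ones. A rough C sketch, with hypothetical names, assuming each
simd register covers consecutive general purpose registers (so simd10
starts at r8 on 32-bit chips):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: a run of consecutive general purpose registers. */
struct gpr_span {
    unsigned first;  /* index of the first register, e.g. 8 for r8 */
    unsigned count;  /* how many registers make up the 128-bit value */
};

/*
 * Map simd8..simd15 onto r0..r15. Returns false when the register does
 * not exist in the given mode (simd12..simd15 on 32-bit chips).
 */
bool simd_to_gpr(unsigned n, bool gpr64, struct gpr_span *out)
{
    assert(n >= 8 && n <= 15);
    unsigned idx = n - 8;
    unsigned per = gpr64 ? 2 : 4;  /* 128 bits / register width */
    if (idx * per >= 16)
        return false;              /* ran out of general purpose registers */
    out->first = idx * per;
    out->count = per;
    return true;
}
```

Note that in both modes the last simd register overlaps r15, which is why
Task 1 has to move the stack pointer elsewhere.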

>> SIMD Configuration Instructions:
>>
>> SIMD.IMODE - This configures the Integer Math mode of the Integer SIMD
>> Operations. The accepted modes should include the following:
>> * Integer Carry Mode - See ADDC and SUBC for example.
>> * Value UnderFlow and OverFlow Mode - See ADDV and SUBV  for examples.
>> * Integer Type: Signed and Unsigned values with sizes of 8-bit,
>> 16-bit, 32-bit, and 64-bits.
>
> We're really short on instruction space. We fit j64 in, but had to
> repurpose several existing instructions in 64 bit mode to do it.
>
>> Data Loading/Conversion Instructions:
>> Bulk Conversion From Integers to Floats, and Floats to integers is a
>> must. That said, I'm not exactly sure how many Instructions are needed
>> for this, but It would be reasonable to say that four to seven
>> instructions may be required.
>
> In Jeff's design you'd copy from register in one mode to register in
> another mode, then do a rotate by enough bits you could do the next one,
> and a rotate at the end if you needed to restore the register? (Hmmm,
> that sounds like it would need an xor on the mode thing between those so
> the rotate wasn't within the simd division? Possibly I misunderstood, it
> was a while ago...)
>
> Rob

The purpose of the general purpose logical shifts in SIMD math is to
vectorize code where binary shifts are required. Two great examples of
places where this type of math happens are cryptography and fixed
point math, although there are probably other cases I have not thought
of for now.

https://felix.abecassis.me/2012/08/sse-vectorizing-conditional-code/
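
To show what a lane-wise logical shift has to do, here is a small C
sketch that emulates it in software on four 8-bit lanes packed into one
32-bit word. The function names are hypothetical and this is not a
proposed instruction encoding, only the behaviour a SIMD shift would need:
bits must not leak from one lane into its neighbour.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right of each 8-bit lane in x by s bits (0..7). */
uint32_t lsr_4x8(uint32_t x, unsigned s)
{
    assert(s < 8);
    /* Per-lane mask of the bits that remain valid after the shift. */
    uint32_t lane_mask = (0xFFu >> s) * 0x01010101u;
    return (x >> s) & lane_mask;
}

/* Logical shift left of each 8-bit lane in x by s bits (0..7). */
uint32_t lsl_4x8(uint32_t x, unsigned s)
{
    assert(s < 8);
    /* Clear the low bits of each lane that came from the lane below. */
    uint32_t lane_mask = ((0xFFu << s) & 0xFFu) * 0x01010101u;
    return (x << s) & lane_mask;
}
```

A hardware SIMD shift would do the same thing without the masking step,
which is where the savings for fixed point and crypto code would come
from.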