[J-core] [RFC] SIMD extension for J-Core

Ken Phillis Jr kphillisjr at gmail.com
Thu Oct 26 16:12:42 EDT 2017


On Thu, Oct 26, 2017 at 8:59 AM, Christopher Friedt
<chrisfriedt at gmail.com> wrote:
>
>
> On Oct 26, 2017 9:04 AM, "emanuel stiebler" <emu at e-bbes.com> wrote:
>
> On 2017-10-25 17:59, Ken Phillis Jr wrote:
>
>> New CPUID Flags:
>> SIMD_INTEGER8
>> SIMD_INTEGER16
>> SIMD_INTEGER32
>> SIMD_INTEGER64
>> SIMD_HALF_PRECISION_FLOAT
>> SIMD_SINGLE_PRECISION_FLOAT
>> SIMD_DOUBLE_PRECISION_FLOAT
>>
>
> Just a short one,
> for the floating points, I would prefer
>
> SIMD_FLOAT16
> SIMD_FLOAT32
> SIMD_FLOAT64
> SIMD_FLOAT128
>
>
> I would suggest shortening those further.
>
> simd.u8
> simd.s16
> simd.f32
> simd.f16
>
> * dot notation is a bit nicer
> * u := unsigned
> * s := signed
> * f32 := IEEE 754 32-bit float
> * f16 := half precision
> * similarly, f64, u64, ...
>


I was thinking the items I mentioned would work like the CPUID
instruction on x86: each of the seven items maps to a single bit in a
built-in, read-only on-chip table that is accessed through the cpuid
instruction.

Table id: TBD - This table entry covers SIMD information.
bit 0 to 7 - Integer size support, where bit 0 is for 8-bit integers,
bit 1 is 16-bit integers, etc.
bit 8 - Reserved.
bit 9 to 15 - Floating-point size support, where bit 9 is 16-bit floats,
bit 10 is 32-bit floats, etc.
bit 16 to 31 - Reserved for future use.
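
As a rough illustration of how software might test such a table entry,
here is a minimal C sketch. It assumes the integer bits follow the order
of the flag list (8, 16, 32, 64 starting at bit 0) and the float bits
start at bit 9 (16, 32, 64). The enum names, simd_has() and
simd_word_valid() are hypothetical, and the 32-bit word is passed in
directly because no cpuid-style instruction has been defined yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, assumed from the layout proposed above. */
enum {
    SIMD_INT8_BIT    = 0,   /* 8-bit integers  */
    SIMD_INT16_BIT   = 1,   /* 16-bit integers */
    SIMD_INT32_BIT   = 2,   /* 32-bit integers */
    SIMD_INT64_BIT   = 3,   /* 64-bit integers */
    SIMD_FLOAT16_BIT = 9,   /* 16-bit floats   */
    SIMD_FLOAT32_BIT = 10,  /* 32-bit floats   */
    SIMD_FLOAT64_BIT = 11   /* 64-bit floats   */
};

/* Bit 8 and bits 16 to 31 are reserved in the proposed layout. */
#define SIMD_RESERVED_MASK ((UINT32_C(1) << 8) | UINT32_C(0xFFFF0000))

/* Return true if the given feature bit is set in the table word. */
static bool simd_has(uint32_t word, unsigned bit)
{
    return ((word >> bit) & 1u) != 0;
}

/* Return true if no reserved bit is set in the table word. */
static bool simd_word_valid(uint32_t word)
{
    return (word & SIMD_RESERVED_MASK) == 0;
}
```

With this layout, a core that only supports 8-bit integer and 32-bit
float SIMD would report the word 0x00000401, so
simd_has(0x00000401, SIMD_FLOAT32_BIT) is true and
simd_has(0x00000401, SIMD_INT16_BIT) is false.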

> BGB - could you mention on the list how the FPU design differs between SH
> and x86/mmx ?
>
> Having ported FFTW over to ARM neon I'm extensively familiar with it and
> know it is strikingly similar to mmx. I know that (at least on ARM) the
> contention you've mentioned is quite significant. Pipeline stalls must be
> precisely inserted to ensure correct results for simd instructions are
> obtained at the correct times, etc. Vector loads and stores,
> cache-prefetching, and i/o alignment were critical.
>
> For A8 that meant fine-tuning the instructions generated by the compiler.
> There were also some memory barriers, iirc. Mostly hand-written assembly.
> Intrinsics were ~ meh.
>
> I did work on it before the A9 OOO pipeline was introduced, but also know
> that having an out-of-order unit helped simd on ARM.
>
> Again, I'm really curious how the FPU design differs, because if SH / J-Core
> can avoid that mess, it would be better off.
>
>

