[J-core] [RFC] SIMD extension for J-Core

Thu Oct 26 09:59:43 EDT 2017

On Oct 26, 2017 9:04 AM, "emanuel stiebler" <emu at e-bbes.com> wrote:

On 2017-10-25 17:59, Ken Phillis Jr wrote:

> New CPUID Flags:
> SIMD_INTEGER8
> SIMD_INTEGER16
> SIMD_INTEGER32
> SIMD_INTEGER64
> SIMD_HALF_PRECISION_FLOAT
> SIMD_SINGLE_PRECISION_FLOAT
> SIMD_DOUBLE_PRECISION_FLOAT
>
>
Just a short one: for the floating-point types, I would prefer

SIMD_FLOAT16
SIMD_FLOAT32
SIMD_FLOAT64
SIMD_FLOAT128

I would suggest shortening those further.

simd.u8
simd.s16
simd.f32
simd.f16

* dot notation is a bit nicer
* u := unsigned
* s := signed
* f32 := IEEE 754 32-bit float
* f16 := half precision
* similarly, f64, u64, ...
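
To make the naming concrete, here is a rough sketch of how the short names
could map onto bits of a CPUID-style capability word. The bit positions, the
enum, and the helpers simd_cap_from_name() and simd_has() are all assumptions
for illustration; nothing in this thread assigns values or defines an API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments: the thread proposes names only, not values. */
enum simd_cap {
    SIMD_U8  = 1u << 0,
    SIMD_S8  = 1u << 1,
    SIMD_U16 = 1u << 2,
    SIMD_S16 = 1u << 3,
    SIMD_U32 = 1u << 4,
    SIMD_S32 = 1u << 5,
    SIMD_U64 = 1u << 6,
    SIMD_S64 = 1u << 7,
    SIMD_F16 = 1u << 8,   /* half precision */
    SIMD_F32 = 1u << 9,   /* IEEE 754 32-bit float */
    SIMD_F64 = 1u << 10
};

struct simd_cap_name {
    uint32_t    bit;
    const char *name;     /* dotted spelling, e.g. "simd.f32" */
};

static const struct simd_cap_name simd_cap_names[] = {
    { SIMD_U8,  "simd.u8"  }, { SIMD_S8,  "simd.s8"  },
    { SIMD_U16, "simd.u16" }, { SIMD_S16, "simd.s16" },
    { SIMD_U32, "simd.u32" }, { SIMD_S32, "simd.s32" },
    { SIMD_U64, "simd.u64" }, { SIMD_S64, "simd.s64" },
    { SIMD_F16, "simd.f16" }, { SIMD_F32, "simd.f32" },
    { SIMD_F64, "simd.f64" }
};

/* Return the capability bit for a dotted name, or 0 if the name is unknown. */
static uint32_t simd_cap_from_name(const char *name)
{
    size_t n = sizeof simd_cap_names / sizeof simd_cap_names[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(simd_cap_names[i].name, name) == 0)
            return simd_cap_names[i].bit;
    }
    return 0;
}

/* Nonzero if every bit in `wanted` is set in the capability word. */
static int simd_has(uint32_t cap_word, uint32_t wanted)
{
    return (cap_word & wanted) == wanted;
}
```

With something like this, a caller could test simd_has(word, SIMD_F32 | SIMD_F16)
before picking a code path, and the dotted names stay purely a
documentation/assembler-level spelling.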

BGB - could you mention on the list how the FPU design differs between SH
and x86/MMX?

Having ported FFTW to ARM NEON, I'm extensively familiar with it and know
it is strikingly similar to MMX. I know that (at least on ARM) the
contention you've mentioned is quite significant. Pipeline stalls must be
precisely inserted to ensure that SIMD results are available when they are
needed. Vector loads and stores, cache prefetching, and I/O alignment were
critical.
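
As a small illustration of the alignment point only, here is a portable C11
sketch that allocates a buffer aligned for 128-bit vector loads and stores.
The 16-byte figure (one 128-bit register) and the helper names are my
assumptions for the example; this is not code from the FFTW port.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed vector width: 128 bits, i.e. 16 bytes. */
#define VEC_ALIGN 16u

/* Nonzero if the pointer is suitable for an aligned 128-bit access. */
static int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}

/* Round a byte count up to the next multiple of VEC_ALIGN. */
static size_t round_up_vec(size_t n)
{
    return (n + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
}

/*
 * Allocate room for `count` floats on a VEC_ALIGN boundary.
 * aligned_alloc() (C11) expects the size to be a multiple of the
 * alignment, so the byte count is padded. Overflow of
 * count * sizeof(float) is not checked in this sketch.
 * Release the buffer with free().
 */
static float *alloc_vec_floats(size_t count)
{
    size_t bytes = round_up_vec(count * sizeof(float));
    if (bytes == 0)
        bytes = VEC_ALIGN;   /* avoid a zero-size request */
    return aligned_alloc(VEC_ALIGN, bytes);
}
```

The padding also means a loop can process whole vectors up to the end of the
buffer without a scalar tail touching unallocated memory.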

For the Cortex-A8 that meant fine-tuning the instructions generated by the
compiler. There were also some memory barriers, iirc. Mostly hand-written
assembly. Intrinsics were ~ meh.

I worked on it before the Cortex-A9's out-of-order (OOO) pipeline was
introduced, but I also know that having an out-of-order unit helped SIMD on
ARM.

Again, I'm really curious how the FPU design differs, because if SH /
J-Core can avoid that mess, it would be better off.