<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 10/26/2017 8:59 AM, Christopher
Friedt wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">
<div dir="auto"><br>
<div class="gmail_extra" dir="auto"><br>
<div class="gmail_quote">On Oct 26, 2017 9:04 AM, "emanuel
stiebler" &lt;<a href="mailto:emu@e-bbes.com"
target="_blank" moz-do-not-send="true">emu@e-bbes.com</a>&gt;
wrote:<br type="attribution">
<blockquote class="m_8244701099308911606quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div class="m_8244701099308911606quoted-text">On
2017-10-25 17:59, Ken Phillis Jr wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
New CPUID Flags:<br>
SIMD_INTEGER8<br>
SIMD_INTEGER16<br>
SIMD_INTEGER32<br>
SIMD_INTEGER64<br>
SIMD_HALF_PRECISION_FLOAT<br>
SIMD_SINGLE_PRECISION_FLOAT<br>
SIMD_DOUBLE_PRECISION_FLOAT<br>
<br>
</blockquote>
<br>
</div>
Just a short one,<br>
for the floating points, I would prefer<br>
<br>
SIMD_FLOAT16<br>
SIMD_FLOAT32<br>
SIMD_FLOAT64<br>
SIMD_FLOAT128<br>
</blockquote>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">I would suggest shortening those further.</div>
<div dir="auto"><br>
</div>
<div dir="auto">simd.u8</div>
<div dir="auto">simd.s16</div>
<div dir="auto">simd.f32</div>
<div dir="auto">simd.f16</div>
<div dir="auto"><br>
</div>
</div>
</blockquote>
<br>
yeah, we don't need Double/Float64 SIMD.<br>
<br>
Float128 would be pretty absurd given that almost no hardware does it
natively (it would also be crazy expensive; a full-width MAC unit
quite possibly wouldn't fit into the FPGA...).<br>
<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">
<div dir="auto">
<div dir="auto">* dot notation is a bit nicer</div>
<div dir="auto">* u := unsigned</div>
<div dir="auto">* s := signed</div>
<div dir="auto">* f32 := IEEE 754 32-bit float</div>
<div dir="auto">* f16 := half precision</div>
<div dir="auto">* similarly, f64, u64, ...</div>
<div dir="auto"><br>
</div>
<div dir="auto">BGB - could you mention on the list how the FPU
design differs between SH and x86/mmx ?</div>
<div dir="auto"><br>
</div>
</div>
</blockquote>
<br>
x87 had 8x 80-bit FPU registers organized as a stack, so you
couldn't perform operations between arbitrary registers; one operand
was always the top of the stack, and code was mostly
push/pop/swap/...<br>
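<br>
as a rough illustration of the stack addressing (a toy Python model, not
an emulator; the class and method names are made up here, loosely named
after the FLD/FXCH/FADDP instructions):<br>
<pre wrap="">
```python
class X87Stack:
    """Toy model of the x87 register stack; values are plain Python floats."""

    def __init__(self):
        self.regs = [0.0] * 8   # 8 physical registers
        self.top = 0            # physical index of ST(0)

    def _phys(self, i):
        # ST(i) is always addressed relative to the current top of stack
        return (self.top + i) % 8

    def fld(self, value):
        # push: the top pointer moves, so every existing ST(i) is renumbered
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fxch(self, i):
        # swap ST(0) with ST(i)
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]

    def faddp(self):
        # ST(1) = ST(1) + ST(0), then pop
        self.regs[self._phys(1)] += self.regs[self._phys(0)]
        self.top = (self.top + 1) % 8

    def st(self, i):
        return self.regs[self._phys(i)]


fpu = X87Stack()
fpu.fld(1.0)
fpu.fld(2.0)
fpu.fld(4.0)      # ST(0)=4, ST(1)=2, ST(2)=1
fpu.fxch(2)       # ST(0)=1, ST(1)=2, ST(2)=4
fpu.faddp()       # ST(1) = 2 + 1, then pop -> ST(0)=3, ST(1)=4
print(fpu.st(0), fpu.st(1))
```
</pre>
to add the first and second values pushed, the code first has to swap
one of them to the top; there is no "add register A to register B"
form.<br>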
<br>
MMX allowed using them instead as 8x 64-bit registers, but doing so
required entering/exiting MMX mode, which would basically trash the
FPU registers.<br>
<br>
SSE basically cleaned up this mess by adding its own separate
registers, and since then compilers have mostly abandoned x87.<br>
<br>
<br>
the SH style FPU uses two banks of 16x 32-bit registers (labeled
FR0..FR15 and XF0..XF15; though some SH variants only have a single
bank);<br>
operations on these registers are basically direct register
operations (more like GPRs, or like the more modern x86-64 strategy
of doing all the FPU stuff using SSE);<br>
the SH FPU also implements Double operations by working on pairs of
registers (kind of funky, but works).<br>
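<br>
a minimal Python sketch of the register-pair idea (the class and method
names are illustrative; it assumes the even register of a pair holds the
high 32 bits of the Double, as on SH-4):<br>
<pre wrap="">
```python
import struct


class ShFpuBank:
    """Toy model of one bank of 16x 32-bit FP registers."""

    def __init__(self):
        self.raw = [0] * 16          # each entry is a 32-bit pattern

    def set_fr(self, n, value):
        # store a single-precision value in FRn
        self.raw[n] = struct.unpack(">I", struct.pack(">f", value))[0]

    def get_fr(self, n):
        return struct.unpack(">f", struct.pack(">I", self.raw[n]))[0]

    def set_dr(self, n, value):
        # DRn occupies the pair FRn:FRn+1, so n must be even.
        # Assumption: FRn holds the high word, FRn+1 the low word.
        assert n % 2 == 0
        hi, lo = struct.unpack(">II", struct.pack(">d", value))
        self.raw[n], self.raw[n + 1] = hi, lo

    def get_dr(self, n):
        assert n % 2 == 0
        packed = struct.pack(">II", self.raw[n], self.raw[n + 1])
        return struct.unpack(">d", packed)[0]


bank = ShFpuBank()
bank.set_dr(2, 1.5)                        # writes FR2 and FR3
print(hex(bank.raw[2]), hex(bank.raw[3]))  # the two 32-bit halves
print(bank.get_dr(2))
```
</pre>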
<br>
so, unlike x87, there is no metadata and no stack, and likewise no
need to clear the registers if moving between SIMD mode and FPU mode
(if desired).<br>
likewise, by sharing the register state, it would make it possible
to do arbitrary scalar operations within vector elements, which is
not something readily supported by SSE.<br>
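<br>
roughly, in Python terms (a sketch only; the function names and the
layout, where vector v overlays FR(4v)..FR(4v+3), are assumptions made
for illustration):<br>
<pre wrap="">
```python
fr = [0.0] * 16                        # flat scalar view: FR0..FR15


def vec_load(v, values):
    # hypothetical 128-bit vector v overlays FR(4v)..FR(4v+3)
    fr[4 * v: 4 * v + 4] = values


def vec_add(vd, vs):
    # element-wise add of two vectors, written back into vd
    for i in range(4):
        fr[4 * vd + i] += fr[4 * vs + i]


def fmul_scalar(n, m):
    # ordinary scalar op on the same register file: FRn = FRn * FRm
    fr[n] *= fr[m]


vec_load(0, [1.0, 2.0, 3.0, 4.0])      # V0 = FR0..FR3
vec_load(1, [10.0, 20.0, 30.0, 40.0])  # V1 = FR4..FR7
vec_add(0, 1)                          # V0 = [11, 22, 33, 44]
fmul_scalar(2, 4)                      # element 2 of V0 times element 0 of V1
print(fr[0:4])
```
</pre>
the scalar multiply touches a single lane in place, with no
extract/insert or shuffle step in between.<br>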
<br>
one partial limitation, though, comes from how the "bank" mechanism
works: with 128-bit vectors, it would hinder doing operations between
the high and low halves of a vector using 16-bit instructions
(my SIMD extensions remedy this by allowing 32-bit FPU
I-forms to access all 32 float registers, rather than only banks of 16
registers; with 8x SIMD registers this basically means direct access
to the entire space).<br>
<br>
likewise, the space can also be used as 16x 64-bit vectors, given the
vectors themselves are implemented via register pairs (and in this
case, equivalent to the Double registers).<br>
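<br>
a small Python sketch of how the different views could line up (the
simple linear layout and the helper names are assumptions for
illustration, not the actual encoding):<br>
<pre wrap="">
```python
def bank_of(flat):
    """Map a flat register index 0..31 to (bank name, index within bank)."""
    return ("FR" if flat // 16 == 0 else "XF", flat % 16)


def regs_of_v128(v):
    # assumed layout: 128-bit vector v covers flat registers 4v..4v+3
    return [bank_of(4 * v + i) for i in range(4)]


def regs_of_v64(d):
    # assumed layout: 64-bit vector d covers flat registers 2d..2d+1
    return [bank_of(2 * d + i) for i in range(2)]


print(regs_of_v128(3))   # last vector that fits in the first bank
print(regs_of_v128(4))   # first vector in the second bank
print(regs_of_v64(15))   # last of the 16x 64-bit vectors
```
</pre>
8 vectors of 4 registers and 16 vectors of 2 registers both cover the
same 32 scalar registers.<br>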
<br>
<br>
a minor extension to the FPU is that Double operations may access
all 16x Double registers at the same time;<br>
likewise, I "borrowed" the feature allowing both the SZ and PR bits to
be set, which allows moving Doubles to/from memory in the correct order
(usually these require doing the loads/stores as pairs of
instructions).<br>
<br>
sadly also, the FPU design requires frequently toggling bits in the
FPSCR register (to move between Float/Double), but I added an op to
load the relevant bits in the register directly (vs, say, needing to
do a constant load and then shove the desired FPU state into this
register).<br>
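<br>
as a sketch of what such an op does to the register (Python; the bit
positions are the SH-4 FPSCR ones, PR = bit 19 and SZ = bit 20, but the
function itself is hypothetical and does not reflect the actual opcode
or its encoding):<br>
<pre wrap="">
```python
FPSCR_PR = 0x00080000   # bit 19: 1 = Double-precision arithmetic
FPSCR_SZ = 0x00100000   # bit 20: 1 = 64-bit (register pair) moves
MODE_MASK = FPSCR_PR | FPSCR_SZ


def load_mode(fpscr, pr, sz):
    """Hypothetical one-step op: replace only PR and SZ, keep other bits."""
    # set both mode bits, then flip them off again: this clears just them
    cleared = (fpscr | MODE_MASK) ^ MODE_MASK
    return cleared | (FPSCR_PR if pr else 0) | (FPSCR_SZ if sz else 0)


fpscr = 0x00040001                             # some unrelated bits already set
fpscr = load_mode(fpscr, pr=True, sz=True)     # Double math + 64-bit moves
print(hex(fpscr))
fpscr = load_mode(fpscr, pr=False, sz=False)   # back to Float
print(hex(fpscr))
```
</pre>
without such an op, the same change means loading a constant into a
GPR and then moving it into FPSCR.<br>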
<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">
<div dir="auto">
<div dir="auto">Having ported FFTW over to ARM neon I'm
extensively familiar with it and know it is strikingly similar
to mmx. I know that (at least on ARM) the contention you've
mentioned is quite significant. Pipeline stalls must be
precisely inserted to ensure correct results for simd
instructions are obtained at the correct times, etc. Vector
loads and stores, cache-prefetching, and i/o alignment were
critical.</div>
<div dir="auto"><br>
</div>
<div dir="auto">For A8 that meant fine-tuning the instructions
generated by the compiler. There were also some memory
barriers, iirc. Mostly hand-written assembly. Intrinsics were
~ meh.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I did work on it before the A9 OOO pipeline was
introduced, but also know that having an out-of-order unit
helped simd on ARM.<br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Again, I'm really curious how the FPU design
differs, because if SH / J-Core can avoid that mess, it would
be better off.</div>
<div dir="auto"><br>
</div>
</div>
</blockquote>
<br>
sadly, basically out of time, so no response.<br>
<br>
<blockquote type="cite"
cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">
<pre wrap="">_______________________________________________
J-core mailing list
<a class="moz-txt-link-abbreviated" href="mailto:J-core@lists.j-core.org">J-core@lists.j-core.org</a>
<a class="moz-txt-link-freetext" href="http://lists.j-core.org/mailman/listinfo/j-core">http://lists.j-core.org/mailman/listinfo/j-core</a>
</pre>
</blockquote>
</body>
</html>