[J-core] [RFC] SIMD extension for J-Core

Tue Oct 31 14:05:13 EDT 2017

On 10/31/2017 1:34 AM, Ken Phillis Jr wrote:
> On Mon, Oct 30, 2017 at 10:11 PM, Rob Landley <rob at landley.net> wrote:
> > On 10/29/2017 10:37 PM, Ken Phillis Jr wrote:
> >> Information on the SH-3 is not exactly sparse,
> >
> > We've noticed.
> >
> >> This processor has 3 main versions:
> >> SH-3: the feature added in this one is MMU instructions, and these are
> >> generally in the system control instructions.
> >
> > Yeah, that. Except Jeff decided not to use their MMU design because it
> > takes up an unreasonable amount of space in an FPGA (more than doubling
> > the size of the SOC). And we already backported sh3's barrel shift
> > instructions, most of the rest of the instructions they added were for
> > fiddling with the TLB or doing the DSP/FPU stuff.
> >
> > So j3 is adding _an_ MMU, but not necessarily the sh3 mmu. I'll see if
> > we can get more detail posted about the j3 mmu design next month.
> >
> >> SH-3E: the SH-3 Instructions with 32-bit Floating Point Instructions
> >> and registers added.
> >
> > The problem is 32 bit floating point instructions aren't hugely useful,
> > everything interesting's a double. (Even printf("%f") is specced to take
> > a double argument, not a float.) So when we do an FPU, we're most likely
> > to jump straight to the 64 bit version (with maybe a compile time option
> > to strip it down to 32 bits, but we'll see).
> >
>
>
> Not everything is interested in doubles (64-bit floats). I know the 
> C/C++ Library specification is geared to 64-bit floating point, but 
> they do also include definitions for 32-bit floating point values as 
> well.
>
> Anyways, it's important to offer 32-bit floating point for developers 
> to use since 95% of games and most major 3d specs use it...
>

yes, basically true from what I have seen (namely, 32-bit single 
precision being the more dominant type in use in practice).

it is possible to have an FPU that does both, e.g., by up-converting 
single-precision operands to double internally and then down-converting 
the result. this is a modest cost if one skips supporting denormals (the 
conversion is mostly bit-twiddling, which is fairly cheap). an 
alternative is to give the main "unit" a single/double flag, and 
unpack/repack operands and results in the desired format directly 
(sparing some details here).
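a minimal C sketch of the up-convert/down-convert idea, modeled in software on raw IEEE-754 bit patterns. the function names are hypothetical; denormals are flushed to zero (the "skip denormals" shortcut), and the narrowing step simply truncates (round toward zero) instead of doing round-to-nearest, so this only shows the bit-twiddling involved, not a conforming FPU.

```c
#include <stdint.h>
#include <string.h>

/* Widen a binary32 bit pattern to binary64. Denormals flush to signed zero. */
static uint64_t f32_to_f64_bits(uint32_t f)
{
    uint64_t sign = (uint64_t)(f >> 31) << 63;
    uint32_t exp  = (f >> 23) & 0xFF;
    uint64_t man  = f & 0x7FFFFF;

    if (exp == 0)                 /* zero or denormal: flush to signed zero */
        return sign;
    if (exp == 0xFF)              /* Inf or NaN: max exponent, keep payload */
        return sign | (0x7FFULL << 52) | (man << 29);
    /* normal: rebias exponent (127 -> 1023), widen mantissa 23 -> 52 bits */
    return sign | ((uint64_t)(exp - 127 + 1023) << 52) | (man << 29);
}

/* Narrow binary64 to binary32 by truncating the extra mantissa bits
 * (round toward zero). Underflow flushes to zero, overflow gives Inf. */
static uint32_t f64_to_f32_bits(uint64_t d)
{
    uint32_t sign = (uint32_t)(d >> 63) << 31;
    int32_t  exp  = (int32_t)((d >> 52) & 0x7FF);
    uint64_t man  = d & 0xFFFFFFFFFFFFFULL;

    if (exp == 0x7FF) {           /* Inf or NaN */
        uint32_t m = (uint32_t)(man >> 29);
        if (man != 0 && m == 0)   /* keep a NaN from turning into Inf */
            m = 1;
        return sign | (0xFFu << 23) | m;
    }
    int32_t e = exp - 1023 + 127;
    if (exp == 0 || e <= 0)       /* zero, denormal, or too small */
        return sign;
    if (e >= 0xFF)                /* too large for binary32 */
        return sign | (0xFFu << 23);
    return sign | ((uint32_t)e << 23) | (uint32_t)(man >> 29);
}

/* A single-precision add carried out on a double-precision datapath. */
static float add_f32_via_f64(float a, float b)
{
    uint32_t ua, ub, ur;
    uint64_t wa, wb, wr;
    double da, db, dr;
    float r;

    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    wa = f32_to_f64_bits(ua);     /* unpack: single -> double */
    wb = f32_to_f64_bits(ub);
    memcpy(&da, &wa, sizeof da);
    memcpy(&db, &wb, sizeof db);
    dr = da + db;                 /* the "double" unit does the work */
    memcpy(&wr, &dr, sizeof wr);
    ur = f64_to_f32_bits(wr);     /* repack: double -> single */
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

for example, `add_f32_via_f64(1.5f, 2.25f)` widens both operands, adds them as doubles, and narrows the exact sum back to 3.75f. a real unit would want round-to-nearest on the narrowing path; with that, doing a single +, -, *, / or sqrt in double and rounding once to single still gives the correctly rounded single result (53 >= 2*24 + 2), which is part of why this approach is cheap.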

the usual advantage of single is that each value takes 4 bytes instead 
of 8, so twice as many numbers fit in the same amount of memory.

for similar reasons, half-float can often be pretty useful (even when 
not supported by hardware), though a case could (possibly) be made for 
an instruction to handle F16 <-> F32 conversion (rather than doing it 
with function calls and explicit bit-twiddling; newer x86 and ARM have 
dedicated instructions for this).
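for comparison, a sketch of the software route: converting between F16 and F32 bit patterns in C with plain integer operations. the function names are made up for illustration. F16 -> F32 is exact for every input (F16 denormals become normal F32 values); F32 -> F16 rounds to nearest-even but, to keep it short, flushes results below the F16 normal range to zero and returns a canonical quiet NaN instead of preserving payloads. hardware conversions (e.g. the x86 F16C instructions) do the same job in one step.

```c
#include <stdint.h>

/* F16 -> F32: exact for all inputs, including F16 denormals. */
static uint32_t f16_to_f32_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;

    if (exp == 0x1F)                      /* Inf or NaN */
        return sign | 0x7F800000u | (man << 13);
    if (exp == 0) {
        if (man == 0)                     /* signed zero */
            return sign;
        /* F16 denormal: shift until the implicit bit appears, since the
         * same value is a normal number in F32 */
        uint32_t e = 127 - 15 + 1;
        while ((man & 0x400) == 0) {
            man <<= 1;
            e--;
        }
        return sign | (e << 23) | ((man & 0x3FF) << 13);
    }
    /* normal: rebias exponent (15 -> 127), widen mantissa 10 -> 23 bits */
    return sign | ((exp - 15 + 127) << 23) | (man << 13);
}

/* F32 -> F16 with round-to-nearest-even. Results too small for an F16
 * normal flush to signed zero (no F16 denormal output in this sketch). */
static uint16_t f32_to_f16_bits(uint32_t f)
{
    uint32_t sign = (f >> 16) & 0x8000;
    int32_t  exp  = (int32_t)((f >> 23) & 0xFF);
    uint32_t man  = f & 0x7FFFFF;

    if (exp == 0xFF)                      /* Inf, or NaN -> quiet NaN */
        return (uint16_t)(sign | 0x7C00 | (man ? 0x200 : 0));
    int32_t e = exp - 127 + 15;
    if (e >= 31)                          /* overflow -> Inf */
        return (uint16_t)(sign | 0x7C00);
    if (e <= 0)                           /* underflow -> signed zero */
        return (uint16_t)sign;

    uint32_t h   = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem = man & 0x1FFF;          /* the 13 discarded bits */
    if (rem > 0x1000 || (rem == 0x1000 && (h & 1)))
        h += 1;   /* a carry may bump the exponent, up to Inf: still correct */
    return (uint16_t)(sign | h);
}
```

for example, 1.0f (0x3F800000) maps to 0x3C00 and back; the exact halfway value 1 + 2^-11 rounds to the even neighbor 0x3C00, while 1 + 3*2^-11 rounds up to 0x3C02.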

> OpenCL v1.2 - double precision floating point is optional.
> 3D graphics api - generally 32 bit floats are used by programs even 
> though the spec says 64 bit floats.
>
> OpenGL ES 3.2 - essentially single precision floating point only. The 
> word double is only used once.
> OpenGL ES 2.0 - all 32-bit floating point
> OpenVG 1.1 - uses 32 bit floats
>
>
> Bullet physics - this defaults to single precision floats
>
> Box2D physics - I am fairly sure this also uses 32 bit floats.

supporting double probably wouldn't mean lacking single (in any sane 
world).

OTOH: not supporting double could mean either a performance hit (due to 
emulating it in cases where it is used), or potentially unacceptable 
loss of precision or other issues (in cases where a double is actually 
needed).

there is some uncertainty, for example, when mixing precision in 
expressions. the normal C rules specify a slower but more precise route 
(namely, promoting the float operand to double), whereas a cheaper route 
(sometimes taken by compilers in practice) is to quietly demote the 
double to float in cases where the result will end up as a float anyway 
(such as when assigning the result to a float variable).
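a small C illustration of the two routes (the function names are hypothetical). the first is what the C usual arithmetic conversions require for `float - double`; the second writes out the demote-first shortcut by hand, so the difference can be seen without relying on any particular compiler's behavior.

```c
/* C semantics: in `a - d` the float operand is promoted to double, the
 * subtraction happens in double, and only the final result rounds to float. */
static float sub_promote(float a, double d)
{
    return (float)(a - d);
}

/* Cheaper shortcut: demote the double operand to float first, then do a
 * single-precision subtraction. */
static float sub_demote(float a, double d)
{
    return a - (float)d;
}
```

with `a = 1.0f` and `d = 1.0 - 1e-12`, `sub_promote` returns a tiny nonzero value (about 1e-12), while `sub_demote` returns exactly 0.0f, because `(float)d` already rounds to 1.0f. cancellation cases like this are where the shortcut visibly changes results; for operands such as 1.5f and 0.25 the two routes agree.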

> >> SH3-DSP Core: SH-3 Instructions with Arithmetic DSP Instructions
> >> added. In general these are mostly for Integer and fixed point math.
> >
> > I'm pretty sure we're not doing that. (I vaguely recall looking at that
> > trying to find j64 instruction space, but those had _already_ been
> > repurposed by later superh processors. I.E. even later superh didn't
> > respect that, and we needed to support the "not that" uses of that
> > instruction space in j64...)
> >
> > I think. Ask me again next week, the notes and people who wrote them are
> > in Tokyo. :)
> >
> > (That said, we may wind up adding another simple DSP to the DMA engine.
> > Maybe something 8-bit and capable of driving ethernet checksumming, PTP,
> > handling the mmc bus state engine... But that's not part of historical
> > superh.)
> >
> >> Also, to see a comparison of the SH1, SH2, SH3, and SH4 lines of
> >> chips, you can find the instruction set summary for these at:
> >> HTML: http://www.shared-ptr.com/sh_insns.html 
> >
> > Which is the second link at the top of the j-core.org page, and looking
> > through a printout of that is how I was finding instructions to
> > potentially repurpose for j64 last year. (Which I then pointed the
> > actual engineers at so they could do the real research, I was just
> > finding candidates.)
> >
> >> Github: https://github.com/shared-ptr/sh_insns 
> >>
> >>
> >> Also, you can find the programmer's manual for the SH3 by searching the
> >> Renesas website for the SH7705 chip and looking for the following
> >> document:
> >> SH-3/SH-3E/SH3-DSP Software Manual
> > See the older japanese gentleman standing next to me in the second
> > picture in https://lwn.net/Articles/647636/ wearing a red shirt? A
> > couple decades back, he was the SuperH platform architect. He had
> > _stories_ about SH3 development, and answered a lot of "why did they do
> > X" questions. (Not recently, he's moved on to other things. Retired to
> > California I think? I remember he still considered Microsoft Windows CE
> > compatibility to be important because it was a big customer back when
> > sh2 and sh3 were originally developed, so must still be relevant today
> > because reasons. He made darn sure j-core ran a lot of old Windows CE
> > binaries circa 2014 or so. *shrug* Mostly before my time. Yay
> > compatibility testing I suppose.)
> >
> > Yes, back in the day, wince ran on sh:
> >
> > https://msdn.microsoft.com/en-us/library/ms882059.aspx 
> >
> > Not currently a development focus of SEI, but if somebody else wanted to
> > do stuff, it's open...
> >

interestingly, although VS2015 seems to lack an SH cross compiler (and 
WinCE+SH is basically EOL'ed), its other tools are apparently still able 
to work with WinCE SH objects/binaries.

I was also able to build binutils for WinCE before, but GCC proper 
seems to have since dropped support (it could probably be revived if 
someone wanted to beat on it enough).

it is possible that a lot of this still exists somewhere "in the 
wild", though.

> > Rob
> >
> > P.S. I'm not the expert on any of this, I'm just chatty. I try to keep
> > up with the mailing list and with what everybody else is doing. We're
> > trying to resurface from the last 18 months of crazy, and when we do
> > I'll see if I can... I dunno, get Jeff to drop into the #j-core irc
> > channel on freenode for half an hour each week or something. Keep in
> > mind he's usually in Japan so his day is US night.
>
>
> _______________________________________________
> J-core mailing list
> J-core at lists.j-core.org
> http://lists.j-core.org/mailman/listinfo/j-core
