[J-core] Roadmap (was Re: j2 llvm repo?)

Wed Feb 8 20:30:51 EST 2017

On 2/8/2017 3:06 PM, Rob Landley wrote:
> On 02/08/2017 11:43 AM, BGB wrote:
>> On 2/8/2017 3:20 AM, Rob Landley wrote:
>>> We also don't want too many instruction set versions floating around out
>>> there confusing the compiler people, if we can help it. We have
>>> --with-cpu=j2 right now, we'd like to keep versions to a minimum. Should
>>> the FPU have separate 32 bit and 64 bit modes: from a VHDL build
>>> perspective and fitting into less FPGA space, sure why not? From a
>>> toolchain/standardized instruction set perspective: ick, pick one. So
>>> what configuration granularity level we implement beyond the first
>>> release is a judgement call we haven't had to make yet. (There's been
>>> talk of menuconfig in the VHDL build, which would itself be rather a lot
>>> of work to implement...)
>> I guess it could be possible to maybe add support for a subset of the
>> FPU, and then use traps (and emulation) for the rest?
> I don't see any benefit to a "partial" FPU, when both gcc and the kernel
> have had entirely soft FPU implementations on multiple targets for
> years. (If you want soft float, do soft float.)

possible, but a partial FPU could still provide much of the performance
advantage of a full FPU, with potentially reduced hardware complexity
(though with the drawback of needing firmware or OS support to emulate
the missing features).

in contrast, a purely emulated FPU would have somewhat worse
performance than a hardware FPU.

though, yeah, a soft FPU could be a better option if floating-point
performance isn't really a high priority.

and, admittedly, I would probably put a working MMU as a higher priority 
than an FPU for most uses.

going by the profiler results for my emulator (mostly while running
Quake), the most frequently executed FPU operations are:
     FMUL
     FMOV, FMOV.S
     FADD
     FCMP/GT
     FSUB
     ...
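to give an idea of how a frequency table like this might be gathered,
here is a minimal sketch of an opcode histogram hooked into an
interpreter's dispatch loop. it assumes the standard SH-4 encoding,
where most FPU instructions have a top nibble of 0xF and the low nibble
selects the operation; the function and table names are hypothetical,
not taken from my emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters for the SH-4 FPU opcode group (top nibble 0xF),
 * bucketed by the low nibble, which selects the operation in the
 * standard SH-4 encoding: 0x0 FADD, 0x1 FSUB, 0x2 FMUL, 0x3 FDIV,
 * 0x5 FCMP/GT, 0x6..0xB the FMOV.S load/store forms, 0xC FMOV FRm,FRn,
 * 0xD single-operand ops such as FLDS/FLOAT/FTRC. */
static uint64_t fpu_op_counts[16];

/* Called once per executed guest instruction from the dispatch loop;
 * the cost is one compare and, for FPU ops, one increment. */
static void profile_insn(uint16_t insn)
{
    if ((insn >> 12) == 0xF)
        fpu_op_counts[insn & 0xF]++;
}

/* Read back one bucket. */
static uint64_t fpu_op_count(unsigned cls)
{
    assert(cls < 16);
    return fpu_op_counts[cls];
}

/* Bucket with the highest count (ties go to the lowest index). */
static int hottest_fpu_class(void)
{
    int best = 0;
    for (int i = 1; i < 16; i++)
        if (fpu_op_counts[i] > fpu_op_counts[best])
            best = i;
    return best;
}
```

bucketing by the low nibble is crude (all the FMOV.S forms land in
separate buckets, and the 0xD group lumps several ops together), but it
keeps the per-instruction overhead to almost nothing.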

at least for Quake, FLDS/FLOAT/FTRC are also pretty active (though a
lot of what Quake does with floats, I would probably do with integer
code instead).
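as an illustration of what "integer code" could mean here, a minimal
16.16 fixed-point sketch. the 16.16 format and the helper names are
assumptions picked for the example, not what Quake itself uses.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed16;              /* signed 16.16 fixed-point */
#define FIX_ONE ((int32_t)1 << 16)

/* int -> fixed (the integer-code analogue of FLOAT); only integers in
 * [-32768, 32767] are representable in 16.16. */
static fixed16 fix_from_int(int32_t x)
{
    assert(x >= -32768 && x <= 32767);
    return (fixed16)(x * FIX_ONE);
}

/* fixed * fixed: widen to 64 bits so the intermediate product cannot
 * overflow, then drop the extra 16 fraction bits (truncating toward 0). */
static fixed16 fix_mul(fixed16 a, fixed16 b)
{
    return (fixed16)(((int64_t)a * b) / FIX_ONE);
}

/* fixed -> int, truncating toward zero (roughly what FTRC does). */
static int32_t fix_to_int(fixed16 a)
{
    return a / FIX_ONE;
}
```

the tradeoff is range and precision (16 integer bits, 16 fraction bits)
in exchange for never touching the FPU or the FPUL conversion path.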

FDIV is a fair bit further down the list (so it could probably tolerate
being emulated).

this implies the biggest gains would probably come from the FMOV, FMUL,
and FADD/FSUB forms.
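a minimal sketch of the trap-and-emulate side of such a partial FPU,
assuming the hardware implements FMOV/FMUL/FADD/FSUB and raises an
illegal-instruction trap on FDIV, which a handler then decodes and does
in software. the struct and handler names are hypothetical; the decode
assumes the SH-4 single-precision FDIV form (1111nnnnmmmm0011, FRn =
FRn / FRm). a real handler would also have to honor FPSCR (PR/SZ bits,
rounding mode, exception flags), which this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical saved FPU state as seen by the trap handler. */
struct fpu_state {
    float fr[16];                     /* single-precision FR0..FR15 */
};

/* Try to emulate one trapped FPU instruction.
 * Returns true if handled (the caller then advances PC past it),
 * false if this opcode isn't emulated here (the caller raises the
 * usual illegal-instruction fault). */
static bool emulate_missing_fpu_op(struct fpu_state *st, uint16_t insn)
{
    assert(st != NULL);
    unsigned n = (insn >> 8) & 0xF;   /* destination FRn */
    unsigned m = (insn >> 4) & 0xF;   /* source FRm */

    if ((insn & 0xF00F) == 0xF003) {  /* FDIV FRm,FRn */
        st->fr[n] = st->fr[n] / st->fr[m];
        return true;
    }
    return false;                     /* not one of the emulated ops */
}
```

since FDIV is comparatively rare, the trap overhead gets paid rarely,
while the hot ops never leave hardware.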

though, as-is, most of the running time (across multiple tests) goes
into things like the emulated MMU and the trace-dispatch logic (ex:
lots of memory loads/stores and branches).

though this isn't really because the logic is all that complicated;
it is more the "little things" that start to add up when a function is
called millions of times per second.
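one common way to trim those per-access "little things" in an emulated
MMU is a small direct-mapped software TLB checked before the full
translation path. a minimal sketch, assuming 4 KB pages and a 256-entry
table; the names and sizes are hypothetical and this isn't my
emulator's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STLB_PAGE_SHIFT 12                       /* assume 4 KB pages */
#define STLB_PAGE_MASK  ((1u << STLB_PAGE_SHIFT) - 1)
#define STLB_SIZE       256                      /* power of two */

struct stlb_entry {
    uint32_t vpn;          /* virtual page number */
    uint32_t ppn;          /* physical page number */
    bool     valid;
};

static struct stlb_entry stlb[STLB_SIZE];

/* Fast path: one index computation and one compare. On a miss the
 * caller falls back to the slow (full MMU) lookup, then calls
 * stlb_fill with the result. */
static bool stlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> STLB_PAGE_SHIFT;
    const struct stlb_entry *e = &stlb[vpn & (STLB_SIZE - 1)];
    if (e->valid && e->vpn == vpn) {
        *paddr = (e->ppn << STLB_PAGE_SHIFT) | (vaddr & STLB_PAGE_MASK);
        return true;
    }
    return false;
}

/* Cache one translation; both addresses must share a page offset. */
static void stlb_fill(uint32_t vaddr, uint32_t paddr)
{
    assert((vaddr & STLB_PAGE_MASK) == (paddr & STLB_PAGE_MASK));
    uint32_t vpn = vaddr >> STLB_PAGE_SHIFT;
    struct stlb_entry *e = &stlb[vpn & (STLB_SIZE - 1)];
    e->vpn = vpn;
    e->ppn = paddr >> STLB_PAGE_SHIFT;
    e->valid = true;
}

/* Must be called whenever the guest changes mappings (TLB flush, ASID
 * switch, etc.), or stale translations would be used. */
static void stlb_flush(void)
{
    for (int i = 0; i < STLB_SIZE; i++)
        stlb[i].valid = false;
}
```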

(then again, it is currently fast enough to run Quake at around 15-20
fps at 320x240 and 20-25 fps at 320x200, so it could probably be
worse...)

meanwhile, the Dreamcast's 200 MHz SH-4 has a claimed performance of
360 MIPS, which implies a sustained average of around 2 instructions
per clock (well, assuming this isn't an overly optimistic value or
something).
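the arithmetic behind that estimate, assuming the Dreamcast SH-4's
200 MHz clock and taking the 360 MIPS claim at face value:

```latex
\mathrm{IPC} \approx \frac{360 \times 10^{6}\ \text{instructions/s}}{200 \times 10^{6}\ \text{cycles/s}} = 1.8 \approx 2
```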

casually runs Dhrystone:
     in emulator ~72 DMIPS;
     native PC ~17075 DMIPS.

well... suddenly the emulator isn't looking so good anymore...
(though it is still roughly in Pentium 1 territory, so no huge surprise
there...).

though, this is while getting ~93 MIPS in the VM, implying an SH-4
instruction is worth somewhat less than a DMIPS "instruction" (72 / 93,
or about 0.77 DMIPS per MIPS).

> There are a number of things that can be done to try to fit j3 into a
> smaller FPGA, but multiple implementations split the testing over
> different codepaths and in theory we'd be working towards a J3 ASIC in
> which the LX9 vs LX25 boundary's arbitrary.
>
> We should get turtle out, do a good implementation that fits 2-way SMP
> in LX25, and then worry about trimming down the UP version to fit in
> Numato after we have it working. As soon as we have engineering cycles
> to devote to this, which isn't this week.

ok.