[J-core] Roadmap (was Re: j2 llvm repo?)
cr88192 at gmail.com
Wed Feb 8 20:30:51 EST 2017
On 2/8/2017 3:06 PM, Rob Landley wrote:
> On 02/08/2017 11:43 AM, BGB wrote:
>> On 2/8/2017 3:20 AM, Rob Landley wrote:
>>> We also don't want too many instruction set versions floating around out
>>> there confusing the compiler people, if we can help it. We have
>>> --with-cpu=j2 right now, we'd like to keep versions to a minimum. Should
>>> the FPU have separate 32 bit and 64 bit modes: from a VHDL build
>>> perspective and fitting into less FPGA space, sure why not? From a
>>> toolchain/standardized instruction set perspective: ick, pick one. So
>>> what configuration granularity level we implement beyond the first
>>> release is a judgement call we haven't had to make yet. (There's been
>>> talk of menuconfig in the VHDL build, which would itself be rather a lot
>>> of work to implement...)
>> I guess it could be possible to maybe add support for a subset of the
>> FPU, and then use traps (and emulation) for the rest?
> I don't see any benefit to a "partial" FPU, when both gcc and the kernel
> have had entirely soft FPU implementations on multiple targets for
> years. (If you want soft float, do soft float.)
possible, but a partial FPU could still provide much of the performance
advantage of a full FPU, with potentially reduced hardware complexity
(though at the drawback of needing firmware or OS support to emulate the
rest).
in contrast, a purely emulated FPU would have somewhat worse
performance than having an FPU.
though, yeah, a soft-FPU could be a better option if floating-point
performance isn't really a high priority.
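as a rough illustration of the trap-and-emulate idea, here is a minimal Python model (not hardware or kernel code). the split is an assumption made up for this sketch: FMOV/FADD/FSUB/FMUL are treated as implemented "in hardware" and FDIV as trapping to software. the names `hw_execute`, `execute`, and `UnimplementedFpuOp` are hypothetical, and operands follow the SH convention of `dst = dst op src`.

```python
# Toy model of a partial FPU: common ops run "in hardware", the rest trap
# and are emulated in software. The split below is an assumption for
# illustration only, not a proposed J-core configuration.

# Ops the reduced FPU is assumed to implement directly (dst = dst op src).
HW_OPS = {
    "FMOV": lambda dst, src: src,
    "FADD": lambda dst, src: dst + src,
    "FSUB": lambda dst, src: dst - src,
    "FMUL": lambda dst, src: dst * src,
}

# Ops assumed to be left to a firmware/OS trap handler.
SOFT_OPS = {
    "FDIV": lambda dst, src: dst / src,
}


class UnimplementedFpuOp(Exception):
    """Stands in for the illegal-instruction trap a real core would raise."""


def hw_execute(op, dst, src):
    """Run an op on the modeled hardware, or raise the modeled trap."""
    try:
        fn = HW_OPS[op]
    except KeyError:
        raise UnimplementedFpuOp(op)
    return fn(dst, src)


def execute(op, dst, src, stats):
    """Try hardware first; on a trap, fall back to software emulation."""
    try:
        result = hw_execute(op, dst, src)
        stats["hw"] += 1
    except UnimplementedFpuOp:
        result = SOFT_OPS[op](dst, src)
        stats["trapped"] += 1
    return result


if __name__ == "__main__":
    stats = {"hw": 0, "trapped": 0}
    print(execute("FADD", 1.5, 2.0, stats))
    print(execute("FDIV", 1.0, 4.0, stats))
    print(stats)
```

the `stats` counters are the interesting part: if the trapped count stays small relative to the hardware count for a given workload, the cost of emulating the rare ops is amortized.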
and, admittedly, I would probably put a working MMU as a higher priority
than an FPU for most uses.
in terms of the profiler results for my emulator (mostly while running
Quake), the most highly used FPU operations are:
at least for Quake, FLDS/FLOAT/FTRC are also pretty active (though, a
lot of things Quake does with floats I would probably do instead via
FDIV is a fair bit further down the list (could probably tolerate being
this implies probably the most gains from FMOV, FMUL, and FADD/FSUB forms.
though, as-is, the majority of the running time (in multiple tests) is
going mostly into things like the emulated MMU and trace-dispatch logic
(ex: lots of memory loads/stores and branches).
though a lot of this isn't really because the logic is all that
complicated, but more the "little things" that start to become an issue
when a function is called millions of times per second.
( then again, it is currently sufficient to run Quake at around 15-20
fps (320x240) and 20-25 fps (320x200), so could probably be worse... )
meanwhile, the Dreamcast has a claimed CPU performance of 360 MIPS,
which implies a sustained average performance of around 2 instructions
per clock (well, assuming this isn't an overly optimistic value or
casually runs Dhrystone:
in emulator ~72 DMIPS;
native PC ~17075 DMIPS.
well... suddenly emulator not looking so good here anymore...
(though, still roughly in Pentium 1 territory, so no huge surprise
though, this is when getting ~ 93 MIPS in the VM, implying the SH-4
instructions are worth slightly less than a DMIPS "instruction".
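the arithmetic behind these figures can be checked with a short Python sketch. the 360 MIPS, 72 DMIPS, 17075 DMIPS, and 93 MIPS values are the ones quoted above; the 200 MHz SH-4 clock is not stated in this message and is assumed here as the commonly cited Dreamcast clock speed.

```python
# Back-of-the-envelope check of the numbers quoted in this message.
DREAMCAST_CLAIMED_MIPS = 360   # claimed figure quoted above
SH4_CLOCK_MHZ = 200            # assumption: commonly cited Dreamcast SH-4 clock
EMU_DMIPS = 72                 # Dhrystone result inside the emulator
NATIVE_DMIPS = 17075           # Dhrystone result on the host PC
EMU_MIPS = 93                  # emulated SH-4 instructions per second (millions)

# Instructions per clock implied by the claimed MIPS figure.
ipc = DREAMCAST_CLAIMED_MIPS / SH4_CLOCK_MHZ

# How much Dhrystone "work" one emulated SH-4 instruction amounts to.
dmips_per_mips = EMU_DMIPS / EMU_MIPS

# Slowdown of the emulator relative to native code on the same host.
slowdown = NATIVE_DMIPS / EMU_DMIPS

if __name__ == "__main__":
    print(f"implied IPC:        {ipc:.2f}")
    print(f"DMIPS per VM MIPS:  {dmips_per_mips:.2f}")
    print(f"native vs emulator: {slowdown:.0f}x")
```

under that clock assumption the claimed figure works out to 1.8 instructions per clock, each emulated instruction is worth roughly 0.77 of a DMIPS "instruction", and the emulator runs Dhrystone about 237x slower than native.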
> There are a number of things that can be done to try to fit j3 into a
> smaller FPGA, but multiple implementations split the testing over
> different codepaths and in theory we'd be working towards a J3 ASIC in
> which the LX9 vs LX25 boundary's arbitrary.
> We should get turtle out, do a good implementation that fits 2-way SMP
> in LX25, and then worry about trimming down the UP version to fit in
> Numato after we have it working. As soon as we have engineering cycles
> to devote to this, which isn't this week.