[J-core] Porting J1 to LiteX
William D. Jones
wjones at wdj-consulting.com
Tue Jun 15 07:15:12 UTC 2021
Hi Jeff,
Good to hear from you again!
> But does it fit reasonably? I had found that J1 was impractical in
> HX, and so did not investigate further.
I guess it depends on what you mean by "reasonably": LUT usage or RAM usage.
The entire SoC uses ~5300 LUTs on HX8K vs. ~4300 LUTs on UP5K. This
leaves room for peripherals like LEDs, GPIO, and a UART, plus XIP from SPI
flash and a cache.
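For a rough sense of what those figures leave over, here's a back-of-the-envelope calculation. The per-device totals are the logic cell counts from the Lattice iCE40 datasheets; the SoC numbers are the approximate ones above, so treat the results as ballpark only.

```python
# Total logic cells (LUT4s) per device, per the Lattice iCE40 datasheets.
DEVICE_LUTS = {"HX8K": 7680, "UP5K": 5280}

# Approximate SoC usage quoted above (UP5K is lower thanks to DSPs/SPRAM).
SOC_LUTS = {"HX8K": 5300, "UP5K": 4300}


def headroom(device: str) -> tuple[int, float]:
    """Return (free LUTs, percent of LUTs used) for the given device."""
    total = DEVICE_LUTS[device]
    used = SOC_LUTS[device]
    return total - used, 100.0 * used / total


for dev in DEVICE_LUTS:
    free, util = headroom(dev)
    print(f"{dev}: {free} LUTs free ({util:.0f}% used)")
```

That works out to roughly 2380 free LUTs (~69% used) on HX8K and 980 free LUTs (~81% used) on UP5K, before adding the extra peripherals.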
However, since the HX8K doesn't have SPRAM, I reduced the amount of "bulk"
RAM to 1kB. This is still enough to run the bootrom, and a good 25% of
the EBR (4kB) is still left unused. The smallest ARM microcontrollers,
e.g. the LPC810, had 4kB of flash and 1kB of RAM, and I am partial to
the msp430 (some of them have 128 bytes of RAM). So I'm very tolerant of
microcontrollers w/ limited memory :D!
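The EBR figure can be sanity-checked with the HX8K's block RAM layout: 32 embedded RAM blocks of 4 Kbit each. The 1kB bulk RAM size is taken from the paragraph above; how many blocks it maps to assumes nextpnr packs it densely, which is an assumption rather than something I've checked in the build report.

```python
EBR_BLOCKS_HX8K = 32   # embedded block RAMs on the iCE40HX8K
EBR_BITS = 4096        # 4 Kbit per block

total_bytes = EBR_BLOCKS_HX8K * EBR_BITS // 8   # all EBR on the device
unused_bytes = total_bytes // 4                 # "a good 25%" left unused

# Assumption: the 1kB bulk RAM is packed densely into whole EBR blocks.
bulk_ram_bytes = 1024
bulk_ram_blocks = bulk_ram_bytes * 8 // EBR_BITS

print(f"total EBR: {total_bytes} bytes, unused: {unused_bytes} bytes")
print(f"1kB bulk RAM needs {bulk_ram_blocks} EBR blocks")
```

So the HX8K has 16kB of EBR in total, 4kB of which is the unused quarter, and the bulk RAM accounts for only two blocks.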
Based on prior experience targeting MicroPython to 8k parts, I think 8k
support is worth exploring at this point. Using j1 to its full potential
w/ e.g. MicroPython on HX8K will most likely require a small icache and
XIP from SPI flash. With lm32 on TinyFPGA, this configuration halves
execution time compared to running without the icache; we can probably
go without a dcache. And even if it's not ideal, we can use j1 until j0
is ready :D!
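To illustrate why even a small icache matters when executing in place from SPI flash, here's a toy model of a direct-mapped instruction cache that counts how often a tight loop would have to go back to flash. The 64-line, 16-byte-line geometry (1kB total) is hypothetical, and the model says nothing about the actual LiteX cache or the lm32 TinyFPGA numbers.

```python
class DirectMappedICache:
    """Toy direct-mapped icache model that counts hits and line fills."""

    def __init__(self, num_lines: int = 64, line_bytes: int = 16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # one tag per line; None = invalid
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int) -> bool:
        """Look up one instruction fetch; return True on a hit."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: in hardware this would be a slow burst read from SPI flash.
        self.misses += 1
        self.tags[index] = tag
        return False


# A 32-instruction loop body (SH instructions are 16 bits wide), run 100 times.
cache = DirectMappedICache()
for _ in range(100):
    for pc in range(0x1000, 0x1040, 2):
        cache.fetch(pc)
print(f"hits={cache.hits} misses={cache.misses}")
```

Only the first pass through the loop misses (4 line fills); the remaining 3196 fetches hit, so SPI flash latency is paid once instead of on every instruction.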
If you want to duplicate my results and see LUT/EBR usage, use my copy
of j1 (https://github.com/cr1901/jcore-j1-ghdl/tree/hx8k) and run "make
TARGET=ice40hx8k_b_evn". LiteX is using my own copy of j1 for now, in
case I need to make changes and experiment; this is temporary.
> for now another place to look is here:
> https://github.com/j-core/j-core-ice40/tree/master/testrom which while
> less clean, is closer to what you'll need.
Ack. Where is the script/program you use to convert the testrom to a
VHDL array? I think I'd rather reuse yours for now than write my own.
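If no existing script turns up, the converter itself is small. This is a minimal sketch of what I'd otherwise write, not the tool j-core uses: the 32-bit word width, big-endian byte order, the `rom_t` type (assumed to be declared elsewhere as an array of `std_logic_vector(31 downto 0)`), and the `rom_init` constant name are all my assumptions.

```python
import sys


def bin_to_vhdl(data: bytes, name: str = "rom_init", word_bytes: int = 4) -> str:
    """Render a raw binary as a VHDL constant aggregate of big-endian words.

    Hypothetical output format: assumes a type `rom_t` declared elsewhere.
    """
    # Zero-pad the image to a whole number of words.
    data = data + b"\x00" * ((-len(data)) % word_bytes)
    words = [data[i:i + word_bytes] for i in range(0, len(data), word_bytes)]
    # bytes.hex() keeps file order, i.e. the first byte is the most significant.
    entries = [f'    {i} => x"{w.hex().upper()}"' for i, w in enumerate(words)]
    # Fill the rest of the ROM with zeros; "others" must come last.
    entries.append("    others => (others => '0')")
    return f"constant {name} : rom_t := (\n" + ",\n".join(entries) + "\n);"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bin2vhdl.py testrom.bin > testrom_init.vhd
    with open(sys.argv[1], "rb") as f:
        print(bin_to_vhdl(f.read()))
```

Using named association plus `others` keeps the output valid even for an empty or short image, regardless of how deep the ROM is.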
> keep in mind that J1 is still a full Harvard machine, so you'll need
> to mux it down to a single master.
LiteX provides its own mux on the Wishbone bus, so I would adapt both
the D and I buses to Wishbone before the mux.
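Conceptually, muxing the two J1 buses down to one master means arbitrating between an instruction master and a data master. The following is not LiteX code; it's a plain-Python behavioral sketch using a round-robin policy chosen for illustration. It grants per transaction, whereas a real Wishbone arbiter would hold the grant until the winning master drops CYC.

```python
from typing import Optional


class RoundRobinArbiter:
    """Behavioral model: grant one of N bus masters per transaction."""

    def __init__(self, names):
        self.names = list(names)
        # Start "after" the last master so master 0 wins the first tie.
        self.last = len(self.names) - 1

    def grant(self, requests) -> Optional[str]:
        """requests: set of master names currently requesting the bus."""
        n = len(self.names)
        # Search starting just after the previous winner.
        for offset in range(1, n + 1):
            idx = (self.last + offset) % n
            if self.names[idx] in requests:
                self.last = idx
                return self.names[idx]
        return None  # nobody is requesting


arb = RoundRobinArbiter(["ibus", "dbus"])
print(arb.grant({"ibus", "dbus"}))  # ibus wins the first tie
print(arb.grant({"ibus", "dbus"}))  # then dbus gets its turn
```

Round-robin keeps a loop of back-to-back loads from starving instruction fetch (or vice versa), which matters more on a Harvard core whose two buses now share one port.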
> Yes, there are a few proposals... just microcode, only SH1
> instructions (no MAC or other DSP functions). We've also just not had
> the focus.
Ack. This is a proof-of-concept for now. Once the CPU is in LiteX and
working, the hardest part is done, and we can iterate to make the
integration better.
> I don't think timing closure, on FPGA multipliers tend to be very
> fast. There are a few critical paths, the one that irks me is the T bit
> feeding into the microcode sequencer. But when I wrote the MAC unit, it
> was very clean, even if it's picked up a bit of cruft since.
Ack. One thing I'd like to add: nextpnr has trouble routing the UP5K
version of the SoC, even with the DSPs and SPRAM freeing up about 1k LUTs
for other use. nextpnr can take upwards of 5 minutes to route on UP5K,
and after one change to the PCF, nextpnr ran for over 10 minutes without
finishing before I cancelled it. I'll ask one of the nextpnr devs for
some insight.
> IIRC, some are instruction chewers. J1 is a highly encoded and more
> complex operation per instruction ISA, and pipelined machine with
> parallel ALU, MAC and shift units. The throughput might be comparable,
> even at a slower clock :)
The time the LiteX BIOS takes to checksum the main payload may be a
good benchmark.
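For reference, the inner loop of such a benchmark is small. Below is a table-driven CRC-32 in Python, assuming the payload check is a standard CRC-32 (reflected polynomial 0xEDB88320, as used by zlib); I haven't confirmed which checksum the BIOS actually uses. Per byte it does a load, an XOR, a table lookup, and a shift, so on the SoC it exercises instruction fetch and data loads together.

```python
import time


def make_crc32_table() -> list[int]:
    """Precompute the 256-entry table for the reflected CRC-32 polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = CRC32_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    payload = bytes(range(256)) * 256  # 64 KiB of dummy data
    start = time.perf_counter()
    value = crc32(payload)
    print(f"crc32 = {value:#010x} in {time.perf_counter() - start:.3f} s")
```

The standard check value applies here: `crc32(b"123456789")` returns 0xCBF43926.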
Sincerely,
--
William D. Jones
wjones at wdj-consulting.com