[J-core] Porting J1 to LiteX

Tue Jun 15 21:46:47 UTC 2021

Ignore the previous message; it is the same as one I had already sent.
Mistake on my end.

On 6/15/2021 5:45 PM, William D. Jones wrote:
> Hi Jeff,
>
> Good to hear from you again!
>
>> But does it fit reasonably?  I had found that J1 was impractical in
>> HX, and so did not investigate further.
>
> I guess it depends on what you mean by "reasonably": LUT usage or RAM
> usage.
>
> The entire SoC uses ~5300 LUTs on HX8K vs ~4300 LUTs on UP5K. That
> leaves room for peripherals (LEDs, GPIO, UART), XIP from SPI flash,
> and a cache.
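As a rough check on "fits reasonably", the LUT figures above can be turned into utilization percentages. This sketch assumes the datasheet LUT counts of 7680 for the iCE40HX8K and 5280 for the iCE40UP5K; the SoC numbers are the approximate ones quoted in this message.

```python
# Rough LUT utilization of the J1 LiteX SoC on each target.
# Device totals: iCE40 datasheet LUT counts (assumed HX8K = 7680, UP5K = 5280).
# SoC totals: the approximate figures quoted in this message.
DEVICE_LUTS = {"HX8K": 7680, "UP5K": 5280}
SOC_LUTS = {"HX8K": 5300, "UP5K": 4300}


def utilization(device: str) -> float:
    """Fraction of the device's LUTs consumed by the SoC."""
    return SOC_LUTS[device] / DEVICE_LUTS[device]


for dev in DEVICE_LUTS:
    free = DEVICE_LUTS[dev] - SOC_LUTS[dev]
    print(f"{dev}: {utilization(dev):.0%} of LUTs used, ~{free} free")
# HX8K: 69% of LUTs used, ~2380 free
# UP5K: 81% of LUTs used, ~980 free
```

On these numbers the UP5K is the tighter fit in relative terms, even though the HX8K build uses more LUTs in absolute terms.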
>
> However, since the HX8K doesn't have SPRAM, I reduced the amount of
> "bulk" RAM to 1kB. This is still enough to run the bootrom, and a good
> 25% of the EBR (4kB) is still left unused. The smallest ARM
> microcontrollers, e.g. the LPC810, had about 4kB of flash and 1kB of
> RAM, and I am partial to the MSP430 (some parts have 128 bytes of RAM).
> So I'm very tolerant of microcontrollers w/ limited memory :D!
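The 4kB figure follows from the HX8K's block RAM layout. Here is a minimal sketch of the arithmetic, assuming the iCE40HX8K's 32 EBR blocks of 4 kbit each and reading the "bulk" RAM as 1 kilobyte.

```python
# EBR budget on the iCE40HX8K: assumed 32 blocks x 4 kbit (RAM4K primitives).
EBR_BLOCKS = 32
BITS_PER_EBR = 4 * 1024

total_bytes = EBR_BLOCKS * BITS_PER_EBR // 8  # 16 kB of block RAM in total
unused_bytes = total_bytes * 25 // 100        # the ~25% left unused
# Assumption: "bulk" RAM is 1 kilobyte; each 4 kbit EBR holds 512 bytes.
bulk_ram_blocks = (1024 * 8) // BITS_PER_EBR

print(total_bytes, unused_bytes, bulk_ram_blocks)  # 16384 4096 2
```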
>
> Based on prior experience targeting MicroPython to 8k parts, I think
> 8k support is worth exploring at this point. Using j1 to its full
> power w/ e.g. MicroPython on HX8K will most likely require a small
> icache and SPI flash XIP. On TinyFPGA, that configuration with lm32
> halves execution time compared to running without the icache; we can
> probably go without a dcache. And even if it's not ideal, we can use j1
> until j0 is ready :D!
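Why a small icache pays off with XIP: a loop that fits in the cache only goes out to SPI flash on its first pass. The toy model below is a hypothetical direct-mapped cache (16 lines of 16 bytes, sizes chosen purely for illustration, not the parameters of any LiteX cache) that counts hits and line fills for a short loop of 16-bit SH instructions.

```python
# Toy direct-mapped instruction cache in front of SPI flash XIP.
# Hypothetical sizes for illustration: 16 lines x 16 bytes = 256 bytes.
class DirectMappedICache:
    def __init__(self, num_lines: int = 16, line_bytes: int = 16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # None marks an invalid line
        self.hits = 0
        self.misses = 0  # each miss stands for a slow SPI flash line fill

    def fetch(self, addr: int) -> bool:
        """Look up one instruction fetch; return True on a hit."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag  # fill the line from flash
        return False


# A 64-byte loop body of 16-bit SH instructions, executed 100 times.
cache = DirectMappedICache()
for _ in range(100):
    for pc in range(0x1000, 0x1040, 2):
        cache.fetch(pc)
print(cache.hits, cache.misses)  # 3196 4
```

Only 4 of the 3200 fetches reach flash here; real gains depend on the SPI line-fill latency and the program's locality, so the lm32 measurement is the better guide to what to expect.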
>
> If you want to reproduce my results and see LUT/EBR usage, use my copy
> of j1 (https://github.com/cr1901/jcore-j1-ghdl/tree/hx8k) and run
> "make TARGET=ice40hx8k_b_evn". LiteX is using my own copy of j1 for
> now, in case I need to make changes and experiment; this is
> temporary.
>
>> for now another place to look is here:
>> https://github.com/j-core/j-core-ice40/tree/master/testrom which
>> while less clean, is closer to what you'll need.
>
> Ack. Where is the script/program you use to convert the testrom to a 
> VHDL array? I think I'd rather reuse yours for now than write my own.
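Until the original script turns up, a converter is small enough to sketch. This is a hypothetical stand-in, not the script used for j-core-ice40's testrom: it assumes a raw big-endian binary image and a ROM of 32-bit words, and the names `bin_to_vhdl` and `rom_init` are made up for illustration.

```python
def bin_to_vhdl(data: bytes, name: str = "rom_init") -> str:
    """Render a raw binary image as a VHDL constant of 32-bit words.

    Hypothetical helper: assumes big-endian word order and a 32-bit ROM.
    """
    if not data:
        raise ValueError("empty image")
    # Pad with zero bytes so the image is a whole number of 32-bit words.
    if len(data) % 4:
        data = data + bytes(4 - len(data) % 4)
    words = [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]
    lines = [
        f"type {name}_t is array (0 to {len(words) - 1}) "
        "of std_logic_vector(31 downto 0);",
        f"constant {name} : {name}_t := (",
        # Named association keeps single-element arrays legal VHDL.
        ",\n".join(f'    {i} => x"{w:08X}"' for i, w in enumerate(words)),
        ");",
    ]
    return "\n".join(lines)


print(bin_to_vhdl(b"\x12\x34\x56\x78\x9a"))
```

The output is meant to be pasted into a package or architecture declarative region; the trailing partial word is zero-padded to `x"9A000000"`.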
>
>> keep in mind that J1 is still a full Harvard machine, so you'll need
>> to mux it down to a single master.
>
> LiteX provides its own mux on the Wishbone bus, so I would adapt both 
> the D and I buses to Wishbone before the mux.
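To picture what the mux has to do once both buses are Wishbone masters, here is a behavioral sketch of a two-master round-robin arbiter. It is a plain-Python model, not LiteX's API; round-robin is an assumed policy, chosen because it keeps instruction fetches and data accesses from starving each other.

```python
class RoundRobinArbiter:
    """Behavioral model: grant one master per call, rotating priority
    so the most recent winner goes to the back of the queue."""

    def __init__(self, masters):
        self.masters = list(masters)
        # Start "after" the last master so masters[0] wins the first tie.
        self.last = len(self.masters) - 1

    def grant(self, requesting):
        """Return the master granted the shared bus, or None if idle."""
        n = len(self.masters)
        for offset in range(1, n + 1):
            idx = (self.last + offset) % n
            if self.masters[idx] in requesting:
                self.last = idx
                return self.masters[idx]
        return None


# J1's instruction and data buses, each already adapted to Wishbone.
arbiter = RoundRobinArbiter(["ibus", "dbus"])
# Each set lists the masters requesting the bus in that cycle.
cycles = [{"ibus"}, {"ibus", "dbus"}, {"ibus", "dbus"}, {"dbus"}, set()]
print([arbiter.grant(req) for req in cycles])
# ['ibus', 'dbus', 'ibus', 'dbus', None]
```

A real Wishbone arbiter typically holds the grant for the whole bus cycle and re-arbitrates only after the winning master drops CYC; the model collapses that to one grant per call.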
>
>> Yes, there are a few proposals... just microcode, only SH1
>> instructions (no MAC or other DSP functions).  We've also just not had
>> the focus.
>
> Ack. This is a proof-of-concept for now. Once the CPU is in LiteX and 
> working, the hardest part is done, and we can iterate to make the 
> integration better.
>
>> I don't think timing closure [is an issue]; on FPGA, multipliers tend
>> to be very fast.  There are a few critical paths; the one that irks me
>> is the T bit feeding into the microcode sequencer.  But when I wrote
>> the MAC unit, it was very clean, even if it's picked up a bit of cruft
>> since.
>
> Ack. One thing I'd like to add: nextpnr has trouble routing the UP5K
> version of the SoC, even with the DSPs and SPRAM freeing about 1k
> LUTs for other use. nextpnr can take upwards of 5 minutes to route on
> UP5K, and just by changing the PCF, I got nextpnr to run for over 10
> minutes before I cancelled it. I'll ask one of the nextpnr devs for
> some insight.
>
>> IIRC, some are instruction chewers.  J1 is a highly encoded and more
>> complex operation per instruction ISA, and pipelined machine with
>> parallel ALU, MAC and shift units.  The throughput might be
>> comparable, even at a slower clock :)
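The "comparable throughput at a slower clock" argument rests on the standard decomposition of execution time (often called the iron law of processor performance), where $N$ is the dynamic instruction count, CPI the average cycles per instruction and $f$ the clock frequency:

```latex
T = N \cdot \mathrm{CPI} \cdot \frac{1}{f}
\qquad\Longrightarrow\qquad
\frac{T_{\mathrm{J1}}}{T_{\mathrm{other}}}
  = \frac{N_{\mathrm{J1}}}{N_{\mathrm{other}}}
    \cdot \frac{\mathrm{CPI}_{\mathrm{J1}}}{\mathrm{CPI}_{\mathrm{other}}}
    \cdot \frac{f_{\mathrm{other}}}{f_{\mathrm{J1}}}
```

A denser ISA lowers $N_{\mathrm{J1}}$, which can offset a lower $f_{\mathrm{J1}}$; whether it does in practice depends on measured instruction counts and CPI, which is what a benchmark would settle.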
>
> The time it takes to checksum the main payload in the LiteX BIOS may be
> a good benchmark.
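For reference, a checksum loop of this kind is mostly shifts, XORs and branches, which makes it a reasonable small integer benchmark. The sketch below assumes the checksum is a standard CRC-32 (reflected polynomial 0xEDB88320, as used by zlib); it is a bitwise reference version in Python, not the BIOS's own code, which would be compiled C and may well be table-driven.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (reflected polynomial 0xEDB88320), zlib-compatible."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when a 1 bit falls off.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Standard CRC-32 check value for the ASCII string "123456789".
assert crc32_bitwise(b"123456789") == 0xCBF43926
# Cross-check against zlib on an arbitrary payload.
payload = bytes(range(256)) * 4
assert crc32_bitwise(payload) == zlib.crc32(payload)
```

On the target, timing the same loop over an identical payload with a hardware timer on each CPU would give a like-for-like comparison.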
>
> Sincerely,
>

-- 
William D. Jones
wjones at wdj-consulting.com