[J-core] Adding J1 to the roadmap.

Wed May 18 11:29:39 EDT 2016

On 16-05-18 08:07 AM, D. Jeff Dionne wrote:
> Which reminds me, we should try switching the decoder from random logic
> (ASIC style) to FPGA BRAM style and see what happens to the size again
> for the logic constrained platforms, unless Geoff already has done.

I just tried building mimas_v2 with the two different decode_table
implementations.

Here's the output in .mrp using the reverse_logic architecture of 
decode_table:

Slice Logic Utilization:
   Number of Slice Registers:                 3,331 out of  11,440   29%
     Number used as Flip Flops:               3,327
     Number used as Latches:                      3
     Number used as Latch-thrus:                  0
     Number used as AND/OR logics:                1
   Number of Slice LUTs:                      5,307 out of   5,720   92%
     Number used as logic:                    5,190 out of   5,720   90%
       Number using O6 output only:           4,519
       Number using O5 output only:             162
       Number using O5 and O6:                  509
       Number used as ROM:                        0
     Number used as Memory:                      60 out of   1,440    4%
       Number used as Dual Port RAM:             60
         Number using O6 output only:             8
         Number using O5 output only:             0
         Number using O5 and O6:                 52
       Number used as Single Port RAM:            0
       Number used as Shift Register:             0
     Number used exclusively as route-thrus:     57
       Number with same-slice register load:     52
       Number with same-slice carry load:         5
       Number with other load:                    0

Slice Logic Distribution:
   Number of occupied Slices:                 1,427 out of   1,430   99%
   Number of MUXCYs used:                       496 out of   2,860   17%
   Number of LUT Flip Flop pairs used:        5,536
     Number with an unused Flip Flop:         2,343 out of   5,536   42%
     Number with an unused LUT:                 229 out of   5,536    4%
     Number of fully used LUT-FF pairs:       2,964 out of   5,536   53%
     Number of slice register sites lost
       to control set restrictions:               0 out of  11,440    0%

And here's the output in .mrp using the rom architecture of decode_table:

Slice Logic Utilization:
   Number of Slice Registers:                 3,323 out of  11,440   29%
     Number used as Flip Flops:               3,319
     Number used as Latches:                      3
     Number used as Latch-thrus:                  0
     Number used as AND/OR logics:                1
   Number of Slice LUTs:                      4,471 out of   5,720   78%
     Number used as logic:                    4,387 out of   5,720   76%
       Number using O6 output only:           3,684
       Number using O5 output only:             162
       Number using O5 and O6:                  541
       Number used as ROM:                        0
     Number used as Memory:                      60 out of   1,440    4%
       Number used as Dual Port RAM:             60
         Number using O6 output only:             8
         Number using O5 output only:             0
         Number using O5 and O6:                 52
       Number used as Single Port RAM:            0
       Number used as Shift Register:             0
     Number used exclusively as route-thrus:     24
       Number with same-slice register load:     19
       Number with same-slice carry load:         5
       Number with other load:                    0

Slice Logic Distribution:
   Number of occupied Slices:                 1,400 out of   1,430   97%
   Number of MUXCYs used:                       496 out of   2,860   17%
   Number of LUT Flip Flop pairs used:        4,968
     Number with an unused Flip Flop:         1,785 out of   4,968   35%
     Number with an unused LUT:                 497 out of   4,968   10%
     Number of fully used LUT-FF pairs:       2,686 out of   4,968   54%
     Number of slice register sites lost
       to control set restrictions:               0 out of  11,440    0%

Only a small reduction in occupied slices (1,427 to 1,400) but a roughly
16% reduction in Slice LUTs used (5,307 to 4,471). The timing score goes
from 20840 to 21516.
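
For anyone who wants to redo the arithmetic, here is a minimal sketch that computes the savings from the figures in the two reports above. The dictionary keys and function names are made up for illustration; only the numbers come from the .mrp output.

```python
# Counts copied from the two .mrp reports (reverse_logic vs rom decode_table).
reverse_logic = {
    "slice_luts": 5307,
    "luts_as_logic": 5190,
    "occupied_slices": 1427,
    "lut_ff_pairs": 5536,
}
rom = {
    "slice_luts": 4471,
    "luts_as_logic": 4387,
    "occupied_slices": 1400,
    "lut_ff_pairs": 4968,
}


def reduction_percent(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


def compare(before, after):
    """Map each resource name to (absolute saving, relative saving in percent)."""
    return {
        key: (before[key] - after[key], reduction_percent(before[key], after[key]))
        for key in before
    }


if __name__ == "__main__":
    for key, (saved, pct) in compare(reverse_logic, rom).items():
        print(f"{key:16s} saved {saved:4d} ({pct:.1f}%)")
```

Note that the relative saving is measured against the reverse_logic build, which is not the same as the difference between the two "out of 5,720" utilization percentages.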

> On the other hand, I think the selective stripping or stripping down of
> pipeline appendages... barrel shifter, MAC unit, etc is a useful
> exercise in making the implementation as clean as possible.  The
> compiler support could be tweaked, but some instructions removed might
> require support in libgcc.a

Removing groups of instructions and the associated hardware will be
interesting. That will mean revisiting how cpu_gen works. Will J1 have
SH-2's rotate and shift instructions (ROTL/ROTR, ROTCL/ROTCR, SHAL/SHAR,
SHLL*/SHLR*)?
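
As a rough illustration of what "a group of instructions" means at the opcode level, the sketch below picks out that rotate/shift group by bit pattern. It assumes the standard SH-2 encodings (0100nnnn followed by an 8-bit function code); the table and function names are hypothetical and this is not how cpu_gen describes instructions.

```python
# SH-2 one-operand shift/rotate opcodes with the register field nnnn zeroed.
# Assumption: standard SH-2 encodings of the form 0100nnnn xxxxxxxx.
SHIFT_ROTATE = {
    0x4000: "SHLL",
    0x4001: "SHLR",
    0x4004: "ROTL",
    0x4005: "ROTR",
    0x4008: "SHLL2",
    0x4009: "SHLR2",
    0x4018: "SHLL8",
    0x4019: "SHLR8",
    0x4020: "SHAL",
    0x4021: "SHAR",
    0x4024: "ROTCL",
    0x4025: "ROTCR",
    0x4028: "SHLL16",
    0x4029: "SHLR16",
}

# Mask that clears bits 11..8, the nnnn register field.
REGISTER_FIELD_MASK = 0xF0FF


def shift_rotate_name(opcode):
    """Return the mnemonic if the 16-bit opcode is in the group, else None."""
    return SHIFT_ROTATE.get(opcode & REGISTER_FIELD_MASK)


def needs_emulation(opcode):
    """True if a core without this group would have to trap or emulate the opcode."""
    return shift_rotate_name(opcode) is not None
```

For example, `shift_rotate_name(0x4304)` identifies ROTL on R3, while NOP (0x0009) falls outside the group.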

- Geoff