[J-core] 2D unit

Sun Feb 26 20:29:19 EST 2017

Hello,

We had a nice meetup on Thursday after ELC and came up with some
ideas. Not too sure if they are good, as the food and drink may have
helped :-)

One of the ideas we discussed in some depth was how to design a nice
2D unit that would go with the turtle. The main issue, as discussed
before on this mailing list, is that most toolkits don't care about
anything but OpenGL and Vulkan these days. Obviously these standards
are way too big and complex for us at this stage.

There is a limited alternative, the KMS/DRM API, which provides a way
to submit, for each frame, a list of buffers to display at specific
positions. These are usually called hardware planes. A number of
Wayland servers use them to accelerate the movement of the mouse
pointer and to let the active window be a zero-compositing case (not
just when watching a movie, but for just about any application).
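
To make the plane idea a bit more concrete, here is a rough sketch in
C of what a per-frame plane list could look like. Everything in it
(struct plane_desc, struct frame_desc, MAX_PLANES, plane_move,
plane_at) is made up for illustration; it is not the real KMS/DRM
layout, just the general shape of "a buffer, a position and a
stacking order".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one hardware plane (not the KMS/DRM one). */
struct plane_desc {
    uint32_t buf_addr;  /* address of the pixel buffer */
    uint16_t x, y;      /* position of the plane on screen */
    uint16_t w, h;      /* size of the plane in pixels */
    uint16_t stride;    /* bytes per buffer line */
    uint8_t  zpos;      /* stacking order, 0 is the bottom */
    uint8_t  enabled;
};

#define MAX_PLANES 4    /* arbitrary number picked for the sketch */

/* What the display unit would be handed for each frame. */
struct frame_desc {
    struct plane_desc planes[MAX_PLANES];
};

/* Move a plane (e.g. the cursor): only the position changes, no pixel
 * is redrawn and nothing is composited by the CPU. */
static void plane_move(struct plane_desc *p, uint16_t x, uint16_t y)
{
    assert(p != NULL);
    p->x = x;
    p->y = y;
}

/* Return the index of the topmost enabled plane covering pixel
 * (px, py), or -1 if no plane covers it. */
static int plane_at(const struct frame_desc *f, uint16_t px, uint16_t py)
{
    int best = -1;

    for (int i = 0; i < MAX_PLANES; i++) {
        const struct plane_desc *p = &f->planes[i];

        if (!p->enabled)
            continue;
        if (px < p->x || px >= p->x + p->w ||
            py < p->y || py >= p->y + p->h)
            continue;
        if (best < 0 || p->zpos > f->planes[best].zpos)
            best = i;
    }
    return best;
}
```

With a full-screen plane at zpos 0 and a 16x16 cursor plane at zpos 1,
moving the pointer is just a call to plane_move() on the second entry.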

To implement such functionality, the usual way in all the hardware I
know is to have a completely dedicated black box. More often than
not, these blocks are actually running some kind of firmware. This
gave us an idea: what if we used a J1 with a small amount of dedicated
SRAM (2 * 8KB ?) accessible from the main CPU, dedicated access to the
DMA engine, an interrupt line to the main CPU, control over the HDMI
output and maybe a few special instructions to do blending operations?
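
For reference, this is roughly the per-pixel work a blending
instruction would have to replace. It is only a sketch and assumes
premultiplied ARGB8888 pixels, which is my assumption and not
something we decided on; mul_div255 and blend_over are names made up
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values seen as fractions of 255, with rounding.
 * This avoids a real division, which matters on a small core. */
static inline uint32_t mul_div255(uint32_t a, uint32_t b)
{
    assert(a <= 255 && b <= 255);
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* "src over dst" for premultiplied ARGB8888 pixels:
 * out = src + dst * (255 - src_alpha) / 255, channel by channel. */
static uint32_t blend_over(uint32_t src, uint32_t dst)
{
    uint32_t inv_a = 255 - (src >> 24);
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t c = s + mul_div255(d, inv_a);

        if (c > 255)    /* clamp in case src is not properly premultiplied */
            c = 255;
        out |= c << shift;
    }
    return out;
}
```

That is four small multiplies per pixel, which is why the question of
a blending instruction versus the full sized multiply comes up at all.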

The kernel driver could actually contain the exact source code of the
firmware run on that J1. The driver would use some linker tricks to
select the functions and the data that need to be copied from its own
code (as J1 and J2 have compatible instruction sets, the same compiler
can be used). The boot loader could likely do the same trick, with a
very simple implementation that handles just a terminal at 640x480.
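
One possible form of the linker trick, as a sketch only: put the
firmware functions in their own section and let the linker tell us
where it starts and stops. This assumes GCC (or clang) and GNU ld on
an ELF target; GNU ld defines __start_NAME and __stop_NAME for any
section whose name is a valid C identifier. The section name "j1fw",
the J1FW macro, J1_SRAM_SIZE and the two toy firmware functions are
all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag for the code that must end up in the J1's SRAM.
 * "j1fw" is a hypothetical section name. */
#define J1FW __attribute__((section("j1fw"), used, noinline))

/* Toy firmware routines: ordinary C, built by the same compiler as
 * the rest of the driver. */
J1FW uint32_t j1_fill_color(uint32_t index)
{
    return 0xff000000u | (index * 0x010101u);
}

J1FW uint32_t j1_next_line(uint32_t line, uint32_t height)
{
    return (line + 1) % height;
}

/* Provided by GNU ld because "j1fw" is a valid C identifier. */
extern const unsigned char __start_j1fw[];
extern const unsigned char __stop_j1fw[];

#define J1_SRAM_SIZE (8 * 1024)   /* one of the 8KB banks, assumed */

static size_t j1fw_size(void)
{
    return (size_t)(__stop_j1fw - __start_j1fw);
}

/* Copy the firmware image into the SRAM window.
 * Returns the number of bytes copied, or 0 if it does not fit. */
static size_t j1fw_load(unsigned char *sram, size_t sram_size)
{
    size_t n = j1fw_size();

    assert(sram != NULL);
    if (n > sram_size)
        return 0;
    memcpy(sram, __start_j1fw, n);
    return n;
}
```

For this to really work the copied code has to be position
independent or linked for the SRAM address, and it must not call
anything that lives outside the section. That part is left out here.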

I do like this idea a lot as it opens the possibility for a lot of
hacking here. You could maybe manage to generate a JIT version per
frame instead of relying on functions, and manage a larger number of
"hardware" planes. Implementing a mini scenegraph would make it
possible to correctly detect the cases where a surface does not need
to be composited and so reduce the bandwidth needed. Those are just
the most casual ideas; opening up firmware development is, I am sure,
going to lead to interesting ones.
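
As an example of the kind of check a mini scenegraph allows, here is
a small sketch that marks the surfaces which never need to be fetched
because an opaque surface above covers them entirely. struct surface,
covers and cull_hidden are hypothetical names, and the test is
deliberately conservative: it only looks at one covering surface at a
time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scenegraph node: a rectangle and an opacity flag. */
struct surface {
    int x, y, w, h;
    bool opaque;
};

/* True if `outer` completely covers `inner`. */
static bool covers(const struct surface *outer, const struct surface *inner)
{
    return outer->x <= inner->x && outer->y <= inner->y &&
           outer->x + outer->w >= inner->x + inner->w &&
           outer->y + outer->h >= inner->y + inner->h;
}

/* Surfaces are ordered from bottom (index 0) to top. A surface is
 * marked hidden when a single opaque surface above it covers it
 * entirely. Returns the number of surfaces still to be fetched. */
static size_t cull_hidden(const struct surface *s, size_t n, bool *visible)
{
    size_t count = 0;

    assert(s != NULL && visible != NULL);
    for (size_t i = 0; i < n; i++) {
        visible[i] = true;
        for (size_t j = i + 1; j < n; j++) {
            if (s[j].opaque && covers(&s[j], &s[i])) {
                visible[i] = false;
                break;
            }
        }
        count += visible[i];
    }
    return count;
}
```

With a background, a full-screen opaque window and a cursor, the
background is dropped and only two buffers are read per frame.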

We can discuss the benefit of having a specific instruction to do
blending or of enabling the full sized multiply, but I think that is
something we can experiment with later on the turtle to see what
works best. So for now let's just focus on whether this is a good
idea, and maybe on whether we can apply the same concept to other
units (network, audio, tpm ?). Is it not going to consume too much
space on the FPGA? Do we really need all that flexibility?

So what do you think of this idea?
-- 
Cedric BAIL