[J-core] How crazy would a nommu debian port be?
dalias at libc.org
Tue Aug 23 23:16:25 EDT 2016
On Tue, Aug 23, 2016 at 03:25:29PM -0700, Cedric BAIL wrote:
> On Tue, Aug 23, 2016 at 1:39 PM, Rich Felker <dalias at libc.org> wrote:
> > On Tue, Aug 23, 2016 at 11:50:58AM -0700, Cedric BAIL wrote:
> >> >> > The issue is drawing into the buffer in the first place. Rich did
> >> >> > some benchmarking on memory copy, and this is a similar problem
> >> >> > set. With some clever programming or hardware, one could get DMA
> >> >> > assist on block copy.
> >> >>
> >> >> Yes, I am expecting that. memcpy and memset are usually the limit on
> >> >> every hardware when doing graphics anyway. That's why the most
> >> >> important technique is to not draw anything if possible :-) Partial
> >> >> update and hardware plane are usually the best helper there. If you
> >> >> look at your screen, how many pixels really change at once? That's
> >> >> the only thing your CPU really needs to work on. With clever UI design
> >> >> and proper cut off on the useless drawing, a J2 should provide enough
> >> >> possibility, but you are indeed pointing to an interesting point.
> >> >> Would it be possible to speed up memcpy and memset with some DMA
> >> >> assist? It is usually not possible, as the cost of going into the
> >> >> kernel and setting up the MMU destroys all possible gain, but maybe
> >> >> on a J2 it makes sense.
> >> >
> >> > Yes, but for it to be architecturally reasonable (and future-proof to
> >> > J3/J4 with mmu) for userspace Linux binaries to use dma memcpy, we'd
> >> > need to add a dma_memcpy syscall. There's precedent for this on
> >> > blackfin so I think it may be reasonable. But it would only make sense
> >> > for large, well-aligned copies.
> >> As memcpy is part of the libc, doesn't that relax the constraint a bit?
> > Yes and no. To do it in userspace, the libc memcpy would have to be
> > aware of the specific dma controller available, where its registers
> > are mapped, which channel it's permitted to use, how to negotiate
> > access to that dma channel and to the dma controller registers with
> > other processes, etc. It could be done if the kernel somehow provided
> > the information to the process in an appropriate form, but this is all
> > way outside the scope of what belongs in userspace, unless perhaps the
> > kernel simply provided it as a black box memcpy function in the vdso.
> I see your point. It seems hard to do in a vdso, no? Well, at least
> in the case where you have an mmu and proper separation between
> user space and kernel. I am not a kernel developer nor a libc
> developer, so I might be wrong.
On nommu, the code the kernel provides in the vdso could theoretically
program the dmac from userspace to eliminate syscall overhead. With
mmu, it would need to make the dma_memcpy syscall. The advantage of
having vdso code do it rather than libc code would just be that the
kernel could provide hardware-specific logic for the cutoffs at which
dma is an advantage.
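
As a rough illustration of the cutoff idea, here is a minimal sketch in C.
It is not actual kernel, vdso, or libc code. The names (choose_copy_path,
memcpy_with_cutoff, dma_copy_stub) and the threshold values are hypothetical,
and the "dma" path is only a stand-in that calls memcpy, since a real one
would either program the dmac directly (nommu) or make the proposed
dma_memcpy syscall (mmu).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning values; real cutoffs would be hardware-specific
 * and are exactly what the kernel-provided code would know. */
#define DMA_MIN_BYTES 4096u
#define DMA_ALIGN     32u   /* must be a power of two */

enum copy_path { COPY_CPU, COPY_DMA };

/* Stand-in for the dma path. A real version would drive the dmac or
 * invoke a dma_memcpy syscall; this stub just uses the cpu. */
static void *dma_copy_stub(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst & (DMA_ALIGN - 1)) == 0);
    assert(((uintptr_t)src & (DMA_ALIGN - 1)) == 0);
    return memcpy(dst, src, n);
}

/* Decide which path to take: dma only for large, well-aligned copies. */
static enum copy_path choose_copy_path(const void *dst, const void *src,
                                       size_t n)
{
    if (n < DMA_MIN_BYTES)
        return COPY_CPU;
    if ((((uintptr_t)dst | (uintptr_t)src) & (DMA_ALIGN - 1)) != 0)
        return COPY_CPU;
    return COPY_DMA;
}

/* memcpy-compatible entry point that applies the cutoff logic. */
static void *memcpy_with_cutoff(void *dst, const void *src, size_t n)
{
    if (choose_copy_path(dst, src, n) == COPY_DMA)
        return dma_copy_stub(dst, src, n);
    return memcpy(dst, src, n);
}
```

The point of keeping choose_copy_path separate is that only that function
(and the two constants) would need to differ per hardware, which is the part
the kernel is best placed to supply.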