[J-core] correction (Linux is now working) (Re: Achievable clock speed, bottlenecks?)

BGB cr88192 at gmail.com
Wed Oct 5 14:28:20 EDT 2016

On 10/5/2016 6:52 AM, Rob Landley wrote:
> On 10/01/2016 01:53 PM, BGB wrote:
>>> but, trying to type stuff gave garbage characters, and I have been
>>> unable to reproduce this result in subsequent runs (ex: actually
>>> getting to a prompt).
> ...
>> garbage appears to be ANSI codes.
> The ansi codes are a terminal size probe. (It saves the current
> location, jumps to the bottom right corner, asks the terminal to send
> back the current cursor position, and jumps back to the saved location.)
> This is because asking Linux what it thinks the terminal size is
> (through the tty ioctl) won't help if you're on a serial console where
> the window is on the other side of a serial device, so it doesn't know
> how big a window it is. When the ioctl probe fails (says "we have a
> terminal but it's zero by zero"), you have to do in-band signalling with
> an escape sequence the terminal responds to. (Basically all modern
> xterms respond, I'm guessing yours doesn't because it's windows?)
> The emulated system thinks it has a serial console, and a decade ago I
> taught the busybox shell to do the ANSI probe on serial consoles.

>> there appears to be some difference between what I am getting from
>> the Windows command-prompt vs WS4L console which affects what happens in
>> the emulated Linux.
> I.E. above. :)
> It sounds like the ansi probe is _confusing_ the windows terminal.

more like, the Windows terminal has no idea what to do with ANSI codes, 
so just prints the raw escape sequences.

other things don't match up, for example, for most of the non-character 
keys, the WS4L console gives ANSI codes (or various other stuff), 
whereas the Windows terminal gives keyboard scancodes.

possible ways to address this:
   use Win32 API calls to implement behavior of ANSI codes, and also 
generate ANSI codes for keys;
   create a window, draw my own console, and implement ANSI commands.

the former strategy looks like kind of a pain, and I have noticed issues 
here in WS4L (in general):
     curses UI's tend to be a mangled mess;
     lots of printed messages contain a lot of garbage characters;
this implies it is possibly a non-trivial level of effort.

the latter strategy involves creating a window and drawing characters, but 
this isn't particularly difficult.

this strategy could possibly also be able to function as a framebuffer 
display, but then I would probably need to deal with variable screen 
resolution in some way (unless I can get Linux to use a fixed-resolution 
framebuffer).

the most likely solution is that there would be a certain "ideal" 
resolution, and any other resolutions would be created via scaling the 
image (at the cost of image quality).
would need to get strategic about resolution, as upsampling gives best 
results with factors of 1.5x or 2x.
ex: 960x720 could also handle 640x480 (1.5x), 480x360 (2x), 320x240 
(3x), ...

it is also possible to dynamically change window resolution, but this 
has its own issues.

TBD: may put on TODO list.

had partially started on emaclite, but this has slowed:
     current Linux builds don't use it, ...;
     to do anything useful with it, I would effectively also need to 
implement a "network stack".

started initial work for writing a custom SH assembler:
     TBD is whether to make COFF or ELF the default object format;
         Linux prefers ELF, but for my own uses I have generally 
preferred COFF.
     similar goes for output formats (PE/COFF vs ELF vs custom).

possible output formats:
     PE/COFF, which is straightforward and I am pretty familiar with;
     ELF, which is friendly to Linux, but gets more complicated in its 
handling of dynamic linking (vs PE);
     possible custom: imagining a full-sequential format (no 
seeks/buffers) with LZ compressed payload.
     flat image: probably relevant, basically a raw ROM-style image.

it probably won't be used for targeting Linux (may leave this more to 
GCC/GAS), but could be used for compiling programs for "testkern" (if 
expanded out into more of an OS). if so, I am personally more inclined 
towards using either PE/COFF or a custom format for binaries.

the custom format would sort of resemble a hybrid of PE/COFF and a 
simplified OGG, intended to be decoded as a stream of fixed-size blocks 
(currently 512 bytes, may change) which unpack the image into the target 
memory and optionally perform any fixups. the compression scheme would 
be either LZ4 or "something similar" (for a relatively small/simple 
decoder).

some quick tests showed only about an 11% size difference between LZ4 
and Deflate for SH machine-code, with an approx 40% size reduction for 
the tested code, so it seems fairly plausible.

idea here was to allow loading without any need for temporary buffers 
(beyond the current block) or the ability to seek within a file, which 
can add to the cost of loading (say, the program loader either doesn't 
need, or can bypass, the use of a buffered file-IO abstraction layer).

this doesn't fully address the case of dynamically-linked binaries 
though, which may require either recursion, or the ability to delay this 
part until after pulling in the other images. I haven't thought of a 
great solution for this part yet, beyond a "hey, loader, pull in these 
libraries and then come back here" block, leaving it undefined whether the 
loader recurses, or copies the remaining contents into a 
buffer/queue/stack for later processing.

note that testkern would probably also have support for PE/COFF and for 
statically-linked ELF binaries. specifics TBD (shared-objects/ld.so 
funkiness could probably be left for "some other time").

though, whether or not I do a C compiler is a more far-reaching question; 
a compromise would be to do a macro-assembler (but macro facilities 
aren't particularly useful in a compiler backend, and are usually 
better off disabled in this use-case anyways).

but, yeah, still working on stuff...

More information about the J-core mailing list