[J-core] PC-relative loads and delay slots

Tue Jul 19 00:22:58 EDT 2016

I don’t see a better way than the approach you’ve taken, and this is just
one more piece of evidence that these instructions expose pipeline internals
too much.  They ‘should’ therefore trap in the delay slot if we want consistent
behaviour.  That said, surely SH1/2/3 code somewhere relies on the behaviour as it is…

J.
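
For reference, the two address calculations discussed in the quoted thread
below can be sketched roughly as follows. This is a minimal illustration, not
J-core or GAS code: the function names are hypothetical, and the "+ 4" and the
masking of the low two PC bits follow the general description of
"mov.l @(disp, PC), Rn" in the SH programming manuals rather than anything
stated in this thread. The delay-slot case uses the "branch target + 2" value
reported for J-core.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "mov.l @(disp, PC), Rn" for a given PC value.
 * disp is an 8-bit zero-extended displacement scaled by 4; the low two
 * bits of PC are masked so the load is longword-aligned (assumption
 * taken from the SH manuals, not from this thread). */
static uint32_t movl_pcrel_ea(uint32_t pc, uint8_t disp)
{
    return (pc & 0xFFFFFFFCu) + ((uint32_t)disp << 2);
}

/* PC seen outside a delay slot: the start of the second instruction
 * after the mov.l, i.e. the mov.l's own address plus 4. */
static uint32_t pc_normal(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0);      /* SH opcodes are 2-byte aligned */
    return insn_addr + 4;
}

/* PC seen in a branch delay slot, as reported for J-core in this
 * thread: the branch target address plus 2. */
static uint32_t pc_delay_slot_reported(uint32_t branch_target)
{
    assert((branch_target & 1u) == 0);
    return branch_target + 2;
}
```

With these helpers, a mov.l at 0x1002 with disp = 1 loads from
movl_pcrel_ea(pc_normal(0x1002), 1), whereas the same opcode in the delay slot
of a branch to 0x2000 loads from
movl_pcrel_ea(pc_delay_slot_reported(0x2000), 1), which is unrelated to where
the opcode itself lives.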

> On Jul 19, 2016, at 12:55 PM, Robert Ou <rqou at robertou.com> wrote:
> 
> On Mon, Jul 18, 2016 at 8:23 PM, Rich Felker <dalias at libc.org> wrote:
>> On Mon, Jul 18, 2016 at 09:52:16AM -0400, Rich Felker wrote:
>>> On Mon, Jul 18, 2016 at 02:37:55AM -0700, Robert Ou wrote:
>>>> Hi,
>>>> 
>>>> What is the correct behavior of PC-relative instructions such as
>>>> "mov.l @(disp, PC), Rn" in a branch delay slot? Is this even allowed?
>>>> From my testing, GAS seems to think it is "disp is multiplied by 4 and
>>>> added to the address of the mov.l opcode + 2" but J-core seems to
>>>> execute it as "disp is multiplied by 4 and added to the address of the
>>>> branch target + 2". I discovered this while working on my MyHDL
>>>> demonstration, and you can compare the difference in my demonstration
>>>> by running the master branch and the branch_delay_test branch.
>>> 
>>> If true I think it's a bug. The original SH ISA documentation
>>> specifies the behavior for "PC-relative" mov instructions as:
>>> 
>>>  "The PC points to the starting address of the second instruction
>>>  after this MOV instruction"
>>> 
>>> (as opposed to the actual current value of the program counter). This
>>> text is found on page 202 of document REJ09B0171-0500O.
>>> 
>>> I'm quite surprised we haven't run into this bug, since I would expect
>>> gcc to generate code with immediate loads in branch delay slots (e.g.
>>> when making function calls with constant arguments).
>> 
>> Some further info:
>> 
>> mova is documented to produce a result relative to the branch
>> destination, but pc-relative mov.l seems to be documented to behave as
>> I described above. However, this is only valid for sh1/2/3. On sh4,
>> both mova and pc-relative mov.l (and mov.w) are illegal in delay slots
>> and result in a trap (so the kernel can emulate them very slowly if
>> you really want them). This is presumably why gcc never generates
>> the pc-relative mov.l in delay slots and thus why the bug has never
>> affected us.
>> 
>> Rich
> 
> In the meantime, I have written a (very ugly) patch that makes both
> mova and mov.l behave "as you would intuitively expect." It is
> attached. It didn't break in some quick smoke testing (booting the
> kernel, running the test program posted earlier), but I haven't tested
> it extensively. It is also terrible code-quality-wise and violates a
> bunch of abstraction barriers. Since it turns out that the "weird"
> behavior is the correct behavior, this patch is probably only useful
> for reference if someone wants to make a not-quite-sh2-compatible
> core.
> <possible-delay-slot-fix.patch>