[J-core] PC-relative loads and delay slots
Rich Felker
dalias at libc.org
Mon Jul 18 17:39:26 EDT 2016
On Mon, Jul 18, 2016 at 01:28:14PM -0700, Robert Ou wrote:
> On Mon, Jul 18, 2016 at 6:52 AM, Rich Felker <dalias at libc.org> wrote:
> > On Mon, Jul 18, 2016 at 02:37:55AM -0700, Robert Ou wrote:
> >> Hi,
> >>
> >> What is the correct behavior of PC-relative instructions such as
> >> "mov.l @(disp, PC), Rn" in a branch delay slot? Is this even allowed?
> >> From my testing, GAS seems to think it is "disp is multiplied by 4 and
> >> added to the address of the mov.l opcode + 2" but J-core seems to
> >> execute it as "disp is multiplied by 4 and added to the address of the
> >> branch target + 2". I discovered this while working on my MyHDL
> >> demonstration, and you can compare the difference in my demonstration
> >> by running the master branch and the branch_delay_test branch.
> >
> > If true I think it's a bug. The original SH ISA documentation
> > specifies the behavior for "PC-relative" mov instructions as:
> >
> > "The PC points to the starting address of the second instruction
> > after this MOV instruction"
> >
> > (as opposed to the actual current value of the program counter). This
> > text is found on page 202 of document REJ09B0171-0500O.
> >
> > I'm quite surprised we haven't run into this bug, since I would expect
> > gcc to generate code with immediate loads in branch delay slots (e.g.
> > when making function calls with constant arguments).
>
> I did some testing on an actual Mimas v2 board using the mimas_v2.bin
> on the j-core website (bootrom says Mon Apr 18 22:36:49 UTC 2016, md5
> is 390768a7ef9061a19f163061c4e15ac3), and PC-relative loads in the
> delay slot of "bra label" seem to work correctly but PC-relative loads
> in the delay slot of "bsr label" or "rts" do not seem to work
> correctly. I didn't test any other instructions with delay slots. I
> tested it with this program (replace
> sources/root-filesystem/src/hello.c in the Aboriginal Linux source):
>
> #include <stdio.h>
>
> asm (
> "test_delay_slot_1: \n"
> " sts pr, r1 \n"
>
> " bsr dummysub_1 \n"
> " mov.l testval_1, r0 \n"
>
> " lds r1, pr \n"
> " rts \n"
> " nop \n"
> "dummysub_1: \n"
> " rts \n"
> " nop \n"
> ".align 4 \n"
> "testval_1: \n"
> " .long 0xabcddcba \n"
> );
>
> asm (
> "test_delay_slot_2: \n"
> " sts pr, r1 \n"
>
> " mov.l testval_2, r0 \n"
> " bsr dummysub_2 \n"
> " nop \n"
>
> " lds r1, pr \n"
> " rts \n"
> " nop \n"
> "dummysub_2: \n"
> " rts \n"
> " nop \n"
> ".align 4 \n"
> "testval_2: \n"
> " .long 0xabcddcba \n"
> );
>
> asm (
> "test_delay_slot_3: \n"
>
> " bra testret_3 \n"
> " mov.l testval_3, r0 \n"
>
> "testret_3: \n"
> " rts \n"
> " nop \n"
> ".align 4 \n"
> "testval_3: \n"
> " .long 0xfeedface \n"
> );
>
> asm (
> "test_delay_slot_4: \n"
>
> " mov.l testval_4, r0 \n"
> " bra testret_4 \n"
> " nop \n"
>
> "testret_4: \n"
> " rts \n"
> " nop \n"
> ".align 4 \n"
> "testval_4: \n"
> " .long 0xfeedface \n"
> );
>
> asm (
> "test_delay_slot_5: \n"
>
> " rts \n"
> " mov.l testval_5, r0 \n"
>
> ".align 4 \n"
> "testval_5: \n"
> " .long 0xcafed00d \n"
> );
>
> asm (
> "test_delay_slot_6: \n"
>
> " mov.l testval_6, r0 \n"
> " rts \n"
> " nop \n"
>
> ".align 4 \n"
> "testval_6: \n"
> " .long 0xcafed00d \n"
> );
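>
> /* Prototypes for the asm-defined routines above; each returns its result in r0. */
> unsigned int test_delay_slot_1(void), test_delay_slot_2(void), test_delay_slot_3(void);
> unsigned int test_delay_slot_4(void), test_delay_slot_5(void), test_delay_slot_6(void);
>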
> int main(int argc, char *argv[])
> {
> printf("Hello world!\n");
> printf("Result 1 is %08X\n", test_delay_slot_1());
> printf("Result 2 is %08X\n", test_delay_slot_2());
> printf("Result 3 is %08X\n", test_delay_slot_3());
> printf("Result 4 is %08X\n", test_delay_slot_4());
> printf("Result 5 is %08X\n", test_delay_slot_5());
> printf("Result 6 is %08X\n", test_delay_slot_6());
> return 0;
> }
>
> The output of this program is:
> Hello world!
> Result 1 is 012AD006
> Result 2 is ABCDDCBA
> Result 3 is FEEDFACE
> Result 4 is FEEDFACE
> Result 5 is 0009D412
> Result 6 is CAFED00D
>
> but I believe the expected output should be:
> Hello world!
> Result 1 is ABCDDCBA
> Result 2 is ABCDDCBA
> Result 3 is FEEDFACE
> Result 4 is FEEDFACE
> Result 5 is CAFED00D
> Result 6 is CAFED00D

I can confirm this. In addition, running under qemu-sh4eb gives the
output you (and I) expect. Running on the actual board gives wrong
output (but different from yours, probably due to a different compiler):

~ # ./a.out
Hello world!
Result 1 is 012AD006
Result 2 is ABCDDCBA
Result 3 is FEEDFACE
Result 4 is FEEDFACE
Result 5 is 65236413
Result 6 is CAFED00D
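
To make the two address calculations discussed in this thread concrete, here
is a minimal sketch in C. The first function follows the manual text quoted
above (PC is the address of the second instruction after the mov.l, with the
low two bits cleared before the scaled displacement is added). The second is
only a hypothetical model of the misbehavior as Robert described it ("branch
target + 2" used as the base); it is not taken from the J-core sources. The
function names and the example addresses are made up for illustration.

```c
#include <stdint.h>

/* Effective address of "mov.l @(disp, PC), Rn" per the quoted manual text:
 * PC is the address of the second instruction after the mov.l, i.e.
 * insn_addr + 4; its low two bits are cleared and the 8-bit displacement
 * is multiplied by 4. */
uint32_t pcrel_long_ea(uint32_t insn_addr, uint8_t disp)
{
    uint32_t pc = insn_addr + 4;
    return (pc & ~(uint32_t)3) + ((uint32_t)disp << 2);
}

/* ASSUMPTION: model of the reported misbehavior, where the base is derived
 * from the branch target ("branch target + 2") instead of from the address
 * of the mov.l in the delay slot. Hypothetical, for comparison only. */
uint32_t pcrel_long_ea_from_target(uint32_t branch_target, uint8_t disp)
{
    uint32_t pc = branch_target + 2;
    return (pc & ~(uint32_t)3) + ((uint32_t)disp << 2);
}
```

As a worked example, suppose test_delay_slot_1 were placed at 0x1000 (an
assumed address). Then the bsr is at 0x1002, the mov.l in its delay slot at
0x1004, dummysub_1 at 0x100c and testval_1 at 0x1010, so the assembler would
encode disp = 2. pcrel_long_ea(0x1004, 2) gives 0x1010, the address of
testval_1, while pcrel_long_ea_from_target(0x100c, 2) gives 0x1014, the word
right after it. If the asm blocks are laid out back to back, that word would
be the first two opcodes of test_delay_slot_2, which is at least consistent
with the 012AD006 seen for Result 1 (0x012A encodes "sts pr, r1" and 0xD0xx
is a "mov.l @(disp, PC), r0"). For the rts case the target is the caller's
return address, so under the same model the value read would depend on the
compiled code in main, which would fit Result 5 differing between compilers.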
Rich