[J-core] PC-relative loads and delay slots

Robert Ou rqou at robertou.com
Mon Jul 18 19:04:59 EDT 2016


On Mon, Jul 18, 2016 at 2:39 PM, Rich Felker <dalias at libc.org> wrote:
> On Mon, Jul 18, 2016 at 01:28:14PM -0700, Robert Ou wrote:
>> On Mon, Jul 18, 2016 at 6:52 AM, Rich Felker <dalias at libc.org> wrote:
>> > On Mon, Jul 18, 2016 at 02:37:55AM -0700, Robert Ou wrote:
>> >> Hi,
>> >>
>> >> What is the correct behavior of PC-relative instructions such as
>> >> "mov.l @(disp, PC), Rn" in a branch delay slot? Is this even allowed?
>> >> From my testing, GAS seems to think it is "disp is multiplied by 4 and
>> >> added to the address of the mov.l opcode + 2" but J-core seems to
>> >> execute it as "disp is multiplied by 4 and added to the address of the
>> >> branch target + 2". I discovered this while working on my MyHDL
>> >> demonstration, and you can compare the difference in my demonstration
>> >> by running the master branch and the branch_delay_test branch.
>> >
>> > If true I think it's a bug. The original SH ISA documentation
>> > specifies the behavior for "PC-relative" mov instructions as:
>> >
>> >   "The PC points to the starting address of the second instruction
>> >   after this MOV instruction"
>> >
>> > (as opposed to the actual current value of the program counter). This
>> > text is found on page 202 of document REJ09B0171-0500O.
>> >
>> > I'm quite surprised we haven't run into this bug, since I would expect
>> > gcc to generate code with immediate loads in branch delay slots (e.g.
>> > when making function calls with constant arguments).
>>
>> I did some testing on an actual Mimas v2 board using the mimas_v2.bin
>> on the j-core website (bootrom says Mon Apr 18 22:36:49 UTC 2016, md5
>> is 390768a7ef9061a19f163061c4e15ac3), and PC-relative loads in the
>> delay slot of "bra label" seem to work correctly but PC-relative loads
>> in the delay slot of "bsr label" or "rts" do not seem to work
>> correctly. I didn't test any other instructions with delay slots. I
>> tested it with this program (replace
>> sources/root-filesystem/src/hello.c in the Aboriginal Linux source):
>>
>> #include <stdio.h>
>>
>> asm (
>> "test_delay_slot_1:         \n"
>> "   sts pr, r1              \n"
>>
>> "   bsr dummysub_1          \n"
>> "    mov.l testval_1, r0    \n"
>>
>> "   lds r1, pr              \n"
>> "   rts                     \n"
>> "    nop                    \n"
>> "dummysub_1:                \n"
>> "   rts                     \n"
>> "    nop                    \n"
>> ".align 4                   \n"
>> "testval_1:                 \n"
>> "   .long 0xabcddcba        \n"
>> );
>>
>> asm (
>> "test_delay_slot_2:         \n"
>> "   sts pr, r1              \n"
>>
>> "   mov.l testval_2, r0     \n"
>> "   bsr dummysub_2          \n"
>> "    nop                    \n"
>>
>> "   lds r1, pr              \n"
>> "   rts                     \n"
>> "    nop                    \n"
>> "dummysub_2:                \n"
>> "   rts                     \n"
>> "    nop                    \n"
>> ".align 4                   \n"
>> "testval_2:                 \n"
>> "   .long 0xabcddcba        \n"
>> );
>>
>> asm (
>> "test_delay_slot_3:         \n"
>>
>> "   bra testret_3           \n"
>> "    mov.l testval_3, r0    \n"
>>
>> "testret_3:                 \n"
>> "   rts                     \n"
>> "    nop                    \n"
>> ".align 4                   \n"
>> "testval_3:                 \n"
>> "   .long 0xfeedface        \n"
>> );
>>
>> asm (
>> "test_delay_slot_4:         \n"
>>
>> "   mov.l testval_4, r0     \n"
>> "   bra testret_4           \n"
>> "    nop                    \n"
>>
>> "testret_4:                 \n"
>> "   rts                     \n"
>> "    nop                    \n"
>> ".align 4                   \n"
>> "testval_4:                 \n"
>> "   .long 0xfeedface        \n"
>> );
>>
>> asm (
>> "test_delay_slot_5:         \n"
>>
>> "   rts                     \n"
>> "    mov.l testval_5, r0    \n"
>>
>> ".align 4                   \n"
>> "testval_5:                 \n"
>> "   .long 0xcafed00d        \n"
>> );
>>
>> asm (
>> "test_delay_slot_6:         \n"
>>
>> "   mov.l testval_6, r0     \n"
>> "   rts                     \n"
>> "    nop                    \n"
>>
>> ".align 4                   \n"
>> "testval_6:                 \n"
>> "   .long 0xcafed00d        \n"
>> );
>>
>> int main(int argc, char *argv[])
>> {
>>   printf("Hello world!\n");
>>   printf("Result 1 is %08X\n", test_delay_slot_1());
>>   printf("Result 2 is %08X\n", test_delay_slot_2());
>>   printf("Result 3 is %08X\n", test_delay_slot_3());
>>   printf("Result 4 is %08X\n", test_delay_slot_4());
>>   printf("Result 5 is %08X\n", test_delay_slot_5());
>>   printf("Result 6 is %08X\n", test_delay_slot_6());
>>   return 0;
>> }
>>
>> The output of this program is:
>> Hello world!
>> Result 1 is 012AD006
>> Result 2 is ABCDDCBA
>> Result 3 is FEEDFACE
>> Result 4 is FEEDFACE
>> Result 5 is 0009D412
>> Result 6 is CAFED00D
>>
>> but I believe the expected output should be:
>> Hello world!
>> Result 1 is ABCDDCBA
>> Result 2 is ABCDDCBA
>> Result 3 is FEEDFACE
>> Result 4 is FEEDFACE
>> Result 5 is CAFED00D
>> Result 6 is CAFED00D
>
> I can confirm this. In addition, running under qemu-sh4eb gives the
> output you (and I) expect. Running on the actual board gives wrong
> output (but different from yours, probably due to different compiler):
>
> ~ # ./a.out
> Hello world!
> Result 1 is 012AD006
> Result 2 is ABCDDCBA
> Result 3 is FEEDFACE
> Result 4 is FEEDFACE
> Result 5 is 65236413
> Result 6 is CAFED00D
>
> Rich

While investigating why "bra label" works but "bsr label" does not, I
realized that "bra label" should not work either. It only appeared to
work in the program I posted earlier because the branch did not
actually skip any instructions. If you take that program and change
test_delay_slot_3 and test_delay_slot_4 to the following, test 3
breaks as well (test 4 keeps the load outside the delay slot, so it
serves as the control):

asm (
"test_delay_slot_3:         \n"

"   bra testret_3           \n"
"    mov.l testval_3, r0    \n"
// Filler space
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"

"testret_3:                 \n"
"   rts                     \n"
"    nop                    \n"
// Filler space
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
".align 4                   \n"
"testval_3:                 \n"
"   .long 0xfeedface        \n"
);

asm (
"test_delay_slot_4:         \n"

"   mov.l testval_4, r0     \n"
"   bra testret_4           \n"
"    nop                    \n"
// Filler space
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"

"testret_4:                 \n"
"   rts                     \n"
"    nop                    \n"
// Filler space
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
"   nop                     \n"
".align 4                   \n"
"testval_4:                 \n"
"   .long 0xfeedface        \n"
);

In my case, the modified program now produces:
Hello world!
Result 1 is 012AD006
Result 2 is ABCDDCBA
Result 3 is 00090009
Result 4 is FEEDFACE
Result 5 is 0009D412
Result 6 is CAFED00D
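
To make the arithmetic concrete, here is a rough sketch in C of the two
address calculations being compared. ea_documented() follows the SH
manual's rule for "mov.l @(disp, PC), Rn" (PC is the address of the
mov.l plus 4, with the low two bits cleared). ea_branch_target_model()
is only my guess at what the core does in a delay slot, based on the
"branch target + 2" behavior described at the start of this thread; it
is not taken from the J-core sources. All function names here are made
up for illustration, and the byte offsets assume each test function
starts on a 4-byte boundary.

```c
#include <stdint.h>

/* Documented SH behavior: PC = address of the mov.l + 4, low two bits
   cleared, then disp * 4 is added. */
uint32_t ea_documented(uint32_t movl_addr, uint32_t disp)
{
    return ((movl_addr + 4u) & ~UINT32_C(3)) + disp * 4u;
}

/* ASSUMPTION (hypothetical model, not verified against the RTL):
   in a delay slot the core uses "branch target + 2" as the PC value
   instead of "address of the mov.l + 4". */
uint32_t ea_branch_target_model(uint32_t target_addr, uint32_t disp)
{
    return ((target_addr + 2u) & ~UINT32_C(3)) + disp * 4u;
}

/* Displacement an assembler following the documented rule would encode
   for a mov.l at movl_addr that loads from data_addr. */
uint32_t encode_disp(uint32_t movl_addr, uint32_t data_addr)
{
    return (data_addr - ((movl_addr + 4u) & ~UINT32_C(3))) / 4u;
}
```

Using offsets relative to the start of test_delay_slot_3: in the
original program the mov.l is at 2, testret_3 at 4 and testval_3 at 8,
so disp is 1 and both functions give 8, which is why the bug was hidden.
In the modified program testret_3 moves to 20 and testval_3 to 40, so
disp is 9; the documented rule still gives 40, but the model gives 56.
If test_delay_slot_4 is laid out directly after testval_3, offset 56
falls in its nop filler (nop is opcode 0x0009), which would be
consistent with the 00090009 result above.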

