I know that a few N64 games do use BGEZAL, and they seem to work okay. I will admit that this code is not well tested.
Hmm, even without link handling?
It does handle the link register (in sjump_assemble) like so:
return_address=start+i*4+8;
emit_movimm(return_address,rt); // PC into link register
The unfinished part is the case where the delay slot cannot be reordered because of a dependency. Fixing this is probably as simple as copying that code into the in-order execution case and making sure the register allocation is correct.
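To illustrate the return_address arithmetic quoted above, here is a small standalone C model of what BGEZAL is supposed to do. This is only a sketch: BranchResult and bgezal_model are names made up for this post and are not part of the dynarec. It assumes the usual MIPS behaviour, where the link register gets the address of the instruction after the delay slot (branch address + 8) whether or not the branch is taken.

```c
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
typedef struct {
    uint32_t pc_next; /* where execution continues after the delay slot */
    uint32_t ra;      /* value written to the link register (r31) */
} BranchResult;

/* Hypothetical model of BGEZAL at address branch_pc.
 * rs is the register being tested, offset is the 16-bit
 * immediate from the instruction word. */
static BranchResult bgezal_model(uint32_t branch_pc, int32_t rs, int16_t offset)
{
    BranchResult r;
    /* Same arithmetic as start+i*4+8: skip the branch and its delay slot. */
    r.ra = branch_pc + 8;
    /* Target is relative to the delay slot; offset is sign-extended, then *4. */
    uint32_t target = branch_pc + 4 + ((uint32_t)(int32_t)offset << 2);
    /* The link is unconditional, only the jump depends on rs. */
    r.pc_next = (rs >= 0) ? target : branch_pc + 8;
    return r;
}
```

With branch_pc = 0x80001000 this gives ra = 0x80001008 in both the taken and not-taken case, which is why the link write also has to happen in the in-order path.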
These instructions are so uncommon that I couldn't tell whether I'd implemented them correctly, since almost nothing uses them. So I put an assert in there and figured I'd debug it when I found a test case. I never found one, but it looks like you did.
Did you look at those bugfixes I listed above? I still don't feel I know what I'm doing when dealing with your code, so it would be nice to get your confirmation.
- u_int verifier=(int)ptr+((*ptr<<8)>>6)+8; // get target of bl
+ u_int verifier=(int)ptr+((signed int)(*ptr<<8)>>6)+8; // get target of bl
Yes, it should be signed. The relative offset was always positive in my builds, but I'd bet this is one of the things that's making the Android ports crash.
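For anyone following along, here is a standalone sketch of why the cast matters. The function names are made up for this example and the sign extension is written out by hand, so it does not depend on how the compiler shifts negative values; it is not the code from the patch. The 24-bit immediate of an ARM bl is a signed word offset relative to the instruction address + 8.

```c
#include <stdint.h>

/* Hypothetical helper: the original expression, with an unsigned shift.
 * The top bits are filled with zeros, so a backward branch is decoded
 * as a large forward one. */
static uint32_t bl_target_unsigned(uint32_t pc, uint32_t insn)
{
    return pc + ((insn << 8) >> 6) + 8;
}

/* Hypothetical helper: same decode with explicit sign extension
 * of the 24-bit immediate. */
static uint32_t bl_target_signed(uint32_t pc, uint32_t insn)
{
    int32_t off = (int32_t)(insn & 0x00FFFFFFu);
    if (off & 0x00800000)
        off -= 0x01000000;            /* sign-extend from bit 23 */
    return pc + (uint32_t)(off * 4) + 8;
}
```

For a forward branch such as 0xEB000002 at 0x1000, both give 0x1010. For a backward branch such as 0xEBFFFFFE (branch to self), the signed version gives 0x1000 and the unsigned one gives 0x04001000, so the bug only shows up when the target is at a lower address than the branch.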
drc: fix unsaved register
it caused invalidate_addr() sometimes to be called with bad address.
Hmm... This won't break anything, but it might waste CPU cycles on useless calls to invalidate_addr(). It appears that storelr_assemble has the same glitch.
There shouldn't be any reason to call both STORE*_STUB and INVCODE_STUB, unless there is code at addresses that aren't in normal RAM.
I think the right solution would be to change the return address of the first stub, so that only one or the other is called.
drc: allow xor imm 0
xor with zero does nothing. If you're generating that instruction, then you've probably introduced a bug. This is rather useless also:
if(c) x=(constmap[i][s]+offset)-(constmap[i][s]+offset);
...
drc: don't clear ARM caches on whole translation cache - it's very slow
Is it really faster to call __clear_cache multiple times in ll_kill_pointers, rather than clearing the entire cache once?
Doing the smaller regions in invalidate_page might be helpful, but I haven't benchmarked it.
drc: fix: storelr should also use AGR
There's clearly a problem there, but I don't think that's all of it. If that code was working, then I'm guessing that the address generation register isn't being allocated properly in pass 5. I'd need to think about this some more, and possibly look at the debugging output to see what the register allocation actually looks like.