If you're referring to the cmp/bvc, it should look familiar; Daedalus does something very similar.
Your code generation looks great. I don't think people will have much to worry about in getting good-speed N64 emulation out of this. I really like how simple the memory map emulation is. It's not news to me regarding N64, but it's nice to see, especially since you don't need a register for it on ARM. It's at least one nice break you get emulating N64 over other things. Just a few questions:
There is a preferred mapping, which is r1->r1, r2->r2...r7->r7, r8->r0, r9->r1, etc. This helps with branches, as values are usually in the same registers on both sides. However, if it runs out of registers, it will use whatever registers are available.
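As a rough illustration of the preferred mapping described above, here is a minimal sketch in C. It assumes eight allocatable host registers (r0-r7), which is only inferred from the r8->r0 wraparound. The names `preferred_host_reg`, `alloc_host_reg`, and the `in_use` bitmask are hypothetical and not taken from the actual recompiler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: eight allocatable ARM registers (r0-r7), inferred from the
 * r8->r0 wraparound in the mapping. */
#define HOST_REGS 8

/* Preferred host register for a MIPS register:
 * r1->r1 ... r7->r7, r8->r0, r9->r1, and so on. */
static int preferred_host_reg(int mips_reg)
{
    assert(mips_reg >= 1 && mips_reg < 32); /* MIPS r0 is hardwired to zero */
    return mips_reg & (HOST_REGS - 1);
}

/* Hypothetical allocator: try the preferred register first, then fall back
 * to the lowest-numbered free one. in_use is a bitmask of busy host
 * registers. Returns the host register, or -1 if none is free. */
static int alloc_host_reg(int mips_reg, uint8_t *in_use)
{
    int hr = preferred_host_reg(mips_reg);

    if (*in_use & (1u << hr)) {
        hr = -1;
        for (int i = 0; i < HOST_REGS; i++) {
            if (!(*in_use & (1u << i))) {
                hr = i;
                break;
            }
        }
        if (hr < 0)
            return -1; /* a real allocator would evict something here */
    }
    *in_use |= (uint8_t)(1u << hr);
    return hr;
}
```

Because the preferred register depends only on the MIPS register number, two blocks that meet at a branch tend to agree on where each value lives, so fewer moves are needed at the join.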
- Any plans for any global register allocation strategies? Of course this would help with the 64-bit stuff too, I think. Not that much of anything here would be simple to implement.
I hadn't seriously considered it, but it would be possible. The page size is 4K because that's the native page size on the r4300, although almost no N64 games actually use the MMU. (And if any do use the MMU, they probably won't work, since I haven't tested this.)
- Are you considering using MMU protection (mmap) for the self-modifying code check on the store? You're using the same page granularity anyway, which kind of suggests to me that you plan for it later.
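For readers following along, the store-time check under discussion can be pictured roughly as a per-page flag lookup at 4K granularity. This is only a sketch of the general technique, not the recompiler's actual code: the table `has_code` and the functions `mark_compiled` and `store_check` are hypothetical names. The mmap/MMU approach would replace this explicit lookup with write protection on pages that hold compiled code.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                          /* 4K pages, as on the r4300 */
#define NUM_PAGES  (1u << (32 - PAGE_SHIFT))   /* covers a 32-bit address space */

/* Hypothetical table: one flag per 4K page, nonzero if compiled code
 * exists for that page. */
static uint8_t has_code[NUM_PAGES];

/* Record that a block starting at vaddr has been compiled. */
static void mark_compiled(uint32_t vaddr)
{
    has_code[vaddr >> PAGE_SHIFT] = 1;
}

/* Called on every emulated store. Returns 1 if the store hit a page with
 * compiled code, in which case the caller would discard that code. */
static int store_check(uint32_t vaddr)
{
    uint32_t page = vaddr >> PAGE_SHIFT;

    if (has_code[page]) {
        has_code[page] = 0; /* page is now treated as having no valid code */
        return 1;
    }
    return 0;
}
```

The cost of this scheme is a load and a compare on every store, which is what hardware page protection would avoid.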
To schedule instructions this way would require an instruction-reordering pass after code generation. This could be done but would take some work. I guess the Cortex-A9 CPU will do this in hardware. It might be almost as effective to simply change the register allocation so that registers are allocated/loaded one instruction before they are needed.
- Any plans on scheduling for Cortex-A8? Naturally this will make the register allocation more constricted but with a lot of loads in the picture it seems worthwhile.
I can't see any advantage to this. r29 is generally used as a stack, so the example code is in fact using the stack.
- Option for a shadow stack pointer? Of course, since the example code is not even using the stack, I can't tell whether you're already doing this.
Something could probably be done to improve the floating point performance. Currently it calls libc/libm, which I assume is softvfp. However, floating point operations are typically less than 5% of the instructions in most N64 games, so it's not a showstopper.
And of course any other optimization plans you have, or tricks you're currently doing, I'd love to hear.
I don't think I have the correct binary blob to test hardware acceleration, so maybe someone more familiar with this can comment.
I can't imagine this current OpenGL ES problem is going to be a huge barrier. Pandora could very well launch with good N64 emulation.
Also, some of the textures are clearly not right. I don't know if this is a bug in Mesa or Rice Video, but it doesn't happen on x86.