
Mupen64Plus


884 replies to this topic

#76 Exophase

Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5463 posts
  • Location:Cleveland OH

Posted 09 June 2010 - 02:18 PM

> I wonder if this is really necessary. You could try removing it and seeing how many games still work. If most do then this might be an option for speeding up psx4all.


Only loads in the delay slot of indirect branches would have unknown stalling at compile time, and it's a lot less likely that anyone would be explicitly relying on these to pipeline correctly. Catching it means pairing a modify mask with the branch lookup and a source mask with the destination, and I guess using a generated prologue stub to handle it when it comes up.

Analyzing the code at branch targets could be a problem if psx4all did partial flushing or patching for self-modifying code, but I'm pretty sure it just flushes everything, so looking at the branch target should be okay.
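A minimal sketch of the mask check described above, under assumptions: the dynarec keeps, for each compiled block, a mask of the registers its first instruction reads, and each indirect-branch exit records which register the load in its delay slot writes. The names `block_info`, `reg_bit` and `needs_load_delay_fixup` are hypothetical. Only when the two masks overlap does the dispatcher need a slow path (such as a generated prologue stub); otherwise it can jump straight into the compiled target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block metadata kept by a dynarec. */
typedef struct {
    uint32_t guest_pc;            /* address the block was compiled from */
    uint32_t first_insn_src_mask; /* bit r set if the block's first instruction reads GPR r */
} block_info;

/* Mask bit for one MIPS GPR. r0 is hardwired to zero, so it never counts. */
static inline uint32_t reg_bit(unsigned r) {
    return r == 0 ? 0u : (1u << r);
}

/* Checked when an indirect branch (JR/JALR) with a load in its delay slot
   reaches its target. load_modify_mask is fixed at compile time from the
   load's destination register. A nonzero overlap means the target's first
   instruction must still see the OLD register value (R3000 load delay), so
   the dispatcher has to take a slow path instead of entering the block. */
static bool needs_load_delay_fixup(uint32_t load_modify_mask,
                                   const block_info *target) {
    return (load_modify_mask & target->first_insn_src_mask) != 0;
}
```

Because the common case is a single AND against a precomputed mask, the cost of the check stays small even if the collision itself almost never happens.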

#77 Ari64

Ari64

    Magic Emulator Fairy

  • GP32 Hardcore
  • 477 posts

Posted 09 June 2010 - 02:32 PM

Exophase wrote:
> Only loads in the delay slot of indirect branches would have unknown stalling at compile time, and it's a lot less likely that anyone would be explicitly relying on these to pipeline correctly. Catching it means pairing a modify mask with the branch lookup and a source mask with the destination, and I guess using a generated prologue stub to handle it when it comes up.


It doesn't use prologue stubs, it calls the interpreter for that one instruction, which is not the most efficient way to do it.

Is this worth optimizing? Has anyone profiled psx4all to see how much time is actually spent on this?

#78 hlide

hlide

    GP32 Hardcore

  • GP32 Hardcore
  • 225 posts

Posted 09 June 2010 - 02:50 PM

> It should be 0, 1, 9. Basically what happens is that during the branch it calls psxDelayTest/psxTestLoadDelay to figure out what to do with the next instruction.
>
> I wonder if this is really necessary. You could try removing it and seeing how many games still work. If most do then this might be an option for speeding up psx4all.


Reading the SPIM code, their approach seems closer to the real hardware: the instruction executed right after a load is not necessarily the next one in memory. If the load sits in the delay slot of a branch or jump, the next instruction to execute, and therefore the one affected by the load delay, is the branch target.

SPIM handles it in the interpreter by deferring the write of a load's result to its destination register until the next instruction, whatever it is, has executed: "the result from a load is not available until the subsequent instruction has executed (as in the real machine). We need a two element shift register for the value and its destination, as the instruction following the load can itself be a load instruction."
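SPIM's two-element shift register can be sketched roughly like this (a simplified model, not SPIM's source; `cpu_t`, `schedule_load` and `retire_instruction` are hypothetical names). A load never writes the register file directly: it parks its result in the second slot, and every retired instruction commits the first slot and shifts the second into it. The instruction executed after a load, whether the next one in memory or a branch target, therefore still reads the old value, and back-to-back loads work because each one has its own slot. A load and its delay-slot instruction writing the same register is not modeled.

```c
#include <stdint.h>

/* Simplified CPU state for an interpreter with delayed loads.
   pend_reg[0]/pend_val[0]: load issued by the previous instruction.
   pend_reg[1]/pend_val[1]: load issued by the current instruction.
   A register number of 0 means "slot empty" (r0 is never written anyway). */
typedef struct {
    uint32_t gpr[32];
    unsigned pend_reg[2];
    uint32_t pend_val[2];
} cpu_t;

/* A load instruction calls this instead of writing gpr[rt] directly. */
static void schedule_load(cpu_t *c, unsigned rt, uint32_t value) {
    c->pend_reg[1] = rt;
    c->pend_val[1] = value;
}

/* Called after every instruction, including branches and their delay slots.
   The load from one instruction ago becomes visible now, and the current
   instruction's load (if any) moves into the "previous" slot. */
static void retire_instruction(cpu_t *c) {
    if (c->pend_reg[0] != 0)
        c->gpr[c->pend_reg[0]] = c->pend_val[0];
    c->pend_reg[0] = c->pend_reg[1];
    c->pend_val[0] = c->pend_val[1];
    c->pend_reg[1] = 0;
}
```

With this model, an instruction reading r25 right after `LW r25, ...` gets the old r25, and the one after that sees the loaded value.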

Edited by hlide, 09 June 2010 - 02:55 PM.


#79 Exophase

Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5463 posts
  • Location:Cleveland OH

Posted 09 June 2010 - 03:28 PM

Ari64 wrote:
> It doesn't use prologue stubs, it calls the interpreter for that one instruction, which is not the most efficient way to do it.
>
> Is this worth optimizing? Has anyone profiled psx4all to see how much time is actually spent on this?


Calls the interpreter for every indirect branch then? Or every branch, period? How much analysis does it do to check for non-dependencies otherwise, any?

Even if it's just doing it for every indirect branch then that's probably a substantial cost, especially if the interpreter is not optimized for this. If it's only doing it for cases where collision is detected at runtime (should be close to never, if ever) then that won't be a big cost.

#80 hlide

hlide

    GP32 Hardcore

  • GP32 Hardcore
  • 225 posts

Posted 09 June 2010 - 03:43 PM

Exophase wrote:
> Calls the interpreter for every indirect branch then? Or every branch, period? How much analysis does it do to check for non-dependencies otherwise, any?
>
> Even if it's just doing it for every indirect branch then that's probably a substantial cost, especially if the interpreter is not optimized for this. If it's only doing it for cases where collision is detected at runtime (should be close to never, if ever) then that won't be a big cost.


Just for info, psx4all uses a recompiler, and there is no load delay slot handling in it. I don't remember handling it when I worked with Zodttd on the GP2x version, and rereading the source confirms there is no such handling.

Now, would it be worthwhile to handle this load delay slot to increase game compatibility?

Edited by hlide, 09 June 2010 - 03:45 PM.


#81 Exophase

Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5463 posts
  • Location:Cleveland OH

Posted 09 June 2010 - 03:49 PM

hlide wrote:
> Just for info, psx4all uses a recompiler, and there is no load delay slot handling in it. I don't remember handling it when I worked with Zodttd on the GP2x version, and rereading the source confirms there is no such handling.
>
> Now, would it be worthwhile to handle this load delay slot to increase game compatibility?


Seems like the static cases, or at least the simpler ones, should be detected and reported, so you'll know if something is relying on it; then you can write code to handle it.
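Flagging the static cases could start with a scan as simple as the one below. The hypothetical `find_load_use` walks a run of MIPS I instruction words and reports the first load whose very next instruction might read the loaded register. It is deliberately conservative (it treats both the rs and rt fields of the following instruction as sources), so it over-reports; for example it would also flag an LWL/LWR pair targeting the same register, which real code uses legitimately. It does not follow branches, so loads in branch delay slots still need the branch-target check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field extractors for a 32-bit MIPS instruction word. */
static unsigned op_field(uint32_t i) { return i >> 26; }
static unsigned rs_field(uint32_t i) { return (i >> 21) & 31; }
static unsigned rt_field(uint32_t i) { return (i >> 16) & 31; }

/* MIPS I loads: LB, LH, LWL, LW, LBU, LHU, LWR (primary opcodes 0x20-0x26). */
static bool is_load(uint32_t i) {
    unsigned op = op_field(i);
    return op >= 0x20 && op <= 0x26;
}

/* Deliberately conservative: treat both the rs and rt fields as sources.
   This over-reports (ADDIU, for example, writes rt rather than reading it). */
static bool may_read(uint32_t i, unsigned r) {
    return r != 0 && (rs_field(i) == r || rt_field(i) == r);
}

/* Index of the first load whose immediately following instruction may read
   the loaded register, or -1 if there is none. Branches are not followed. */
static long find_load_use(const uint32_t *code, size_t n) {
    for (size_t k = 0; k + 1 < n; k++) {
        if (is_load(code[k]) && may_read(code[k + 1], rt_field(code[k])))
            return (long)k;
    }
    return -1;
}
```

Running this over each block at compile time and logging the hits would show whether any game actually depends on the old-value behavior before committing to handling it.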

#82 Ari64

Ari64

    Magic Emulator Fairy

  • GP32 Hardcore
  • 477 posts

Posted 09 June 2010 - 10:18 PM

Exophase wrote:
> Calls the interpreter for every indirect branch then? Or every branch, period? How much analysis does it do to check for non-dependencies otherwise, any?
>
> Even if it's just doing it for every indirect branch then that's probably a substantial cost, especially if the interpreter is not optimized for this. If it's only doing it for cases where collision is detected at runtime (should be close to never, if ever) then that won't be a big cost.

pcsx-df calls psxDelayTest if there is a load instruction in the delay slot. psxDelayTest calls psxTestLoadDelay which checks if the instruction at the target of the branch reads that register. If so, then it calls execI to execute that one instruction via the interpreter, and then moves the result of the load into the destination register.

psx4all apparently doesn't do this.
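The decision psxTestLoadDelay makes can be approximated as below. This is not pcsx-df's code: `src_mask` and `target_sees_old_value` are hypothetical names, only common MIPS I integer opcodes are decoded precisely, and everything else falls back to assuming both rs and rt are read. When `target_sees_old_value` returns true, the emulator would run the target instruction with the old register contents (pcsx-df uses execI for this) and only then write the loaded value; otherwise it can write the value immediately. The case where the target overwrites the loaded register is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

static unsigned op_of(uint32_t i)    { return i >> 26; }
static unsigned rs_of(uint32_t i)    { return (i >> 21) & 31; }
static unsigned rt_of(uint32_t i)    { return (i >> 16) & 31; }
static unsigned funct_of(uint32_t i) { return i & 63; }

/* Bit mask of the GPRs a MIPS I instruction reads, r0 excluded. Only common
   integer opcodes are decoded precisely; everything else falls back to the
   conservative "reads rs and rt". */
static uint32_t src_mask(uint32_t i) {
    uint32_t rs = 1u << rs_of(i), rt = 1u << rt_of(i), m;
    switch (op_of(i)) {
    case 0x00:                                           /* SPECIAL */
        switch (funct_of(i)) {
        case 0x00: case 0x02: case 0x03: m = rt; break;  /* SLL, SRL, SRA */
        case 0x08: case 0x09:            m = rs; break;  /* JR, JALR */
        case 0x10: case 0x12:            m = 0;  break;  /* MFHI, MFLO */
        default:                         m = rs | rt;    /* ADDU, SLLV, MULT, ... */
        }
        break;
    case 0x02: case 0x03: case 0x0F:     m = 0;  break;  /* J, JAL, LUI */
    case 0x01: case 0x06: case 0x07:                     /* REGIMM branches, BLEZ, BGTZ */
    case 0x08: case 0x09: case 0x0A: case 0x0B:          /* ADDI, ADDIU, SLTI, SLTIU */
    case 0x0C: case 0x0D: case 0x0E:                     /* ANDI, ORI, XORI */
    case 0x20: case 0x21: case 0x23: case 0x24: case 0x25: /* LB, LH, LW, LBU, LHU */
                                         m = rs; break;
    default:                             m = rs | rt;    /* BEQ, BNE, LWL, LWR, stores, ... */
    }
    return m & ~1u;                                      /* r0 always reads as zero */
}

/* True if the branch target reads the register the delay-slot load writes,
   i.e. the target must execute with the OLD value before the load commits. */
static bool target_sees_old_value(uint32_t target_insn, unsigned load_rt) {
    return load_rt != 0 && (src_mask(target_insn) & (1u << load_rt)) != 0;
}
```

Because this check only needs the target's first instruction, a recompiler could run it once at the branch lookup and cache the result, rather than calling the interpreter on every taken branch.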

#83 Ari64

Ari64

    Magic Emulator Fairy

  • GP32 Hardcore
  • 477 posts

Posted 12 June 2010 - 02:16 AM

I wanted to know why Conker doesn't work, and found this:

  15002ff4: LW r25,r2+4
  15002ff8: SLL r8,r25,8
  15002ffc: BEQL r8,r0,15003050
  15003000: -pagefault-

I hate delay slots.

Not sure if it's worth fixing since we have no microcode support for Conker anyway.

#84 Exophase

Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5463 posts
  • Location:Cleveland OH

Posted 12 June 2010 - 02:25 AM

That's awful, I'm sorry :(

You're probably going to support it eventually, yeah? I wonder what other emulators do.

#85 Ari64

Ari64

    Magic Emulator Fairy

  • GP32 Hardcore
  • 477 posts

Posted 12 June 2010 - 03:03 AM

Exophase wrote:
> That's awful, I'm sorry :(
>
> You're probably going to support it eventually, yeah? I wonder what other emulators do.

Conker BFD seems to page in the executable on demand from a compressed filesystem. It uncompresses one 4K page at a time. (Yes, the rom is 64MB, and they compressed it.)

Original mupen deals with it by executing the delay slot in the interpreter if there is a branch in the last instruction of a 4K page. Due to differences in register caching, I don't quite have the right state set up to do that, at least not using the existing interpreter core.

Not sure what Daedalus does, but it reportedly cannot run this game.
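The page-boundary condition that original mupen checks for reduces to a mask on the branch address: the branch is the last 32-bit word of its 4 KB page, so its delay slot lives on the next page. The Conker trace is exactly this case (the BEQL at 15002ffc has its delay slot at 15003000). The helper names and the `slot_strategy` enum below are hypothetical; they only illustrate deciding at compile time to defer such a delay slot to the interpreter.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 0x1000u   /* 4 KB, the unit Conker decompresses code in */

/* True if a branch at branch_pc is the last 32-bit word of its 4 KB page, so
   its delay slot is on the next page, which may not be loaded yet. */
static bool delay_slot_on_next_page(uint32_t branch_pc) {
    return (branch_pc & (GUEST_PAGE_SIZE - 1)) == GUEST_PAGE_SIZE - 4;
}

/* Hypothetical compile-time choice for a branch's delay slot. */
typedef enum { COMPILE_SLOT_INLINE, INTERPRET_SLOT_AT_RUNTIME } slot_strategy;

static slot_strategy choose_slot_strategy(uint32_t branch_pc) {
    return delay_slot_on_next_page(branch_pc) ? INTERPRET_SLOT_AT_RUNTIME
                                              : COMPILE_SLOT_INLINE;
}
```

The hard part Ari64 points out is not this test but having the register state in a form the interpreter can use at that moment.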

#86 kikeminchas

kikeminchas

    GP32 User

  • Members
  • 41 posts

Posted 12 June 2010 - 07:09 AM


Ari64 wrote:
> Conker BFD seems to page in the executable on demand from a compressed filesystem. It uncompresses one 4K page at a time. (Yes, the rom is 64MB, and they compressed it.)
>
> Original mupen deals with it by executing the delay slot in the interpreter if there is a branch in the last instruction of a 4K page. Due to differences in register caching, I don't quite have the right state set up to do that, at least not using the existing interpreter core.
>
> Not sure what Daedalus does, but it reportedly cannot run this game.


It seems to have booted at least once in Daedalus PSP. I'm not sure what the status is now, or whether just booting is enough for you to consider taking a look at their code. See:

http://forums.daedal....com/compat.php

#87 Ari64

Ari64

    Magic Emulator Fairy

  • GP32 Hardcore
  • 477 posts

Posted 12 June 2010 - 03:24 PM

kikeminchas wrote:
> It seems to have booted at least once in Daedalus PSP. I'm not sure what the status is now, or whether just booting is enough for you to consider taking a look at their code. See:
>
> http://forums.daedal....com/compat.php

As I understand it, the daedalus dynarec is basically an interpreter that records instruction traces, and then recompiles the hot paths. I assume that this could work correctly, because when it gets a page fault the first time through, it hasn't recompiled the code yet and is executing instructions one at a time.

DaedalusX64 seems to dump the entire cache if something is invalidated. This would be very inefficient given how often Conker swaps pages.
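One common alternative to dumping the whole cache is page-granular invalidation: remember which 4 KB pages compiled code came from and flush only the page being written. The sketch below shows just that bookkeeping, with hypothetical names and an assumed 8 MB of RDRAM (Expansion Pak). It limits the damage of each write to one page, but it does not remove the cost that worries Ari64: every page Conker swaps in still has to be recompiled.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12                      /* 4 KB pages */
#define RAM_SIZE   (8u << 20)                    /* assumption: 8 MB RDRAM (Expansion Pak) */
#define NUM_PAGES  (RAM_SIZE >> GUEST_PAGE_SHIFT)

/* Hypothetical bookkeeping: does any compiled block come from this page? */
static bool page_has_code[NUM_PAGES];
static unsigned pages_flushed;                   /* counter, for illustration only */

/* Record a block compiled from physical range [start, end).
   Assumes end > start and end <= RAM_SIZE. */
static void note_block(uint32_t start, uint32_t end) {
    for (uint32_t p = start >> GUEST_PAGE_SHIFT; p <= (end - 1) >> GUEST_PAGE_SHIFT; p++)
        page_has_code[p] = true;
}

/* Called on every store (or DMA) into RAM. Only the written page is flushed;
   blocks from every other page stay valid. Returns true if a flush happened. */
static bool on_ram_write(uint32_t phys_addr) {
    uint32_t p = (phys_addr & (RAM_SIZE - 1)) >> GUEST_PAGE_SHIFT;
    if (!page_has_code[p])
        return false;                            /* fast path: page holds no compiled code */
    /* A real dynarec would unlink and free the blocks for page p here. */
    page_has_code[p] = false;
    pages_flushed++;
    return true;
}
```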

I'm not sure how you could do it efficiently though, except to reverse-engineer the compression and HLE it.

#88 proflogic

proflogic

    GP32 User

  • Members
  • 37 posts
  • Gender:Male

Posted 12 June 2010 - 04:11 PM

Ari64 wrote:
> I wanted to know why Conker doesn't work, and found this:
>
>   15002ff4: LW r25,r2+4
>   15002ff8: SLL r8,r25,8
>   15002ffc: BEQL r8,r0,15003050
>   15003000: -pagefault-
>
> I hate delay slots.
>
> Not sure if it's worth fixing since we have no microcode support for Conker anyway.


Well, it could be worse. At least you aren't emulating SPARC. :lol: (Though, I'm not familiar enough with MIPS to know it doesn't have conditionally-executed delay slots...)

#89 silver

silver

    GP32 Hardcore

  • Members
  • 149 posts

Posted 12 June 2010 - 04:40 PM

Ari64 wrote:
> Not sure if it's worth fixing since we have no microcode support for Conker anyway.


Has any N64 emu ever worked out Rare's extra microcode for this? I thought all 64 games had been emulated one way or another....

Ari64 wrote:
> ....except to reverse-engineer the compression and HLE it.

...almost sounds like you are setting yourself a challenge... :ph34r:


It is/was a fun and rather unusual N64 game (and one of the few that needed a 64Meg backup unit, so it's one of the handful of carts I actually own). Is the whole ROM compressed, with everything decompressed on the fly, or did they use specific encoding for audio/video/gfx etc.?

#90 Exophase

Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5463 posts
  • Location:Cleveland OH

Posted 12 June 2010 - 05:00 PM

proflogic wrote:
> Well, it could be worse. At least you aren't emulating SPARC. :lol: (Though, I'm not familiar enough with MIPS to know it doesn't have conditionally-executed delay slots...)


That's exactly what the beql is.

Emulating something like say, TMS320C6x would be a lot harder.
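For comparison with an ordinary branch, here is a simplified sketch of branch-likely semantics: with BEQ the delay slot always executes, while with BEQL it executes only if the branch is taken and is nullified otherwise. The `branch_outcome` struct and the `beq`/`beql` helpers are hypothetical; they compute the target as pc + 4 plus the sign-extended 16-bit offset times 4, compare full 64-bit register values as the N64's VR4300 does, and ignore exceptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* What happens after a branch: whether the delay slot at pc + 4 runs,
   and where execution continues afterwards. */
typedef struct {
    bool     run_delay_slot;
    uint32_t next_pc;
} branch_outcome;

/* Branch target: pc + 4 plus the sign-extended 16-bit offset times 4. */
static uint32_t branch_target(uint32_t pc, int16_t offset) {
    return pc + 4 + ((uint32_t)(int32_t)offset << 2);
}

/* BEQ: the delay slot always executes. */
static branch_outcome beq(uint32_t pc, uint64_t rs_val, uint64_t rt_val, int16_t offset) {
    branch_outcome o = { true, rs_val == rt_val ? branch_target(pc, offset) : pc + 8 };
    return o;
}

/* BEQL (branch likely): the delay slot executes only if the branch is taken;
   otherwise it is nullified and execution resumes right after it. */
static branch_outcome beql(uint32_t pc, uint64_t rs_val, uint64_t rt_val, int16_t offset) {
    bool taken = rs_val == rt_val;
    branch_outcome o = { taken, taken ? branch_target(pc, offset) : pc + 8 };
    return o;
}
```

Applied to the Conker trace, the instruction at 15003000 only runs when r8 is zero, which is why the emulator can't simply treat that delay slot like an unconditional one.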