QUOTE(Squidge @ Mar 3 2006, 01:21 AM)
Filtering the screen would be slow unless the emulator runs well in advance of full speed.
I have to disagree here, if we are talking about things like supersampling/downsampling. At a 4x ratio it should not be slow by any means. Heck, there is already one game that does it. Even my demo moves over 100MB/s of data, processes it, and has some cycles left.
QUOTE(Squidge @ Mar 3 2006, 01:21 AM)
Using the 940 for the task generally isn't practical, as the screen is 150KB, and the 940 only has a 4KB cache, so it'll spend most of its time fighting the 920 for control of the data bus to main memory. It'll be quicker to use the 920 for the job in most cases.
Let's see... 150KB of data to process and 4KB of cache. Load/store bandwidth can be around 100MB/s when using ldm/stm. So that's 37.5 cache-sized blocks to load and store per frame. At 30fps that is 9MB/s of traffic (4.5MB/s in each direction) and about 6.666 million cycles to process one frame. That's over 80 machine cycles per pixel. It looks doable to me.
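For anyone who wants to check the arithmetic, here is a small sketch in C. The 320x240 16-bit screen and the 200MHz clock are assumptions on my part: they match the 150KB and roughly 6.666 million per-frame figures, but neither is stated outright in this thread. All the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed figures (not stated in the thread): 320x240 at 16 bpp, 200 MHz core. */
enum {
    SCREEN_W        = 320,
    SCREEN_H        = 240,
    BYTES_PER_PIXEL = 2,
    FPS             = 30,
    CACHE_BYTES     = 4 * 1024
};

#define CPU_HZ 200000000u   /* assumption: 200 MHz clock */

/* Size of one frame in bytes (153600 = 150 KB). */
uint32_t frame_bytes(void)
{
    return SCREEN_W * SCREEN_H * BYTES_PER_PIXEL;
}

/* Bus traffic per second when every frame is read once and written once. */
uint32_t traffic_bytes_per_sec(void)
{
    return 2u * frame_bytes() * FPS;
}

/* Clock cycles available for each frame at the assumed clock. */
uint32_t cycles_per_frame(void)
{
    return CPU_HZ / FPS;
}

/* Clock cycles available for each pixel, rounded down. */
uint32_t cycles_per_pixel(void)
{
    return cycles_per_frame() / (SCREEN_W * SCREEN_H);
}

/* Cache-sized blocks per frame, times 10 to keep one decimal in an integer. */
uint32_t blocks_per_frame_x10(void)
{
    assert(CACHE_BYTES > 0);
    return frame_bytes() * 10u / CACHE_BYTES;
}
```

Under those assumptions the functions give 37.5 blocks per frame, a bit over 9 million bytes per second of combined load/store traffic, and 86 cycles per pixel, which is where the "over 80" comes from.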
Of course, all of that assumes loading and processing the data in blocks (scanlines, tiles, whatever), not jumping around randomly with str/ldr. Manually locking the caches will help too.
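A rough sketch of the block-wise idea in portable C, not the code from my demo. The chunk size, the function names, and the trivial "darken" filter are all hypothetical, and memcpy only stands in for the ldm/stm bursts you would hand-write in assembly on the real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk size: 512 pixels = 1 KB, small enough to sit in a 4 KB cache. */
#define CHUNK_PIXELS 512

/* Hypothetical per-pixel filter: halve the brightness of an RGB565 pixel.
   The mask clears the bits that the shift leaks from one channel into the next. */
static uint16_t darken_rgb565(uint16_t p)
{
    return (uint16_t)((p >> 1) & 0x7BEF);
}

/* Process the screen one chunk at a time: copy in, filter locally, copy out.
   Main memory is only touched by the two sequential block copies. */
void filter_in_chunks(uint16_t *screen, size_t pixel_count)
{
    uint16_t chunk[CHUNK_PIXELS];
    size_t done = 0;

    assert(screen != NULL);

    while (done < pixel_count) {
        size_t n = pixel_count - done;
        if (n > CHUNK_PIXELS)
            n = CHUNK_PIXELS;

        memcpy(chunk, screen + done, n * sizeof chunk[0]);   /* block load  */
        for (size_t i = 0; i < n; i++)
            chunk[i] = darken_rgb565(chunk[i]);              /* work in the local buffer */
        memcpy(screen + done, chunk, n * sizeof chunk[0]);   /* block store */

        done += n;
    }
}
```

Calling filter_in_chunks(framebuffer, 320 * 240) would walk the whole screen in 150 chunks. Whether the local buffer really stays in cache depends on the cache setup and locking, which this sketch does not handle.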
It can be done that way (I used a scanline rendering approach in my demo), and there are some commercial examples of it as well: the PowerVR-based 3D accelerators, for example. I have one (though I'm not using it now) and it works very well, though it might be tricky to implement.