This time, I am including both the source and the binaries. It's not a full distribution yet, so I am posting it on my personal web page instead of SourceForge (even though all the code is checked in, too).
The little demo program is in bin/gp2x/release. Copy test.gpe and the image file dodge.raw out of the test folder onto your SD card in order to try the example.
The new rasterizer improved things, but not as much as I had hoped. The demo car (732 triangles) spins at 17-20 fps, depending on the view (settings: mipmap nearest, nearest texel, depth buffer, culling). For performance freaks: you can find specialized versions of the inner loops in RasterizerPieces.cpp that you can hook up in RasterizerTriangles.cpp (look for the comments, it's obvious). These inner loops are usually compiled at runtime by the built-in JIT, but you might want to experiment with the specialized versions to see how far you can stretch performance. In the form supplied, they gain about 1 fps (i.e. 18-21 fps).
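To give an idea of what a specialized inner loop for this configuration (nearest texel, depth buffer) might look like, here is a minimal sketch. It is not the code from RasterizerPieces.cpp: the Span and Texture structs, the function name, the 16.16 fixed-point layout, the 16-bit color and depth formats and the "less than" depth comparison are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical span description; not the layout used in RasterizerPieces.cpp.
struct Span {
    std::uint16_t* color;   // destination pixels (assumed 16-bit color)
    std::uint16_t* depth;   // depth buffer entries for the same pixels
    int count;              // number of pixels in the span
    std::uint32_t z;        // 16.16 fixed-point depth at the first pixel
    std::int32_t dz;        // per-pixel depth step
    std::int32_t u, du;     // 16.16 fixed-point texture coordinate and step
    std::int32_t v, dv;
};

// Hypothetical texture with power-of-two dimensions, so that
// wrapping a coordinate is a single bit mask.
struct Texture {
    const std::uint16_t* texels;
    int logWidth, logHeight;
};

// Fills one horizontal span with nearest-texel sampling and a
// "less than" depth test. Returns the number of pixels written.
int DrawSpanNearestDepthLess(const Span& s, const Texture& t) {
    assert(s.count >= 0);
    const std::int32_t maskU = (1 << t.logWidth) - 1;
    const std::int32_t maskV = (1 << t.logHeight) - 1;
    std::uint32_t z = s.z;
    std::int32_t u = s.u, v = s.v;
    int written = 0;
    for (int i = 0; i < s.count; ++i) {
        const std::uint16_t zi = static_cast<std::uint16_t>(z >> 16);
        if (zi < s.depth[i]) {
            // Nearest texel: drop the fraction, wrap with the mask.
            const std::int32_t tx = (u >> 16) & maskU;
            const std::int32_t ty = (v >> 16) & maskV;
            s.color[i] = t.texels[(ty << t.logWidth) + tx];
            s.depth[i] = zi;
            ++written;
        }
        // Step all interpolants to the next pixel (unsigned add wraps safely).
        z += static_cast<std::uint32_t>(s.dz);
        u += s.du;
        v += s.dv;
    }
    return written;
}
```

The point of such a specialization is that every state decision (depth function, filter mode, wrap mode) is fixed at compile time, so the loop body contains no branches other than the depth test itself.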
I haven't tested everything yet, so it's quite likely that you will run into glitches here and there. If you do, let me know.
So what's left to do besides bug fixing?
- Rework the transformation path from eye coordinates to clip coordinates, as well as clipping, for better precision. For speed reasons, this distribution uses the code from build 0.84, which won't pass the conformance test.
- Adapt the surface class to SDL.
- Implement the scissor test in the triangle rasterizer.
- Investigate adding specific hand-optimized inner loops to the rasterizer.
- Optimize the coordinate selection and geometry/lighting code.
- Lazy lighting: move lighting calculations after culling.
- Improve global register allocation in the JIT.
- Start experimenting with dual-CPU support.
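As an illustration of the scissor test item above, one way to handle it in a span-based triangle rasterizer is to clip each span against the scissor rectangle before the inner loop runs. The following is only a sketch of that idea, not the planned implementation: the struct and function names are made up, and it assumes that the scanline y and the span bounds use the same window coordinates as the rectangle passed to glScissor.

```cpp
#include <algorithm>
#include <cassert>

// Scissor rectangle with the same parameters as glScissor(x, y, width, height).
struct ScissorRect { int x, y, width, height; };

// Result of clipping the half-open span [xStart, xEnd) on one scanline.
struct ClippedSpan { int xStart, xEnd; bool visible; };

// Hypothetical helper: clips a span on scanline y against the scissor rectangle.
ClippedSpan ClipSpanToScissor(const ScissorRect& r, int y, int xStart, int xEnd) {
    assert(r.width >= 0 && r.height >= 0);
    // Reject scanlines entirely above or below the rectangle.
    if (y < r.y || y >= r.y + r.height) {
        return {0, 0, false};
    }
    // Clamp the span to the horizontal extent of the rectangle.
    const int x0 = std::max(xStart, r.x);
    const int x1 = std::min(xEnd, r.x + r.width);
    if (x0 >= x1) {
        return {0, 0, false};
    }
    return {x0, x1, true};
}
```

If the left edge gets clamped, the interpolated values (depth, texture coordinates, colors) have to be advanced by x0 - xStart steps before the inner loop starts. The advantage of clipping per span is that the inner loops themselves need no extra per-pixel test.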