Is this true with X11? If you want to upscale the rendering, I do render to a framebuffer, which is not ideal, but you have to enable this in the config (it's not on by default).
There is no copy-to-screen step for something rendered with OGL ES 2, or if there is, the drivers are not very well done at all. It should be rendered straight to a flipped framebuffer.
Yeah, I have a config option to disable alpha testing. It has a measurable effect, but it's not huge.
Granted, removing discards just lightens the SGX load, which usually isn't a problem, but you did mention Banjo Kazooie being render limited.
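To make the alpha test option concrete, here is a rough sketch of how a plugin might leave the discard out of a generated fragment shader when the option is off. This is not the actual plugin code: `build_fragment_shader`, `uAlphaRef`, `uTex0` and `vTexCoord` are all made-up names, and the real shader generator is certainly more involved.

```python
def build_fragment_shader(enable_alpha_test: bool) -> str:
    """Return GLSL ES 1.00 fragment shader source (hypothetical layout).

    The discard statement and its uniform are emitted only when alpha
    testing is enabled, so the disabled variant contains no discard at all.
    """
    lines = [
        "precision mediump float;",
        "uniform sampler2D uTex0;",
        "varying vec2 vTexCoord;",
    ]
    if enable_alpha_test:
        # Assumed uniform holding the alpha reference value.
        lines.append("uniform float uAlphaRef;")
    lines.append("void main() {")
    lines.append("    vec4 color = texture2D(uTex0, vTexCoord);")
    if enable_alpha_test:
        lines.append("    if (color.a < uAlphaRef) discard;")
    lines.append("    gl_FragColor = color;")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Generating the shader without the branch, rather than feeding it a reference value of zero, is what actually removes the discard from the compiled program.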
I'm implementing this test at the moment: every shader change increments the output usage count for that shader. I know that well-written GL drivers have some sort of state cache, so flipping between just a few states regularly doesn't incur much of a performance hit.
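Roughly, the counting could look like the sketch below. It is only an illustration: `ShaderTracker` and its fields are hypothetical names, and the redundant-bind check is the kind of filtering a plugin can do on its own side, not a claim about what any particular driver does internally.

```python
from collections import Counter


class ShaderTracker:
    """Counts shader changes and skips binds that would change nothing."""

    def __init__(self):
        self.current = None      # id of the shader bound last
        self.usage = Counter()   # shader id -> number of times switched to
        self.changes = 0         # real shader changes
        self.skipped = 0         # redundant bind requests filtered out

    def use(self, shader_id):
        """Return True if a real change happened, False if it was redundant."""
        if shader_id == self.current:
            self.skipped += 1
            return False
        self.current = shader_id
        self.usage[shader_id] += 1
        self.changes += 1
        # A real plugin would call glUseProgram for this shader here.
        return True
```

Dumping `usage` at the end of a frame shows whether the changes are spread over many shaders or bounce between a handful.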
I would be curious to see just how many unique shaders are present among the 80 shader changes you've recorded for OoT.
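If the changes get logged as a plain sequence of shader ids, the unique count falls out in a couple of lines. This is a sketch with a made-up function name (`summarize_shader_changes`), and the ids in the usage line are placeholders, not recorded data.

```python
from collections import Counter


def summarize_shader_changes(change_log):
    """Summarize a recorded sequence of shader ids, one entry per change."""
    counts = Counter(change_log)
    return {
        "changes": len(change_log),   # total shader changes recorded
        "unique": len(counts),        # distinct shaders among them
        "per_shader": dict(counts),   # how often each shader was switched to
    }


# Placeholder ids purely for illustration.
summary = summarize_shader_changes([3, 7, 3, 9, 7, 3])
```

A low unique count against a high change count would suggest the changes are mostly flipping between a few shaders, which is the case a driver-side state cache should handle well.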