Anyone For Dosbox? I sacrificed a virginal goat for Pickle...
#1
Posted 06 August 2009 - 07:27 PM
I cannot guarantee 100% that the goat was a virgin, but ffs I hope it was. You see, there wasn't a male goat...
Thanks in advance for any response.
#2
Posted 06 August 2009 - 08:23 PM
sold, on 06 August 2009 - 03:27 PM, said:
Not really anything new with DOSBox. Last I was running CVS (0.73) code with some custom M-HT dynarec changes, and I was playing with Dark Forces as a test.
I will try to get some videos going for the weekend.
Now, before the requests start to come in: I'm only going to try things I have that aren't a total pain to load.
#4
Posted 06 August 2009 - 08:44 PM
Khan, on 06 August 2009 - 04:35 PM, said:
Thanks :)
I can run at 850 MHz.
I will be running them with dynamic core, max cycles, and a 22050 Hz sample rate.
Update: Dosbox videos coming today :-)
This post has been edited by Pickle: 08 August 2009 - 06:26 PM
#5
Posted 08 August 2009 - 10:28 PM
Pickle, on 06 August 2009 - 03:44 PM, said:
Khan, on 06 August 2009 - 04:35 PM, said:
Thanks :)
I can run at 850 MHz.
I will be running them with dynamic core, max cycles, and a 22050 Hz sample rate.
Update: Dosbox videos coming today :-)
OMG I'm so excited! Delicious videossss. My precioussss
#6
Posted 08 August 2009 - 11:46 PM
Dosbox: Duke 1 and Airborne Ranger
Dosbox: Commander Keen 4
In Keen 4, I notice that the sound effects are a little delayed. Is there any particular reason that the music was turned off?
#7
Posted 09 August 2009 - 12:28 AM
Esn, on 09 August 2009 - 01:46 AM, said:
Dosbox: Duke 1 and Airborne Ranger
Dosbox: Commander Keen 4
In Keen 4, I notice that the sound effects are a little delayed. Is there any particular reason that the music was turned off?
Tsk, impatient. The first three minutes of the second video are darkness, so you at least ought to link it like this. Though here's hoping Pickle fixes the encoding issue or whatever.
And yay, Commander Keen! Been a while since I played that. Same goes for Pickle, judging by the video. :P
#8
Posted 09 August 2009 - 12:38 AM
This post has been edited by fischju2000: 09 August 2009 - 12:47 AM
#9
Posted 09 August 2009 - 01:01 AM
Pickle, on 06 August 2009 - 09:44 PM, said:
Khan, on 06 August 2009 - 04:35 PM, said:
Thanks :)
I can run at 850 MHz.
I will be running them with dynamic core, max cycles, and a 22050 Hz sample rate.
Update: Dosbox videos coming today :-)
With the exclusion of overclocking the processor, do you see much (or any) room for optimisation with DOSBox? Or is this likely to be the best we're going to see on the Pandora?
#10
Posted 09 August 2009 - 02:32 PM
BTW, Dark Forces looks gorgeously smooth (maybe we can expect 486DX/33 performance figures after all :) )
Great job, Pickle! ;)
This post has been edited by Khan: 09 August 2009 - 02:35 PM
#11
Posted 09 August 2009 - 04:38 PM
Khan, on 09 August 2009 - 09:32 AM, said:
Great job, Pickle! ;)
with a very good dynamic recompiler running 32-bit x86 code you could easily get up to 486DX-120 MHz performance, depending on the software being run
with interpreted emulation running 16-bit x86 code, not much more than a 386DX-16
the 486 had only 8 KB to 16 KB of cache (code + data)
ARM and 486 have roughly equivalent cycles-per-operation speeds (1 cycle/op)
where it hurts:
* emulation adds overhead for I/O and virtual memory
* the ARM ALU is 32-bit only (8-bit and 16-bit ops need to be shifted back and forth to emulate the x86 flags)
* ARM instructions are 4 bytes long and recompiled code is much larger, so a lot less fits in the cache vs. x86 code
* Thumb instructions are 2 bytes long but are much less efficient for emulation: you need a lot of bit-shift operations and address calculations, which 4-byte ARM code does better, so you end up with much more Thumb code than ARM code for emulating the same x86 opcodes
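To make the second bullet concrete, here is a minimal Python sketch of the shifting trick. It is not taken from DOSBox, and the function names are made up for illustration. The idea: move a 16-bit operand into the top half of a 32-bit word so that a plain 32-bit add produces the carry, zero, sign and overflow flags a 16-bit x86 ADD would, then shift the result back down. The flag model assumes ARM-style `adds` behaviour for addition only.

```python
MASK32 = 0xFFFFFFFF

def add32_with_flags(a, b):
    """Model a 32-bit ALU add that sets N, Z, C, V (addition only)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = (result >> 31) & 1
    z = int(result == 0)
    c = int(full > MASK32)
    # Signed overflow: both operands share a sign and the result's sign differs.
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}

def add16_via_shift(a16, b16):
    """Emulate a 16-bit ADD by shifting operands into the top half-word."""
    wide, flags = add32_with_flags((a16 & 0xFFFF) << 16, (b16 & 0xFFFF) << 16)
    # Shift back down to get the 16-bit result; the flags are already correct.
    return wide >> 16, flags
```

The extra shifts in and out are the cost being described: a native 16-bit ALU would not need them.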
x86:
ADD [EBX], EAX ; 2 bytes, 3 cycles (on 486)
ARM:
ldr r12, [r1] ; 4 bytes, 1 cycle
adds r12, r12, r0 ; 4 bytes, 1 cycle
str r12, [r1] ; 4 bytes, 1 cycle
both take the same time, but ARM takes 12 bytes VS 2 bytes, less fits in the instruction cache = more memory delays
that's not even counting the overhead of emulating virtual memory.
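As a rough back-of-the-envelope check of the cache argument, the Python sketch below counts how many copies of this one operation fit in a cache of a given size. The 8 KB figure is an assumption for illustration: it is applied to both sides, and the whole cache is treated as holding nothing but this code.

```python
def ops_that_fit(cache_bytes, bytes_per_op):
    """How many back-to-back copies of one operation fit in the cache."""
    return cache_bytes // bytes_per_op

CACHE_BYTES = 8 * 1024  # assumption: same 8 KB budget on both sides

x86_ops = ops_that_fit(CACHE_BYTES, 2)    # ADD [EBX], EAX is 2 bytes
arm_ops = ops_that_fit(CACHE_BYTES, 12)   # ldr + adds + str is 3 x 4 bytes

print(x86_ops, arm_ops)
```

Under those assumptions the x86 encoding fits six times as many of these operations in the same space.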
emulating a MIPS processor with flat memory (N64, PS, PSP) is a lot easier for an ARM than emulating an x86 in protected mode (DOS/4GW, Windows, etc.)
#12
Posted 10 August 2009 - 06:53 AM
Stephane Hockenhull, on 09 August 2009 - 06:38 PM, said:
Don't you think "very good" and "easily" are mutually exclusive? :) If it's that easy, I'm eagerly waiting for that performance, let's say before the end of this year.
Quote
ARM and 486 have roughly equivalent cycles-per-operation speeds (1 cycle/op)
where it hurts:
* emulation adds overhead for I/O and virtual memory
* the ARM ALU is 32-bit only (8-bit and 16-bit ops need to be shifted back and forth to emulate the x86 flags)
If your x86 regs are in memory, shifting can be skipped by using ldrsb and ldrsh.
Also, if you compute flags for each instruction without checking whether they're really needed, you'll never approach your claimed 486DX-120 MHz performance.
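One way to avoid computing flags nobody reads is lazy evaluation: remember the last operation's inputs and result, and only derive a flag when a later instruction asks for it. The Python sketch below illustrates that idea only. The class and method names are hypothetical, and a real recompiler would more likely decide this at translation time by checking which flags later instructions consume.

```python
class LazyFlags:
    """Remember the last ADD; derive flags only when somebody reads them."""

    def __init__(self):
        self.a = 0
        self.result = 0
        self.mask = 0xFFFFFFFF
        self.evaluations = 0  # counts how often a flag was actually computed

    def record_add(self, a, b, bits=32):
        # Cheap path taken for every emulated ADD: store operand and result.
        self.mask = (1 << bits) - 1
        self.a = a & self.mask
        self.result = (self.a + (b & self.mask)) & self.mask
        return self.result

    def carry(self):
        # An unsigned add wrapped around if the result is below an operand.
        self.evaluations += 1
        return int(self.result < self.a)

    def zero(self):
        self.evaluations += 1
        return int(self.result == 0)


flags = LazyFlags()
flags.record_add(1, 2)            # flags of this ADD are never read
flags.record_add(0xFFFFFFFF, 1)   # only this ADD's flags get evaluated
```

In this sketch the first ADD costs no flag work at all, because it is overwritten before anything reads its flags.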
Quote
* Thumb instructions are 2 bytes long but are much less efficient for emulation: you need a lot of bit-shift operations and address calculations, which 4-byte ARM code does better, so you end up with much more Thumb code than ARM code for emulating the same x86 opcodes
Thumb-2 can help here (though I find it disgusting :P ).
Quote
ADD [EBX], EAX ; 2 bytes, 3 cycles (on 486)
ARM:
ldr r12, [r1] ; 4 bytes, 1 cycle
adds r12, r12, r0 ; 4 bytes, 1 cycle
str r12, [r1] ; 4 bytes, 1 cycle
both take the same time, but ARM takes 12 bytes VS 2 bytes, less fits in the instruction cache = more memory delays
that's not even counting the overhead of emulating virtual memory.
Some comments:
- Your ARM code sequence has a load-use penalty between the ldr and the adds.
- You are missing AF and PF flag computation.
- Translating one instruction at a time will not provide good enough speed.
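For reference, the two flags mentioned in the second comment have no direct ARM equivalent, so an emulator has to derive them itself. Below is a minimal Python sketch of the x86 definitions for an ADD. The function names are made up for illustration, and this shows only what has to be computed, not how any particular emulator does it.

```python
def parity_flag(result):
    """x86 PF: set when the low 8 bits of the result hold an even number of 1s."""
    return int(bin(result & 0xFF).count("1") % 2 == 0)

def aux_carry_flag(a, b, result):
    """x86 AF for ADD: carry out of bit 3 into bit 4."""
    # Bit 4 of a ^ b ^ result is exactly the carry that entered bit 4.
    return ((a ^ b ^ result) >> 4) & 1
```

For example, adding 0x0F and 0x01 gives 0x10, which carries out of the low nibble, so AF is set.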
#14
Posted 12 August 2009 - 05:45 PM
Quote
ADD [EBX], EAX ; 2 bytes, 3 cycles (on 486)
ARM:
ldr r12, [r1] ; 4 bytes, 1 cycle
adds r12, r12, r0 ; 4 bytes, 1 cycle
str r12, [r1] ; 4 bytes, 1 cycle
both take the same time, but ARM takes 12 bytes VS 2 bytes, less fits in the instruction cache = more memory delays
that's not even counting the overhead of emulating virtual memory.
Some comments:
- Your ARM code sequence has a load-use penalty between the ldr and the adds.
- You are missing AF and PF flag computation.
- Translating one instruction at a time will not provide good enough speed.
[/quote]
true, true, I'm just giving a rough estimate, and I figure a good recompiler would interleave the instructions, canceling the penalty.
what I mean is that the 486 and ARM9 have roughly equivalent performance running native code: ARM benefits from more registers and free shift operations, the 486 benefits from code size.
cycle for cycle, excluding register spills, you get similar performance.
and old 486 PCs don't have weird hardware, like 2 to 6 processors, to emulate: no Blitter, Copper, raster DMA, etc.
if the game supports VESA (doesn't use Mode X or 16-color planar modes), emulating the hardware side doesn't have much impact.
80% to 90% of the time in a 486/early Pentium era game was spent in software rendering and in transferring the finished frame to the video card (the transfer alone could take up to half of the time!).
that makes those games "easy" to emulate with good performance compared to earlier CGA/EGA games (complex hardware) and later 3D-accelerated, FPU/MMX-heavy games.
the only problem is that a lot of the good games of that era have their source code available, making this point moot :)
