What's the proper implementation for hardware emulation?

I'm going to programme a Game Boy emulator (the CPU is Z80-like, in case somebody is not familiar with it), and while doing my research I found some things I'm not so sure about.

The first one was that C is the programming language to choose here. That's not so much of a problem, but I'd like to hear your opinion from today's point of view. Even C++ was not recommended.

The second thing I found out was that everybody was using one function per opcode. That seems logical, since it's just one function call and probably better optimised than having one function for the "ADD" instruction that then has to work out which registers are involved. But how necessary is that today? Should I stick to it, or should I pick whichever way is more convenient and only rewrite my emulator if I notice that it just doesn't cut it (more or less modern gaming consoles come to mind right now)?

Also, it's kind of demotivating to write a function for "add that register to this register" over and over again. Is there a way to automate that from an opcode map or something like that?

Teacup answered 28/3, 2013 at 21:2 Comment(4)
One function per opcode would be rather slow. You could try identifying opcode chunks. Or just write an engine to transform Z80 ASM into x86 ASM (might be easier?). It's not a simple undertaking. And there is no reason why you shouldn't use C++. – Protrusive
Not everyone does it that way. A personal favourite of mine is "decoding cleverly" (i.e. nested switching on bitfields), then implementing ADD once and letting it use the final bitfield to take the source register from an array. Uses of (hl) take some special trickery. – Breeching
There are a number of short(ish) emulator programs here. – Privet
dingrite: I might do that for some instructions. I don't feel that comfortable with ASM yet. harold: That's probably what I'll go for. luser: Thanks. I'll take a look at those. – Teacup
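
A minimal sketch of the "decoding cleverly" idea from the comments, assuming the standard Z80 layout of opcodes 0x80..0xBF (bits 5-3 select the ALU operation, bits 2-0 select the source register, and index 6 means (HL)). The `Cpu` struct and the names `read_src` and `exec_alu` are hypothetical, and flag handling is left out to keep it short:

```c
#include <assert.h>
#include <stdint.h>

/* Register indices in opcode order; slot 6 is not a register but "(HL)". */
enum { REG_B, REG_C, REG_D, REG_E, REG_H, REG_L, REG_HL_IND, REG_A };

/* Hypothetical CPU state for illustration only. */
typedef struct {
    uint8_t r[8];          /* r[REG_HL_IND] is unused */
    uint8_t mem[0x10000];  /* flat 64 KiB address space */
} Cpu;

/* Fetch the source operand selected by the low three opcode bits. */
uint8_t read_src(const Cpu *cpu, unsigned idx)
{
    assert(idx < 8);
    if (idx == REG_HL_IND) {  /* the "special trickery": read memory at HL */
        uint16_t hl = (uint16_t)((cpu->r[REG_H] << 8) | cpu->r[REG_L]);
        return cpu->mem[hl];
    }
    return cpu->r[idx];
}

/* Execute one opcode from the 0x80..0xBF block.
   Returns 1 if handled, 0 otherwise. Flags are NOT updated in this sketch. */
int exec_alu(Cpu *cpu, uint8_t opcode)
{
    if ((opcode & 0xC0) != 0x80)
        return 0;
    unsigned op = (opcode >> 3) & 7;
    uint8_t src = read_src(cpu, opcode & 7);
    uint8_t a = cpu->r[REG_A];
    switch (op) {
    case 0: a = (uint8_t)(a + src); break;  /* ADD A,r */
    case 2: a = (uint8_t)(a - src); break;  /* SUB r   */
    case 4: a &= src; break;                /* AND r   */
    case 5: a ^= src; break;                /* XOR r   */
    case 6: a |= src; break;                /* OR r    */
    default: return 0;  /* ADC, SBC and CP depend on flags; omitted here */
    }
    cpu->r[REG_A] = a;
    return 1;
}
```

With this shape, ADD is written once and covers 0x80..0x87, including `ADD A,(HL)` at 0x86.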
R
2

First suggestion: you shouldn't use nested switch statements; use an array of function pointers instead. It is often faster, which helps emulation quality, and the code is nicer, since nested switches can get a bit messy. Here are some links where you can read more about these arrays:
http://www.newty.de/fpt/fpt.html
http://www.multigesture.net/wp-content/uploads/mirror/zenogais/FunctionPointers.htm
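
A minimal sketch of such a dispatch table, assuming a heavily reduced CPU with two registers and no flags. The opcode values (0x00 NOP, 0x04 INC B, 0x78 LD A,B) are the usual Z80 encodings; the `Cpu` struct and all function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced CPU state (no flags, two registers). */
typedef struct Cpu {
    uint8_t  a, b;
    uint16_t pc;
    uint8_t  mem[0x10000];
} Cpu;

/* Every handler executes one instruction and returns its T-states. */
typedef int (*OpFn)(Cpu *cpu);

int op_nop(Cpu *cpu)    { (void)cpu; return 4; }        /* 0x00 NOP    */
int op_inc_b(Cpu *cpu)  { cpu->b++; return 4; }         /* 0x04 INC B  */
int op_ld_a_b(Cpu *cpu) { cpu->a = cpu->b; return 4; }  /* 0x78 LD A,B */

/* Placeholder for opcodes this sketch does not cover. */
int op_unimplemented(Cpu *cpu) { (void)cpu; return 0; }

OpFn optable[256];

/* Fill the table once at start-up. */
void optable_init(void)
{
    for (int i = 0; i < 256; i++)
        optable[i] = op_unimplemented;
    optable[0x00] = op_nop;
    optable[0x04] = op_inc_b;
    optable[0x78] = op_ld_a_b;
}

/* Fetch one opcode and dispatch through the table. */
int cpu_step(Cpu *cpu)
{
    uint8_t opcode = cpu->mem[cpu->pc++];
    assert(optable[opcode] != NULL);  /* fails if optable_init() was skipped */
    return optable[opcode](cpu);
}
```

The main loop then reduces to calling `cpu_step` repeatedly and adding up the returned T-states.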

Second suggestion: yes, you can do it in C#, Java, or C++, but you want every last bit of performance so that the emulation is as close as possible, i.e. you want to emulate one CPU cycle of the target architecture with the fewest CPU cycles on the current (host) architecture, and from what I have heard and read, OOP isn't so good in this case. One reason is performance; the other is fairly obvious: emulation is, as you have probably noticed, a really complex task, and wrapping it in OOP can be an unnecessary pain in the neck.

Replacement answered 3/4, 2013 at 1:35 Comment(0)
B
10

I mostly agree with WingsOfIcarus. I wrote a few emulators already so here is my insight:

  1. The use of function pointers is a good idea (for speed and clarity of code)
  2. OOP is not a problem

    Yes, member calls are a little bit slower, but if you are careful it will not affect performance too much. On the other hand, OOP emulation code is much better to manage/read/understand.

  3. Use an instruction database instead of fixed instruction decoding.

    I am using a single text file which consists of all the necessary information for all instructions. The emulator parses it during initialization (and fills the arrays of function pointers, operands, ...). With this architecture it is very easy to correct errors in the instruction set without any code change.

    Documentation of complex instruction sets is almost always faulty to some degree. The worst case is the Z80 (I have never seen a 100% error-free instruction set). So use several instruction set listings, compare them, and create an error-free set (if you can).

  4. Add sound, video, keyboard and mouse to your emulation

    This is usually not a problem. On Windows use WaveOut instead of DirectSound. It's more stable and much faster (usable latencies of DSound are sometimes even > 400 ms). With WaveOut I was able to lower the latency to 20-80 ms, which is OK.

  5. Limit the speed by T cycles of the emulated CPU per second

    I am using machine-cycle-correct timings, which is much slower but allows me to correctly implement the emulation of any hardware peripherals (FDC, DMAC, sound chips, ...) without any hacks.

  6. Implement loading/saving of files for the emulated platform

For example, this is part of my instruction set (which is fed directly to the CPU emulation):

opc      T0 T1 MC1   MC2   MC3   MC4   MC5   MC6   MC7   mnemonic

B8       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,B
B9       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,C
BA       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,D
BB       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,E
BC       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,H
BD       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,L
BE       07 00 M1R 4 MRD 3 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,(HL)
BF       04 00 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 ... 0 CP A,A
C0       11 05 M1R 5 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET NZ
C1       10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 POP BC
C2L2H2   10 10 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP NZ,U16
C3L1H1   10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP U16
C4L2H2   17 10 M1R 4 MRD 3 MRD 4 MWR 3 MWR 3 ... 0 ... 0 CALL NZ,U16
C5       11 00 M1R 5 MWR 3 MWR 3 ... 0 ... 0 ... 0 ... 0 PUSH BC
C6U2     07 00 M1R 4 MRD 3 ... 0 ... 0 ... 0 ... 0 ... 0 ADD A,U8
C7       11 00 M1R 5 MWR 3 MWR 3 ... 0 ... 0 ... 0 ... 0 RST 00H
C8       11 05 M1R 5 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET Z
C9       10 00 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 RET
CAL2H2   10 10 M1R 4 MRD 3 MRD 3 ... 0 ... 0 ... 0 ... 0 JP Z,U16

opc:    operation code [hex]
        L1,H1,U1,S1 means first operand direct number or address
        L2,H2,U2,S2 means second operand direct number or address
        L3,H3,U3,S3 means third operand direct number or address
        H,L ... U16 high and low byte
        U   ... U8 unsigned byte
        S   ... S8 signed byte

T0      normal instruction duration [T] always 2 decimal digits
T1      instruction duration if condition not met [T] always 2 decimal digits

MC1++   Machine cycle first is type,second is duration [T] always 1 decimal digit
        ...     unused
        M1R     M1 cycle
        MRD     memory read
        MWR     memory write
        IOR     IO read
        IOW     IO write
        NON     no external operation (internal computation)
        INT     interrupt cycle

mnem    instruction text (mnemonic)
  • opc is used as the index into the array of function pointers
  • mnemonic is used to select the proper function pointer and the operand types
  • T0 and T1 are used for instruction timing (this is enough for rough emulation)
  • MC1++ are used for correct machine cycle timing (to implement correct hardware emulation and contention timing)
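
A minimal sketch of how one line of the listing above could be parsed at start-up, assuming whitespace-separated fields in exactly the order shown (opc, T0, T1, seven type/duration pairs, then the mnemonic). The `IsetEntry` struct and the names `parse_iset_line` and `mc_total` are hypothetical, not the author's actual loader:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MC 7  /* MC1..MC7 columns in the listing */

/* Hypothetical in-memory form of one instruction-set line. */
typedef struct {
    char opc[16];             /* e.g. "BE" or "C2L2H2" */
    int  t0, t1;              /* durations in T-states */
    char mc_type[MAX_MC][4];  /* "M1R", "MRD", "...", and so on */
    int  mc_len[MAX_MC];      /* duration of each machine cycle */
    char mnemonic[32];        /* e.g. "CP A,(HL)" */
} IsetEntry;

/* Parse one line; returns 1 on success, 0 if the line is malformed. */
int parse_iset_line(const char *line, IsetEntry *e)
{
    int pos = 0, n = 0;
    assert(line != NULL && e != NULL);
    if (sscanf(line, "%15s %2d %2d%n", e->opc, &e->t0, &e->t1, &n) != 3)
        return 0;
    pos = n;
    for (int i = 0; i < MAX_MC; i++) {
        if (sscanf(line + pos, " %3s %1d%n",
                   e->mc_type[i], &e->mc_len[i], &n) != 2)
            return 0;
        pos += n;
    }
    while (line[pos] == ' ')
        pos++;
    /* The mnemonic is the rest of the line, without the line ending. */
    size_t len = strcspn(line + pos, "\r\n");
    if (len == 0 || len >= sizeof e->mnemonic)
        return 0;
    memcpy(e->mnemonic, line + pos, len);
    e->mnemonic[len] = '\0';
    return 1;
}

/* Sum of all machine cycle durations; useful as a sanity check against T0. */
int mc_total(const IsetEntry *e)
{
    int sum = 0;
    for (int i = 0; i < MAX_MC; i++)
        sum += e->mc_len[i];
    return sum;
}
```

For the lines shown above, the machine cycle durations add up to T0 (for example 4 + 3 = 7 for `CP A,(HL)`), so comparing `mc_total` with `t0` is a cheap way to spot typos in the data file.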

Here is my complete Zilog Z80A instruction set with machine cycle timing, as a link for download. Feel free to use it (just mention my nick somewhere). After porting to this I was finally able to pass the ZEXALL test 100%. For more info see Writing a graphical Z80 emulator in C or C++.
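
Point 5 of the list above (limiting speed by T cycles per second) could be sketched roughly as follows. This assumes the emulator runs in fixed time slices and sleeps until the end of each slice (the sleep is platform-specific and not shown). The `Throttle` struct, the function names, and `fake_step` are hypothetical; the clock value used in the comments is the Game Boy's 4194304 Hz, chosen only as an example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pacing state. */
typedef struct {
    uint32_t clock_hz;   /* emulated CPU clock in T-states per second */
    uint32_t slice_hz;   /* time slices per second, e.g. one per video frame */
    uint32_t remainder;  /* leftover of the integer division, carried over */
    uint32_t debt;       /* T-states the previous slice overshot its budget */
} Throttle;

/* T-state budget for the next slice. Carrying the remainder makes the
   budgets of slice_hz consecutive slices add up to exactly clock_hz. */
uint32_t throttle_budget(Throttle *t)
{
    assert(t->slice_hz != 0);
    uint32_t total = t->clock_hz + t->remainder;
    t->remainder = total % t->slice_hz;
    return total / t->slice_hz;
}

/* A step function executes one instruction and returns its T-states. */
typedef uint32_t (*StepFn)(void *cpu);

/* Run instructions until the slice budget is used up. An instruction may
   overshoot the budget; the overshoot is deducted from the next slice. */
uint32_t run_slice(Throttle *t, StepFn step, void *cpu)
{
    int64_t left = (int64_t)throttle_budget(t) - (int64_t)t->debt;
    uint32_t executed = 0;
    while (left > 0) {
        uint32_t cycles = step(cpu);
        assert(cycles > 0);  /* a zero-length step would never terminate */
        left -= cycles;
        executed += cycles;
    }
    t->debt = (uint32_t)(-left);
    return executed;
}

/* Stand-in for a real CPU step: pretends every instruction takes 12 T. */
uint32_t fake_step(void *cpu)
{
    (void)cpu;
    return 12;
}
```

After each `run_slice` call the host would wait until `1 / slice_hz` seconds of wall-clock time have passed before starting the next slice.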

Befall answered 20/9, 2013 at 7:54 Comment(4)
Hi Specktre, I have a question: in several C++ sources of Z80 interpreters there is a table used to count the total cycles of a DD/FD instruction. For instance, "DEC r" gives 9 cycles in that table instead of the 8 cycles in your z80_iset.dat. In fact, any M1R+M1R instruction in that table gives 9 cycles instead of 8. It really annoys me, as I need the T-state count BETWEEN M1R, MRD and MWR in order to provide accurate /wait states according to some hardware constraints (the instruction will resume when video /BLANK is 1). I don't understand where that 1 extra T-state comes from. – Cauterant
@Cauterant My guess is that it most likely comes from a bug in the documentation or in the source of your interpreter itself. 8(16)-bit inc/dec takes 4(6)T without the DD/FD prefix and 8(10)T with it, so never 9T! For example: DD05 08 00 M1R 4 M1R 4 ... 0 ... 0 ... 0 ... 0 ... 0 DEC B However, it's also possible that the wrong timing reflects something else entirely, like contention or a timing discrepancy elsewhere. The dec r instruction does not have an MRD cycle; there is no need to read memory other than the opcode... – Befall
@Specktre Thanks! I believe this is probably an error that comes from borrowing a common source (I saw several Z80 interpreters using almost the same tables). This is not my emulator, but I'm improving it to get better hardware emulation of the timings. I decided to use your z80_iset.dat to generate the tables I need and corrected that 9 into 8 as well. Your file is very useful and I'm grateful for the work you did. Thanks again! – Cauterant
@Cauterant It took years to compile... I ended up creating a MySQL table for each instruction set I could find, showing all the differences between them, then repairing the inconsistencies found, and then exporting to this dat file. After separating into machine cycles the errors showed themselves obviously, and I was able to repair the instruction set to the point that it passes 100% of ZEXALL in my emulator using this data. – Befall
M
1

Here's a pretty cool implementation of working with some opcodes for an NES emulator:

http://bisqwit.iki.fi/jutut/kuvat/programming_examples/nesemu1/

Here are the accompanying YouTube videos, which have a little more explanation of what's going on:

http://www.youtube.com/watch?v=y71lli8MS8s

It uses C++ templates and some additional C++11 features. Whether you choose C++ or C is up to you, but it shouldn't really matter a whole lot. If you're just emulating a Game Boy, I doubt that speed is going to be an issue on modern processors, so just use whatever you're comfortable with.

Melda answered 28/3, 2013 at 23:48 Comment(1)
The video is probably more confusing than the source itself but I'll take a look at it. Thanks. – Teacup
