Hacker News new | past | comments | ask | show | jobs | submit login
Intel 8080 emulator. 19th IOCCC. Best of Show (nanochess.org)
256 points by hggh 9 months ago | hide | past | favorite | 66 comments



One of my best hacks back in the day was an 8080 emulator, for the 8088/8086 back in the time when what you were really trying to do was run the 8080 code as fast as possible (because the first PCs were slow, and emulating 8080 code was much slower than running 8080 code on a native machine).

Invariably at the time the 8080 emulators would have a dispatch loop, fetching the next opcode and dispatching through a jump table to some code to emulate each of the 256 opcodes. One problem with this was that you couldn't dispatch this way without altering the CPU flags, so you'd also have to save and restore those with LAHF/SAHF (fast) or PUSHF/POPF (slow). In general you'd need about 10 8086 instructions to emulate 1 8080 instruction, and in the early 1980s this meant your emulated CP/M program would run much slower than on your old CP/M computer.

My emulator would emulate 1 8080 opcode with 4 8086 opcodes as follows; An 8080 instruction, say 0x94 = sub h would be emulated with code loaded at address 0x9494. That code would be;

  sub bh,dl   ;bh = bh-dl, emulate h with bh, a with dl
  lodsb       ;al = *si++, get next opcode, increment emulated pc
  mov ah,al   ;eg 0x94 -> 0x9494
  jmp ax      ;jump to next instruction
This is a classic example of trading space for time, your emulator is sparsely distributed through 64K of RAM. So you needed 128K to emulate your 64K 8080.


Shower thought the next day, OMG I got the subtraction round the wrong way! Should be (of course);

  sub dl,bh ;dl = dl-bh, emulate A with dl, H with bh
Not all opcodes are one to one like this one. For example 0x3e = MVI A,imm (move an immediate value to the A register) would be;

  lodsb     ;al = *si++, get the immediate value
  mov dl,al ;dl = al, A = imm
This time I've omitted the 3 instruction postscript (lodsb; mov ah,al; jmp ax) to all opcodes.

While I'm here adding extra information, note another strength of my approach; The si, al and ah registers are the only resources consumed for emulation machinery. The di, bx, cx, dl and f registers are available to emulate the DE, HL, BC, A and F registers without any memory accesse (the dh register was wasted sadly). Remember the host CPU was the 16 bit 8088/8086 machine which also had 'segment' registers to allow access to 20 bits of addressable memory rather than 16 (so 1M not 64K). The segment registers had a role in emulation, basically setting up a 64K 'code segment' to run the emulator and another 64K shared 'stack and 'data' segment to emulate 64K of memory for the emulated 8080. Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.


I was wondering about the registers, why not use the "standard" mapping for BC/DE/HL => CX/DX/BX, and DI for A?

One disadvantage of this would be you have to "xchg ax,di" before and after every accumulator op, but that is a single byte opcode (and on the 8088, instruction fetch is the main bottleneck). Just from intuition, it seems like for common code sequences like "MOV D,H ! MOV E,L", having the byte registers directly addressable would be a win. But maybe you profiled how common the various instructions are in real code?

>Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.

Not necessarily. Modern x86 CPUs running in 64-bit mode still support 16-bit code/data segments (the only thing not supported is 64 bit OS + V86 mode, and you don't need that for 8080 emulation). At least on Linux, this is made available to userspace via the modify_ldt syscall.

Of course, when you have 16-bit code and a CP/M emulator, might as well also make it run CP/M-86 programs (most of which do not use the kind of segment arithmetic that would require V86 mode). Then the 8080 CPU emulator would be just another code segment, and probably the easier part to write, compared to the code necessary to emulate the CP/M filesystem on Linux.


Good question and extra information, thanks. I wasn't aware of 16 bit code/data segments on 64 bit x86 CPUs.

I don't think I seriously considered using di for A (I've sort of started using a convention of lower case = 8086 registers, upper case = 8080 register so I may as well stick to it). Simply because of the many cases like the two I've used as examples, both of which are really simplified by having an 8086 8 bit register for A. As a 8080/Z80 programmer learning 8086 at the time, it took me a while to realise that al being off-limits for register A due to lodsb wasn't really a big problem, because any of the 8086 8 bit registers had equivalent expressability to the 8080 accumulator. At that point I stopped worrying about using the standard 8080 -> 8086 register mappings. It's more than possible that I was seduced by apparent simplicity and actual runtime performance could have been tweaked a little higher on average by something with a little more superficial complexity. I was in a hurry and looking to get my 8 bit tools running on my new box over a weekend or similar I think (I can't really remember exactly).

It was such a long time ago that I've forgotten a lot. Only after I wrote the "shower thought" post above did I remember that as well as dh, I had bp in reserve (basically I'd forgotten about bp). I suppose the extra resource (bp) meant even a normal dispatch loop emulator could also be written without needing to emulate any 8080 register with memory, contrary to what I wrote about above.

One optimisation I think I did do was actually to emulate MVI A,imm and other single byte immediate instructions with a lodsw rather than a lodsb, effectively prefetching the next opcode. I had the 256 opcode routines packed together initially, then I'd distribute them to their ultimate 0x0000, 0x0101, 0x0202 ... locations at runtime and append the lodsb; mov ah,al; jmp ax suffix then. So no problem if the code for MVI A,imm was lodsw; mov dl,al; mov al,ah; jmp ax; The appended suffix would still be there, it would just never execute.

My first implementation of this emulator took the form of a binary prefix that could be prepended to any CP/M 80 .com file with the Operating System's concatenate command (PIP ?), to make a CP/M 86 .cmd file. Later on I adapted it to MS-DOS with a similar deployment strategy, not a huge job because the basic system calls I needed were close cousins to MS-DOS equivalents. As you say, quite different for Linux.

In general the window of widespread usefulness for this tool was limited; basically the time when CPM/80 software was still competitively capable and PCs were still not at least an order of magnitude faster than their CP/M 80 predecessors (after that point conventional emulators were fine). I never took the chance to try and share my code with the world, that was harder then. So it never helped out anyone but me sadly.

Love your rep_lodsb handle by the way.


How would full translation work? The opcodes are close? Could you do a just in time version?


Yes, pretty close so full translation is easy if you have the source code. If not, emulation is the normal way. My emulator was faster than native 8080 code on everything but vanilla 8088 PCs, so it's more than sufficient.

I address translation in depth in my Retro Sargon project, an exercise in getting a vintage Z80 (8080 superset) assembly language program working on modern PCs. https://github.com/billforsternz/retro-sargon See the "Yet More Details" section in the README.


I think the 8086 was designed such that 8080 assembly code could be assembled as 8086 machine code. So I'd guess the instruction emulation would probably be reasonably efficient.


That’s genius.


Thank you! I really appreciate that.


Sadly, the International Obfuscated C Code Contest has not been held in the past 3 years, the last one being the 27th IOCCC from 2020 [1].

Looking forward to seeing more editions in the future, and submitting another entry myself...

[1] https://www.ioccc.org/years.html#2020


They seem to be re-organizing the website structure [1] as well as updating the submitting system [2], and lately they were making some progress [3], but still need a lot of formatting work.

[1] https://github.com/ioccc-src/temp-test-ioccc [2] https://github.com/ioccc-src/mkiocccentry [3] https://fosstodon.org/@ioccc/112023138011006567

There are currently only two active maintainers. I guess any help is welcome.


Seems like they are stretched a bit thin, and no doubt this is a hobby project.

This is a great steelman for where dogfooding does not apply - imagine if their website was actually an obfuscated C webserver with the content imbedded in it, and therefore impossible to update lol.


I've been waiting for the competition to open up again, I think I have an entry good enough. There does appear to be some activity on their GitHub repo, but I have no idea when the next one will be. Does anyone here have any gossip about when the next one might be? Wild rumor would also suffice.


The biggest factor I think is that the judging process cannot be easily parallelized after the initial filtering and traditionally needed multiple in-person judging sessions---there are several photos in the official Twitter account. (2020 judges didn't meet in person for the obvious reason, but still used conference calls.)


another casualty of the covid era, I assume /s


Not sure why the /s, it seems like a plausible contributing factor.


really? But isn't it an entirely online contest?


The humans who run online events still exist in the physical world.


Related, 8086tiny, a 2013 IOCCC winner. About twice as much source code, and if not cheating then certainly leaning heavily on the rules, but very impressive as it's a full x86 emulator with graphics and several peripherals.

Original entry: https://www.ioccc.org/years.html#2013_cable3

"Deobfuscated": https://github.com/adriancable/8086tiny

(Not mine, just a fan)


From the description, one small part of the code:

--64[T=1[O=32[L=(X=*Y&7)&1,o=X/2&1,l]=0,t=(c=y)&7,a=c/8&7,Y]>>6,g=~-T?y:(n)y,d=BX=y,l]

The difference between bf and c sometimes is small


It is in fact quite reasonable if you know a bit of C obfuscation and some expectation about 8086 interpreters. First, indentations:

    --64[
        T=1[
            O=32[
                L=(X=*Y&7)&1,
                o=X/2&1,
                l
            ]=0,
            t=(c=y)&7,
            a=c/8&7,
            Y
        ]>>6,
        g=~-T?y:(n)y,
        d=BX=y,
        l
    ]
`a, b` will return `b` but evaluate `a` as a side effect, so anything preceding `,` can be (recursively) pulled out.

    L=(X=*Y&7)&1,
    o=X/2&1,
    O=32[l]=0,
    t=(c=y)&7,
    a=c/8&7,
    T=1[Y]>>6,
    g=~-T?y:(n)y,
    d=BX=y,
    --64[l]
`a[b]` is semantically equivalent to `*(a + b)`, so it is unexpectedly commutative. Let's swap them all and add some spacing. Ah, also some assignment expressions have been reused as a part of other expressions, which I will also pull out:

    X = *Y & 7,
    L = X & 1,
    o = X / 2 & 1,
    O = l[32] = 0,
    c = y,
    t = c & 7,
    a = c / 8 & 7,
    T = Y[1] >> 6,
    g = ~-T ? y : (n) y,
    d = BX = y,
    --l[64]
Now it should be much more clear that a large chunk of this code is instruction decoding. At this point we need to expand macros to make more sense out of it:

    #define Z short
    #define a(_)*(_*)&
    #define y a(Z)Y[++O]
    #define n char
It would need more reading to determine the exact meaning of `Y`, but given that its type is `unsigned char*`, it should be some sort of pointer, most likely the instruction pointer. The macro `y`, which expands to `*(short*)&Y[++O]`, increases `O` so it looks like reading from IP+0, IP+1, ... while not actually advancing the IP.

So the mysterious code can be now mostly decoded:

    X = *Y & 7,              // [IP+0] = ?????XXX
    L = X & 1,               //        = ???????L
    o = X / 2 & 1,           //        = ??????o?
    O = l[32] = 0,           // Reset `O` for further uses of `y`

    c = y,                   // [IP+1] = yyyyyyyy
    t = c & 7,               //        = ?????ttt
    a = c / 8 & 7,           //        = ??aaa???
    T = Y[1] >> 6,           //        = TT??????
    g = ~-T ? y : (char) y,  // Read [IP+2] and do something

    d = BX = Y[++O],         // [IP+3] = dddddddd
    --l[64]
[IP+1] surely looks like the ModR/M byte, so you can guess that `g` is a displacement even if you don't know what the heck is `~-T` (equals to `T - 1` in twos' complement). [IP+0] is too generic to pick the exact instruction, but my guess is that this code is shared for a lot of similar opcodes to save the code footprint.


Thank you. That is interesting!


> if not cheating then certainly leaning heavily on the rules

How so?


The description implies they found some clever way to trick iocccsize at the time. Using the current version of iocccsize, it weighs in at 3842, well over the max of 2053.

https://www.ioccc.org/2013/cable3/hint.html


No, `iocccsize` has been updated throughout the history of the contest. The hints mention this, but the accepted secondary size for `cable3.c` was 1,977 tokens. (Later versions of `iocccsize` don't compute secondary sizes at all.)

As far as I can tell, the actual abuse here was the inclusion of instruction decoding table in the custom BIOS which didn't count towards the total size at all:

> This conversion, along with instruction lengths and how each instruction modifies the flags, is assisted by some lookup tables which form part of the BIOS binary.


The BIOS file doesn't just contain what you'd expect from a normal PC BIOS, but also contains several tables used by the emulator for decoding of x86 instructions. Normally those tables would be in code as they're necessary regardless of BIOS implementation; they're presumably in the BIOS to save space.


Maybe one way to look at it other than "cheating" is that it's simulating a fictional x86-like processor which uses an external ROM for microcode.


Is there a competition where you win by showing something both compact and extraordinary but not obfuscated ? Like zlib compression in 100 lines or some song recognition using FFT in X lines, or a working ssh client in Y lines ?


This is generally called code golf, and there’s a whole forum for it: https://codegolf.stackexchange.com/

I think once you try to get something small enough the code will naturally tend towards obfuscation.


Demoscene? Mostly geared towards graphics but that's one example.


There is the 10-line BASIC Coding competition, which is frankly AWESOME:

https://gkanold.wixsite.com/homeputerium/copy-of-rules

Some of the winners of the last few years have been extraordinary - complete arcade games, with physics, in just 10 lines of BASIC. This competition definitely highlights the issue we have with software development, which is that we, as a species, don't generally get really good at it for decades...


> I intended to make it simple to understand, it uses only four C keywords.

Fantastic


Is there an ELI5 for this? How could something so small possibly be an 8080 emulator?


It's the Toledo family. All is possible.

https://news.ycombinator.com/item?id=28337064


Wow, that sent me down a rabbit hole -- but this line from the homepage makes it sound like he is _not_ part of that family?

> If you're looking for information about the Mexican scientist Oscar Toledo Esteva, then please go to the History page of the Familia Toledo organization site.

https://nanochess.org/index.html


I believe they are father and son. The `Books` section on the Toledo family website links to books written by Oscar Toledo G.


Good catch -- those same books show up on the nanochess site:

https://nanochess.org/author.html

Odd that he doesn't mention the "scientist" people may be looking for is his father, but everything else lines up.


Can you actually purchase any of their products, including the $99 computer?


Given the number of lies on the company's website, I'm not inclined to believe them about other things.


The articles on the browser tech are horribily outdated, and ditto with TLS libraries. But if they used something like mbedtls/bearssl and Gemini/Gopher, they would be still useful.


The 8080 is a simple processor, the "how to program" manual is only 60 pages.

Most of the instructions are changing registers in various ways, which is pretty simple. The Appendix A has all of them: https://altairclone.com/downloads/manuals/8080%20Programmers...


I am more amazed you can fit a (pretty large subset of) Haskell compiler/runtime in about twice the size [1].

[1] https://www.ioccc.org/2019/lynn/hint.html


Well, an a complete lisp system on that size, for sure. And forth environment.


Example? I meant under 8080. Seems to have sector size using x86?


I found a Forth for x86 under the IOCCC contest of 1992: the Buzzward.2 entry.

https://www.ioccc.org/1992/1992.tar.bz2

You need to edit the CFLAGS (or just comment that line), and by using gcc or even tcc will work fine.


The former meant under x86, and that's easy even for IOCCC.With awk you can write an RPN calculator in 10 lines or such...


I went too far and ran into the printed hexadecimal-decimal lookup table for every number from 0x0-0xFFF


ah yes, I had that poster on my wall as a kid... https://twitter.com/JBrooksBSI/status/952291667041599488/pho...


Now I'm having fond memories of discovering the IBM box drawing characters above 128 as a kid, and doing fancy text files.


The simplest emulator in the world would be like this one:

unsigned char memory[65536];

int reg[8];

int main(void)

{

    int pc = 0;

    /* In this point you read something into memory */    

    while (1) {
        
        instruction = memory[pc];

        if (instruction == 0x76)  break;  /* HLT */

        pc = (pc + 1) & 0xffff; 

        if ((instruction & 0xc0) == 0x40)    /* Handle MOV r,r (opcodes 0x40-0x7f) */

            reg[(instruction >> 3) & 7] = reg[instruction & 7];

    }

    return 0;
}

What it does? It simply reads each instruction from memory (PC is the Program Counter), and increments PC continuously limiting it to 16-bit.

It does a comparison for each possible instruction, for this example I only handled the most basic one of 8080 (MOV r,r) and HLT, but this is already 64 cases of the 256 possible!

Then I added iteratively chunks of opcodes to emulate and used language C macros to reduce the code size. For example, there is #define d(e) to join two 8-bit registers into a 16-bit value, and #define Q to read the accumulator.

In the end, to stay inside the size limit of the IOCCC, I removed the tree of 'if' statements and replaced it with a tree of trinary operators ?: (a way of doing 'if' inside an expression) and this is what the macros C and S does.


The 8080 has ~5k transistors. There's only so much complexity one can build with those.


The 8080 instruction set opcodes have logical bit patterns, so you can identify the operation and operands through bit masking instead of requiring a huge LUT. Beyond that you can simplify a lot by sacrificing timing accuracy.


you can also write one using a 256-byte switch, which ain't so big. i wrote an 8080 emulator in fortran77 on a dec10 that way, back in the 1980s. but of course size wasn't an issue. it did prove to be quite portable - to vax and ibm vm/cms. we used it for teaching assembler programming.


After simplifying and reducing the code as far as possible, it is rewritten in a compressed format using macros and without whitespace. Look at the #define statements.


Not running on macOS arm due to llvm I guess?

Not sure how to use brew without touching the llvm or Xcode etc., I switch to lima (search it under this news site for more details). It works (after doing sudo ... build essentials etc.).

For the BASIC, not sure how to quit the environment after learning it in 1979! Seems Ctrl-Z give you a seg-fault and it quit ... :-) But not sure where is the core-dump generated ;-p

For the CPM, must do the chmod +w A B as said (BUT NOT chmod +w A then chmod +w B ?!!). Still cannot exit the CPM until I dir and find the program HALT.COM (wow .COM) and A>HALT ... all good. :-)


A somewhat related ask for help -

I have seen a program, I think for IOCCC, where the code was aligned in 4 character columns that seemed to naturally fit the code, so it would have code arranged like:

  for( long i=0; i<n; i++)
  {int a=1, b=2, c=3, d=4;
for the whole program.

I lost it though and have been completely unable to find it, despite searching very hard.

Has anyone seen something like this before? I would be forever grateful for any pointer towards it.


Was it one of these?

https://www.ioccc.org/years.html#1995_heathbar

https://www.ioccc.org/years.html#2014_endoh1

https://www.ioccc.org/years.html#2006_sloane

May be related, I have an index that lists winning entries with some form of ASCII art, one of those could be what you were looking for.

https://docs.google.com/spreadsheets/d/e/2PACX-1vRg3T5z-QN66...


What did it do?


It was an interpreter for something. I don't remember what


It compiles on my machine but I can't run it, I always get a SIGSEGV. Same thing for the chess program.

I ran both the original build command and the "And now it is 2024" one.


For the 8080 emulator, you need the c_basic.bin (renamed to just "C") and c_bios.bin to be in the same directory as the executable, as per the article. I tried this and it dumps me into a BASIC shell. So cool!


BTW, CPM 2.2 doesn't run on that emulator in my machine (intel Atom, Linux, 32 bit, tried with gcc 8, clang 11, and tcc 0.29.7). Basic does.


Sadly, same here: following the instrs works for Tiny Basic (which is quite basic!), but CP/M segfaults. Any idea to get past this?


chmod +w A B after the copy got me a bios prompt!


Thanks!!


Is there a "Deobfuscated" code?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: