BOX-256: a tiny game about writing assembly code to pass the graphics tests

zokier · on April 2, 2016

Love it, very Zachtronicesque. If the author is reading, here are a couple of improvement ideas:

* Show the level number somewhere so it's easier to discuss with others

* Add some way to know how well I did on the level, either by having some leaderboards or rating system/target cycle count

* Allow pausing the execution when running

* Not sure if it would make sense to have a full 8-bit color palette instead of a 4-bit one. It probably does not matter much to the game design

I think it might be a neat idea to implement this on real hardware with this sort of RGB LED module http://i.imgur.com/4YnDwWu.jpg and some way to input code, maybe a row of toggle switches Altair/PDP style or a small keypad like on the KIM-1

tehbeard · on April 2, 2016

An 8-bit palette means trying to match the colour against 256 possibilities instead of 16.

zokier · on April 2, 2016

That reminds me of one additional improvement suggestion: allow viewing the color values of the target pixels somehow.

zokier · on April 3, 2016

I just had to make a quick mockup of a hardware version of BOX-256: http://i.imgur.com/Vw1i459.jpg

keely · on April 3, 2016

Hi, I'm the author.

Looks like the game went a bit viral and my personal website bandwidth limits were exceeded.

You can now play here instead: http://bit.ly/1V1PiHt

golergka · on April 3, 2016

Could you build an OS X version too, please?

keely · on April 3, 2016

I will try to build an OSX version tomorrow(ish), when I get my hands on an OSX machine.

azeirah · on April 3, 2016

Made a subreddit for this game: https://www.reddit.com/r/box256

tromp · on April 2, 2016

Visiting that webpage causes my SUSE Firefox 45.0 browser to say "You need a browser which supports WebGL to run this content. Try installing Firefox."

fpgaminer · on April 2, 2016

Probably a driver issue. Firefox 45.0 supports WebGL, but it may be disabled on your machine if it can't find a suitable OpenGL driver. For reference, the site worked fine on my Ubuntu 15.10 machine, Firefox 45.0, Intel GPU.

impomatic · on April 4, 2016

Four squares solved in 7 cycles, 16 threads and overlapping code :-) https://twitter.com/john_metcalf/status/716901521036861440/

kencausey · on April 3, 2016

I looked at this a couple of times today before finally digging in and as I completed each level I looked forward more and more to the next level. And then there were no more levels, far sooner than I expected. I hope the author adds more in time.

Edit: Oh, and earlier today, after having looked at it many times, I bought http://www.lexaloffle.com 's combination of Voxatron and Pico-8. I've spent some time today learning about Pico-8 (to which I'm going to limit my attention for now) and I think anyone interested in the linked game might be interested in these as well.

keely · on April 3, 2016

There will be more levels in a few days.

kencausey · on April 3, 2016

Great, thanks!

SilasX · on April 2, 2016

Neat! Would like my own editor though. This doesn't have a lot of text editing capabilities and is fixed size.

How well does this map to real x86 assembly programming?

zokier · on April 3, 2016

Real x86 is very different: you have limited registers (especially on 32-bit x86), a very weird variable-length instruction encoding, a gazillion different instructions, and memory access that takes a significant amount of time. Of course, having a near-infinite amount of memory compared to the 256 bytes of BOX-256 changes the way you write your code too.

kencausey · on April 4, 2016

This has moved to http://box-256.com/

azeirah · on April 2, 2016

Copying and pasting is not working for me :(

jsnell · on April 2, 2016

The manual implies that's a limitation of the Unity WebGL player, and that it works in the standalone version.

azeirah · on April 2, 2016

Oh I was confused for a moment, thought that only counted for the "external" clipboard. Obviously not :(

amstocker · on April 2, 2016

Very cool, reminds me a lot of TIS-100

johnlinvc · on April 3, 2016

It's HNed, I got 509 Bandwidth Limit Exceeded. Looking forward to playing it.

OMGWTF · on April 3, 2016

My current scores:

  Square:       0x07 cycles
  Checkerboard: 0x4C cycles
  4 Squares:    0x09 cycles

I will post my solutions in 24-48 hours.

ekimekim · on April 3, 2016

Checkerboard: 0x24 cycles. http://imgur.com/C1UeEdn (EDIT: 0x16, see below)

My key insight was twofold:

1. If you start every thread at 0x00 and make the first N commands "THR @00", you get 2^(N+1)-1 threads running by the time your first thread exits the chain of THR commands.

2. You can use a single array MOV instruction to set a range of program counters, effectively causing every thread to jump to a set position.
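As a quick sanity check of the count in point 1, the snippet below simply evaluates the stated formula for a few chain lengths. It takes the formula at face value and does not simulate the game's scheduler; `threads_after_chain` is a hypothetical helper name, not anything from the game.

```python
def threads_after_chain(n: int) -> int:
    """Thread count claimed in the comment for a chain of n THR commands."""
    # Formula as stated in the comment; not derived from the game's scheduler.
    return 2 ** (n + 1) - 1


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} THR commands -> {threads_after_chain(n)} threads")
```

A chain of four THR commands gives 31 under this formula, which lines up with the 31 threads mentioned in the description of the solution.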

So in the first 5 cycles I start 31 threads, then my main thread drops into a loop where all it does is set every thread's position (including its own) to a start point every 2 cycles. Every single thread is now a 2-cycle loop with no JMP cycle spent to return to the start each time. Then I simply make 16 loops of:

  PIX counter color
  ADD counter 1 counter

with color being one of two memory positions, which the main thread swaps out each loop. I initialize counter such that each thread is doing its own row of the output (0, 10, 20, etc). The leftover threads I don't actually want (but it's hard to fine tune the thread count, so I left them in), so they share the first thread's loop. The repeated actions are effectively a noop.
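A rough Python model of the row-per-thread idea, for anyone who wants to see why swapping the two colors produces a checkerboard. It is only a sketch under assumptions: a 16x16 screen addressed by a single byte, a one-pixel checkerboard, all loops advancing in lockstep, and even rows reading one color slot while odd rows read the other. The function name and color values are made up for illustration.

```python
WIDTH = 16  # assumed 16x16 screen, so row starts are 0x00, 0x10, 0x20, ...


def draw_checkerboard(color_a: int = 1, color_b: int = 2) -> list:
    """Simulate 16 row workers, each running the PIX/ADD pair in lockstep."""
    screen = [0] * (WIDTH * WIDTH)
    slots = [color_a, color_b]  # the two shared color memory positions
    counters = [row * WIDTH for row in range(WIDTH)]  # one counter per row
    for _ in range(WIDTH):  # one iteration = one pass of every 2-cycle loop
        for row in range(WIDTH):
            color = slots[row % 2]         # assumed: even rows use slot 0, odd rows slot 1
            screen[counters[row]] = color  # PIX counter color
            counters[row] += 1             # ADD counter 1 counter
        slots.reverse()  # main thread swaps the two colors each loop
    return screen


if __name__ == "__main__":
    screen = draw_checkerboard()
    for y in range(WIDTH):
        print("".join(str(screen[y * WIDTH + x]) for x in range(WIDTH)))
```

The pixel at column x, row y ends up with the first color when x + y is even and the second color otherwise, which is the checkerboard pattern.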

Note that, due to space constraints, I use the unused 4th byte of each PIX instruction as the counter for that loop.

Doing this got me to 0x24 cycles. I shaved a further 4 cycles by using the leftover space to add 4 more loops. These ones run vertically and operate on the last two columns (two threads doing one column together). By the time the row-moving threads have reached the 2nd last column, the column-moving threads have finished the last two columns, and so the program is finished 4 cycles earlier.

I think I could get another 2-cycle saving if I fine-tuned the number of threads started, compressed everything down again and got another two column-moving threads.

The bigger improvement is converting the whole thing to 1-cycle loops instead of 2-cycle, and having half the loops incrementing the counters and the other half doing PIX instructions. This might just be possible in the available space if I've run the numbers correctly, and would result in a total time of 0x14.

EDIT: Did the latter: http://imgur.com/zD1Gw5H

It came out to 0x16, as I failed to account for the extra time to spin up more threads.

chipsy · on April 3, 2016

I started optimizing by making separate increment/render threads with a buffer and copy, and then doubled it to 4 threads. It's currently at 0xDD cycles but I don't think I'm going to take it farther:

http://i.imgur.com/sD5IAXP.png

The thread sync is pretty inefficient. It will need to incorporate some "tricky" techniques like what you've described to go faster.

keely · on April 3, 2016

Awwww this is great stuff. I would love to hear your thoughts on how the language could be changed/improved to make puzzles and optimizing even more fun.

ekimekim · on April 3, 2016

The biggest thing for me is being able to save/load solutions. I ended up keeping multiple browser tabs open as I tried to tweak solutions / attempt different puzzles.

As for the language itself, a MOD operation to go with the DIV operation would be useful. Bitwise AND / OR / NOT / XOR could be interesting.

Something that would make threads a lot more versatile would be some way to have them behave differently from one another even when executing the same code.

Perhaps a new instruction like JTI X Y: Jump if Thread Id (defined as 0xFF - location of program counter) is equal to Y

Or much more versatile but harder to account for in opcodes: a new prefix #XX which behaves like @(XX + thread id). So you could, for example, write 0x42 to a per-thread slot in an array starting at 0xA0 by doing "MOV 042 #A0"
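To make the #XX idea concrete, here is a small Python sketch of how such an operand could resolve. Everything in it is hypothetical: the helper names, the wrap-around at one byte, and the assumption that program counters sit at 0xFF, 0xFE, 0xFD and so on, which gives thread ids 0, 1, 2.

```python
def thread_id(pc_location: int) -> int:
    # Thread id as defined above: 0xFF minus the address holding the thread's program counter.
    return 0xFF - pc_location


def resolve_hash(base: int, pc_location: int) -> int:
    # Hypothetical #XX operand: behaves like @(XX + thread id).
    # Wrapping to one byte is an assumption, not something the proposal specifies.
    return (base + thread_id(pc_location)) & 0xFF


def mov_const_to_hash(memory: bytearray, value: int, base: int, pc_location: int) -> None:
    # Models "MOV 042 #A0": write a constant into this thread's slot of the array.
    memory[resolve_hash(base, pc_location)] = value


if __name__ == "__main__":
    memory = bytearray(256)
    for pc_location in (0xFF, 0xFE, 0xFD):  # three threads running the same instruction
        mov_const_to_hash(memory, 0x42, 0xA0, pc_location)
    print([hex(i) for i, v in enumerate(memory) if v == 0x42])
```

Under these assumptions, three threads executing the same "MOV 042 #A0" would each write to their own slot (0xA0, 0xA1, 0xA2) rather than all hitting the same address.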

For making the puzzles interesting: The problem is most shapes and images can simply be described in memory and subsequently brute-force PIXed in sequence. After the simplest puzzles, the trick is having images that do not fit easily in memory and require procedural description. Checkerboard is a good example here. What about something like fizzbuzz but described in colors (like this: http://imgur.com/RMEaKAg)? Or tricky questions where a seemingly random image actually has a simple pattern (like this: http://imgur.com/TbTPZnh (spoilers here: http://imgur.com/73De5QF ))? Unfortunately you're very limited as there's no actual input; the program's exact output is always entirely known.

Several people here have mentioned TIS-100: It's not the exact same thing as what you're doing, but it could be an interesting thing to compare with if you haven't played it yet.

EDIT: Just for fun, my best time for that second puzzle I posted is 0x103.

keely · on April 3, 2016

Interesting thoughts. Thank you for the feedback.

Copy-pasting to the clipboard doesn't work in the WebGL version for browser security reasons, but you can download a Windows build that has a working clipboard. I'll also make an OSX build later. See the link here: http://bit.ly/1V1PiHt

How to make threads operate differently on the same code is an interesting question. I don't really want to add another prefix, because I want to keep the realism of 255 opcodes and I've already used 128. The permutations of a new prefix would no longer fit. I also like the JTI suggestion, but perhaps it doesn't go as far as I would like.

Currently I'm thinking that maybe, in addition to the program counter, the thread would have another slot where it can define a "memory offset". You could state the offset when you start the thread, but it would also be modifiable later on, as it's in memory. You would call, for example, "THR @023 010 000", which would essentially mean that for ALL operations for that thread, ALL memory addresses get an offset of +010. I think this is very close to what you suggested with the thread ID. I think the offset would live next to the program counter (FE & FF). Do you think that would work?
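A minimal Python sketch of what a per-thread offset could look like, purely to illustrate the idea. The `Thread` class, the helper names and the wrap-around at 256 are all assumptions, and the snippet does not try to model the operand layout of the proposed THR form.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    pc: int          # program counter
    offset: int = 0  # proposed per-thread memory offset (hypothetical field)


def effective_address(thread: Thread, address: int) -> int:
    # Every memory address the thread touches is shifted by its offset.
    # Wrapping at 256 is an assumption.
    return (address + thread.offset) & 0xFF


def write(memory: bytearray, thread: Thread, address: int, value: int) -> None:
    memory[effective_address(thread, address)] = value


if __name__ == "__main__":
    memory = bytearray(256)
    plain = Thread(pc=0x00)
    shifted = Thread(pc=0x00, offset=0x10)
    # Both threads run the same "MOV 042 @A0" but land in different cells.
    write(memory, plain, 0xA0, 0x42)
    write(memory, shifted, 0xA0, 0x42)
    print(hex(effective_address(plain, 0xA0)), hex(effective_address(shifted, 0xA0)))
```

With an offset of 0x10, the same instruction that writes to 0xA0 in one thread writes to 0xB0 in the other, so identical code can work on separate data.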

I've played TIS-100 and named this game similarly BOX-256 as a homage, since TIS-100 pretty much inspired the whole thing. It's a wonderful game.

Great ideas on the new levels. I will shortly update with more levels and I'll be sure to add your suggestions into the mix :)

ekimekim · on April 3, 2016

You can't access localstorage or anything for persistence? Can't create a popup with copy-pastable text? Any means of input or output from the game short of screenshots and manual typing at all?

The "all operations offset" sounds tricky but interesting. It'd be problematic in a typical usage scenario, but probably work well with my horrible "every line of code runs in a 1-cycle loop on its own thread" style :P

keely · on April 3, 2016

I will go for the tricks you mentioned for the input/output in the browser, but it will take some time to implement.

I'll take a step back and think about the threading some more.

Kristine1975 · on April 3, 2016

It currently has too many instructions. A single one is enough: https://en.wikipedia.org/wiki/One_instruction_set_computer

OMGWTF · on April 7, 2016

Solution for square: spawn threads, distribute PIX instructions for each pixel evenly across threads.

Checkerboard and 4 Squares: http://imgur.com/a/dJRhM

azeirah · on April 2, 2016

The `thr` statement is not very useful at the moment.

It's impossible to pass parameters to a piece of code using threads :(

I have to duplicate my code in order to use the threads

zokier · on April 3, 2016

Not sure how you would integrate that sort of thing into the ISA though. Maybe something like:

    00 THR 004 001 @A0
    04 JEQ 001 @A0 010
    08 --- parent thread
    ...
    10 --- child thread

where the THR instruction would store a value that is visible only to the child in some memory location. In this example the value 001 would be stored at @A0, which is then used to distinguish the threads in a conditional jump.
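The listing can be mimicked in Python to check that parent and child really end up in different places. This is a sketch under assumptions: instructions are 4 bytes wide, JEQ compares its first two operands and jumps to the third, and "visible only to the child" is modeled as a per-thread overlay on top of shared memory. All class and function names are hypothetical.

```python
class Thread:
    def __init__(self, pc, private=None):
        self.pc = pc
        self.private = dict(private or {})  # cells only this thread can see (hypothetical)


def read(memory, thread, address):
    # A thread sees its private value if it has one, otherwise shared memory.
    return thread.private.get(address, memory[address])


def thr(parent, target, value, address):
    # Models "THR 004 001 @A0": spawn a child at target with a private value.
    parent.pc += 4
    return Thread(pc=target, private={address: value})


def jeq(memory, thread, constant, address, target):
    # Models "JEQ 001 @A0 010": jump to target if the constant equals the cell.
    if constant == read(memory, thread, address):
        thread.pc = target
    else:
        thread.pc += 4


def run_example():
    memory = bytearray(256)  # shared memory, @A0 starts at zero
    parent = Thread(pc=0x00)
    child = thr(parent, 0x04, 0x01, 0xA0)  # 00 THR 004 001 @A0
    jeq(memory, parent, 0x01, 0xA0, 0x10)  # 04 JEQ, as seen by the parent
    jeq(memory, child, 0x01, 0xA0, 0x10)   # 04 JEQ, as seen by the child
    return parent.pc, child.pc


if __name__ == "__main__":
    parent_pc, child_pc = run_example()
    print(hex(parent_pc), hex(child_pc))
```

In this model the parent falls through to 0x08 and the child jumps to 0x10, matching the two labels in the listing.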

chipsy · on April 3, 2016

It's really ungainly to work with -- but it does drop the cycle counts to solve the problems by running more threads.

ekimekim · on April 3, 2016

What's really funny is when the program's trivially parallelizable (any of them where the entire thing to draw fits in memory) and the main thing your threads are doing is starting more threads, then finally all the threads do 2 draw instructions each.

doomrobo · on April 2, 2016

FYI this gives a JavaScript error in Safari 9.1 private browsing mode. Disabling private browsing fixes the problem.

danjayh · on April 2, 2016

Wish it had a 'mod' instruction :(

achikin · on April 2, 2016

Will there be a standalone version for Mac?

keely · on April 3, 2016

Yes, in a few days hopefully