I still find it to be an interesting intellectual challenge, even though it has no commercial use that I know of.
I think it's worth noting that it's very easy to treat the 6502 as having 128 16-bit general purpose registers. If you're okay with poorly optimised code, that makes for a very nice compiler target, while still allowing you to break out to hand-written assembly for inner loops and other sensitive areas.
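To make the "128 16-bit registers" idea concrete, here is a minimal Python model of it. It is only a sketch: the helper names (`load_imm`, `read`, `add16`) and the choice to store pseudo-register n at zero-page addresses 2n and 2n+1 are assumptions made for illustration, not anything prescribed by the 6502 itself.

```python
# Hypothetical model: the 256-byte zero page viewed as 128 little-endian
# 16-bit pseudo-registers. All names here are illustrative assumptions.
ZERO_PAGE = bytearray(256)

def reg_addr(n):
    """Zero-page address of the low byte of pseudo-register n (0..127)."""
    if not 0 <= n < 128:
        raise ValueError("only 128 sixteen-bit registers fit in zero page")
    return 2 * n

def load_imm(n, value):
    """Store a 16-bit constant into pseudo-register n, low byte first."""
    a = reg_addr(n)
    ZERO_PAGE[a] = value & 0xFF
    ZERO_PAGE[a + 1] = (value >> 8) & 0xFF

def read(n):
    """Read pseudo-register n back as an unsigned 16-bit integer."""
    a = reg_addr(n)
    return ZERO_PAGE[a] | (ZERO_PAGE[a + 1] << 8)

def add16(dst, src1, src2):
    """dst = src1 + src2 as two 8-bit adds chained through the carry,
    in the spirit of a CLC / ADC / ADC sequence. Returns the final carry."""
    a1, a2, ad = reg_addr(src1), reg_addr(src2), reg_addr(dst)
    carry = 0
    for i in (0, 1):  # low byte, then high byte
        total = ZERO_PAGE[a1 + i] + ZERO_PAGE[a2 + i] + carry
        ZERO_PAGE[ad + i] = total & 0xFF
        carry = total >> 8
    return carry
```

A compiler targeting this model would emit the same fixed byte-by-byte sequence for every 16-bit operation, which is simple to generate but bulky, hence the "poorly optimised" caveat.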
Would the Arduino chip be a better choice for the intellectual challenge? You get a lot of modules you can buy from different vendors, and at the end of the day it's a simple-to-program chip like the 6502, from what I have seen.
On the plus side, the 704 did multiplications and divisions. https://www-03.ibm.com/ibm/history/exhibits/mainframe/mainfr... gives it about 4,000 multiplications or divisions per second. I think that’s quite a bit faster than a 6502, and on 36-bit words, but you don’t need 36-bit arithmetic for a LISP.
It also had floating point, but you don’t need that for a LISP, either.
I used a Pascal on a BBC Micro which consisted of two 16KB ROMs: one for the editor and one for the compiler. It would have maybe 32KB of RAM minus the screen memory (1KB to 20KB).
The system fit in two 16KB EPROMs for a total of 32KB, but only one of the 16KB ROMs could be mapped into the address space at a time.
The way this worked was that the compiler was self-hosted (i.e. written in ISO Pascal and compiled itself) and generated our own stack-based virtual machine code which we referred to as BL-code based on our own initials.
One of the 16K ROMs contained the BL-code of the self-compiled compiler... which only fit after a considerable amount of effort, including a few "macro" BL-codes designed for the purpose. Remember this was full BSI-certified ISO-Pascal, plus Acorn extensions for graphics etc, not some toy subset.
The other 16K ROM contained everything else, meaning the BL-code interpreter, screen editor, run-time libraries (floating point, which we copied from BBC BASIC, plus Pascal I/O, heap, etc), and command line interpreter. The editor, which I wrote, was around 4KB and fairly sophisticated for the time, including full regex global replace.
One interesting tidbit is how the system actually ran given that the compiler was in one ROM, and the interpreter needed to run it in the other ROM, with only one ROM able to be mapped into the address space at a time... The way we handled this was to relocate the interpreter into RAM in order to run the compiler (but run from ROM when running user programs), so the interpreter was organized into pure code, pure data and relocatable address tables to make this possible.
Getting the whole system to fit into those two 16K ROMs was a heck of a challenge!
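For readers unfamiliar with relocation tables, the idea of moving the interpreter from ROM into RAM can be sketched roughly as below. This is not the actual Acorn ISO-Pascal mechanism: the table format (a list of offsets at which a 16-bit little-endian absolute address is stored) and the function name `relocate` are assumptions chosen for illustration.

```python
def relocate(code, fixups, old_base, new_base):
    """Return a copy of `code` adjusted to run at `new_base`.

    `fixups` is a hypothetical relocation table: offsets into `code`
    where a 16-bit little-endian absolute address is stored. Each such
    address is shifted by the distance between the two base addresses.
    """
    image = bytearray(code)
    delta = new_base - old_base
    for off in fixups:
        addr = image[off] | (image[off + 1] << 8)
        addr = (addr + delta) & 0xFFFF
        image[off] = addr & 0xFF
        image[off + 1] = addr >> 8
    return bytes(image)

# Example: JMP $8005 / NOP / NOP / RTS, assembled for ROM at $8000.
rom_code = bytes([0x4C, 0x05, 0x80, 0xEA, 0xEA, 0x60])
# Only the JMP operand (offset 1) holds an absolute address.
ram_code = relocate(rom_code, [1], old_base=0x8000, new_base=0x3000)
```

Keeping code, data, and address tables separate, as the comment describes, is what makes this patching step possible: only the table entries need to change when the code moves.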
Having grown up in Brazil, I never had contact with the BBC micros until I became interested in retro computing. What you folks accomplished is not appreciated enough on the other side of the Atlantic.
Why do you find it incredible? It really depends on how many features you want to implement. I had LISP in ROM on my BBC Micro. That was about 5.5K of code (6502) on the ROM plus some data. The machine had 32KB of RAM.
Agreed, Lisp will work on small environments pretty well. For smaller-still systems I'd recommend FORTH as a good alternative - as discussed in the article itself.
I say self-hosting, but on a 64kB BBC Micro second processor with floppy disk it takes about seven minutes to compile Hello World, so I haven't bothered to actually recompile the whole toolchain. (The overwhelming majority of that time is spent doing disk I/O, as there's way too much state to keep in RAM. The compiler is an eight-pass behemoth.)
The language itself is a simple strongly-typed fully compiled thing with a syntax based on Ada, supporting nice stuff like nested subroutines and so on. It has native 8-bit types (unlike C). Its main claim to being interesting is that it statically allocates all variables, using a simple but effective algorithm to walk the call tree and assign multiple variables to the same address if they're not going to be used at the same time. It's super effective. (This is Wheeler's solution 1.) This feature made the entire project possible, because it allowed me to do without stack frames completely. Trying to access the stack on either the 6502 or Z80 is an utter disaster.
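A rough sketch of how such call-tree-based static allocation can work is shown below. It is an illustration under stated assumptions, not the compiler's actual algorithm: each subroutine is placed just above the highest-ending subroutine that can call it, so two subroutines that are never live at the same time may share addresses. The function name `assign_bases` and the example call graph are hypothetical, and the call graph is assumed to contain no recursion.

```python
def assign_bases(sizes, calls):
    """Assign a static base offset to each subroutine's variables.

    sizes: dict mapping subroutine name -> bytes of local variables
    calls: dict mapping subroutine name -> list of callees
    Assumes an acyclic call graph (no recursion), which is what makes
    fully static allocation possible in the first place.
    """
    callers = {name: [] for name in sizes}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)

    bases = {}

    def base(name):
        # A subroutine's variables start above those of every possible caller.
        if name not in bases:
            bases[name] = max(
                (base(c) + sizes[c] for c in callers[name]), default=0
            )
        return bases[name]

    for name in sizes:
        base(name)
    return bases

# Hypothetical program: main calls parse and emit; both call helper.
sizes = {"main": 4, "parse": 6, "emit": 2, "helper": 3}
calls = {"main": ["parse", "emit"], "parse": ["helper"], "emit": ["helper"]}
bases = assign_bases(sizes, calls)
total = max(bases[n] + sizes[n] for n in sizes)
```

In this example `parse` and `emit` overlap at the same base offset because neither calls the other, so the program needs 13 bytes rather than the 15 a naive one-slot-per-variable layout would use.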
The 6502 is a _bizarre_ thing to generate code for. 8-bit code is fine, but 16-bit and above is painful --- efficient maths is really hard. I kept finding the generated code breaking down into tiny microloops because when doing arithmetic with 16-bit values it can actually be shorter to use a loop than to inline it (in certain circumstances). The instruction set is orthogonal, except when it isn't; there's no LDA zpg,Y for example, but there's an LDX zpg,Y. Things like moving values from one memory location to another are so expensive that the setup cost in using helper functions frequently outweighs the benefit.
But in general, once you get your head around it and accept that it simply cannot do things like 16-bit signed comparisons in a fashion which won't make you cringe, it's not too bad. Index registers are great, as is zero page indirection (at least for 8-bit offsets). It's fast, taking a few cycles per instruction. It's also fairly sensible: there's frequently only one sane way to do things.
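To show why 16-bit signed comparison is awkward, here is a small Python model of one commonly used 6502 idiom: compare the low bytes, subtract the high bytes with borrow, then treat "negative flag XOR overflow flag" as the less-than result. The helper names are hypothetical and the model covers binary mode only; it is a sketch of the technique, not generated compiler output.

```python
def sbc(a, m, carry):
    """Model of the 6502 SBC instruction in binary mode on 8-bit values.

    Returns (result, carry_out, overflow, negative). As on the 6502,
    carry=1 means "no borrow" on the way in and on the way out.
    """
    diff = a - m - (1 - carry)
    result = diff & 0xFF
    carry_out = 1 if diff >= 0 else 0
    overflow = ((a ^ m) & (a ^ result) & 0x80) != 0
    negative = (result & 0x80) != 0
    return result, carry_out, overflow, negative

def signed_less_than(a, b):
    """True if a < b, where a and b are 16-bit two's-complement patterns.

    Mirrors: LDA a_lo / CMP b_lo / LDA a_hi / SBC b_hi, followed by a
    test of N xor V (often written as BVC + EOR #$80 + BMI).
    """
    # CMP behaves like SEC + SBC but keeps only the flags; we need the carry.
    _, carry, _, _ = sbc(a & 0xFF, b & 0xFF, 1)
    _, _, overflow, negative = sbc((a >> 8) & 0xFF, (b >> 8) & 0xFF, carry)
    return negative != overflow
```

The extra overflow handling is what makes the signed case so much clumsier than an unsigned compare, which only needs the final carry.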
The Z80 drove me nuts, though. It's unbelievably unorthogonal. (You can only do 8-bit direct memory accesses via A --- B, C, D, E, H or L cannot be directly read from or written to memory!) It's slow --- the non-8080 instructions are so painfully slow (ld ix, (abs) is 20 cycles!) that they're only barely worth it. The 16-bit stuff doesn't help nearly as much as you'd think, either; you can only do adds and subtractions, with limited registers, and there's no carry, so they're no use for 32-bit operations, which have to be done using the 8-bit instructions anyway. I did find the resulting code density to be better than the 6502, but not by that much.
I'm really looking forward to doing a 6809 port one day...
The next version of PLASMA has a JIT compiler that will compile PLASMA byte code routines into native machine code based on call frequency. Currently supports 6502 and 65802/65816 backends into a 4K code buffer. It doubles the speed of the PLASMA compiler, itself written in PLASMA.
This thing can use almost 3/4 watt of power at normal clock speed. For that amount of power you could use an ESP32 and get a lot more, like Bluetooth.
The huge volume of embedded cores based on a 6502 would disagree strongly --- I'm sure everyone has at one point used a 6502-based embedded system. They're everywhere in things like keyboards and mice, LCD monitors (the MCU responsible for generating the OSD and such), and toys like the famous Tamagotchi and Furby, as well as keychain picture frames:
https://hackaday.com/2013/05/24/tamagotchi-rom-dump-and-reve...
https://news.ycombinator.com/item?id=17751599
http://spritesmods.com/?art=picframe
Other popular "not very good for HLL" architectures that nonetheless have C compilers (for a subset of the language) include the 8051 and the Microchip PIC series.