Hacker News new | past | comments | ask | show | jobs | submit login
Fitting a FORTH in 512 bytes (p4.team)
190 points by jpegqs on June 11, 2021 | hide | past | favorite | 57 comments



Gonna plug my own sectorforth here: https://github.com/cesarblum/sectorforth

The linked post is really impressive. The author went way beyond what I did. I'm especially impressed by the I/O stuff.


OpenBoot/Open Firmware[1] also used Forth.

Although the standard was withdrawn in 2000, it continued to be used by PowerPC Macs until Apple moved to x86 and EFI/UEFI.

Open Firmware is now actually open source as OpenBIOS (and OpenBoot is also available under a BSD license.) Apparently it can be used as a qemu boot rom for Linux, BSD, Solaris, and PowerPC versions of Mac OS 9 and OS X.[2]

[1] https://en.wikipedia.org/wiki/Open_Firmware

[2] https://www.openfirmware.info/OpenBIOS


Man, Forth is such a cool language. It really captures (what I perceive to be) the "hacker mentality" of powerful but simple tools, without much overt concern for polish/slick interfaces.

I mean I know a lot of foundational code of yesteryear was hacked together in C, but C lacks a certain ineffable cachet in my view. A cachet that Forth definitely has.


FORTH is a fun little class of languages. These are easy to implement; in a few KB you can have a working development environment, with a screen editor, debugger and whatnot.

In my experience, FORTH doesn't really scale to big projects, or things that need a lot of dynamic allocation. It's great at low-level stuff like hardware bringup (we used it on the Atari ST, back in the day, and even the hardware engineers were writing code to exercise their designs), but doing anything at scale is going to be tough. You probably don't want to write a modern web server platform in it, for instance. But it's probably great on an Arduino with a serial port.

There were a lot of games started at Atari in FORTH ("Hey, this is simple! Who needs to hire expensive assembly language programmers?") but the only project that shipped was the pinball machine "Four by Four". I saw a bunch of expensive disasters and mostly steered away from it.


If you need dynamic allocation, you can invent a Forth dialect for your specific needs. And implement it in Forth. A domain-specific language for your problem domain, embedded in Forth.

Consider control statements like conditionals or loops. They are implemented in Forth to express what is essential for imperative programming.

You can just as easily implement region handling much like what Cyclone [1] did. Invent region, dispose region, allocate within region, etc. You can augment semicolon word to not only check for control statements to be valid, but also check for region handling correctness.

[1] https://en.wikipedia.org/wiki/Cyclone_(programming_language)

Of course, you have to be either well educated or highly inventive to do that. ;)

And at the time of Atari most of programming projects were expensive disasters, not only games written in Forth.


From https://news.ycombinator.com/item?id=21823081

I wrote malloc.fth for Mitch Bradley's ForthMacs, which ended up in OpenFirmware:

https://github.com/openbios/openfirmware/blob/master/ofw/cor...

    \ Forth dynamic storage managment.
    \
    \ By Don Hopkins, University of Maryland
    \ Modified by Mitch Bradley, Bradley Forthware
    \ Public Domain
    \
    \ First fit storage allocation of blocks of varying size.
    \ Blocks are prefixed with a usage flag and a length count.
    \ Free blocks are collapsed downwards during free-memory and while
    \ searching during allocate-memory.  Based on the algorithm described
    \ in Knuth's _An_Introduction_To_Data_Structures_With_Applications_,
    \ sections 5-6.2 and 5-6.3, pp. 501-511.
Forth systems running on C libraries can also call back to the native malloc/free implementation, but it's nice to have a pure FORTH implementation for embedded applications.


This may be a tangent but I've encountered this code comment before and been confused; nice to be able to reply to its author! The confusion is that there's no book by Knuth called "An Introduction to Data Structures With Applications" (and Knuth's TAOCP doesn't have a section 5.6; his Chapter 5 on Sorting ends with 5.5 and whatever stuff is in TAOCP about memory allocation is Chapter 2; page 501 would put you in "Answers to Exercises"). Most likely what was intended is the book "An Introduction to Data Structures With Applications" by Jean-Paul Tremblay and Paul G. Sorenson (https://www.amazon.com/dp/0070651574 , ISBN 0070651574, https://books.google.com/books?id=ha4mAAAAMAAJ) which from what I can find online has 7 chapters:

1. Information and its storage representation

2. The representation and manipulation of strings

3. Linear data structures and their storage representation

4. Linear data structures and their linked storage representation

5. Nonlinear data structures

6. Sorting and searching

7. File structures

And so Chapter 5 probably has a 5–6.2 and 5–6.3 that cover such a memory allocation algorithm.

Edit: I found a weird plagiarized/condensed version of the book online, and section 5.6 is indeed called "Memory Allocation; Garbage Collection", so quite likely that the book by Tremblay and Sorenson is what is meant.


I love HN.

"I read your Forth code from 20 years ago, please help me understand your comment string" - to the original author.

I am praying this gets resolved. This needs to be on the front page by itself.


Well nothing so romantic: I didn't read it 20 years ago; more like some time in the last couple of years. And though I vaguely recall seeing this comment in the context of its code (on GitHub?), I may have well ended up there via a HN comment in a thread much like this, so that wouldn't be surprising. :-)


Great question! I vaguely remember reading it in a tan Knuth book, and the subject and algorithm feels very Knuthian to me, but I can't remember exactly. But this book has that title and looks like the book I remember having, and also looks like Knuth's books, so maybe I got confused and that's the book:

https://www.amazon.com/introduction-structures-applications-...

Here's some old email I wrote about it -- it turns out I did go on to write an interactive Forth window manager (for X10, with pie menus, and you could "fling" windows and it would start a Forth task to bounce them around on the screen -- see hacks.f), which foreshadowed my later work with NeWS (or its original name SunDew, as Mitch called it, which he ported to the Atari ST).

https://donhopkins.com/home/archive/piemenu/uwm1/

https://donhopkins.com/home/archive/piemenu/uwm1/uwm.f

https://donhopkins.com/home/archive/piemenu/uwm1/fuwm-main.f

https://donhopkins.com/home/archive/piemenu/uwm1/hacks.f

I made a diagnostic test which memory mapped the black and white framebuffer and ran memory allocator tests in video memory, so I could watch them. But I was using a shitty random number generator, and it looked suspicious.

Mitch sent me a better one based on the Unix random number generator, but it was pretty shitty too: The low bit alternates between 0 and one, so if you use "RND 1 AND" to flip a coin you get 0 1 0 1 0 ... So Mitch warned me that "Try dividing by two to get rid of the bogus lsb."

After I got it to work, I used it as a screen saver, so people would think my Sun 3/160 was broken and would leave it alone.

Memory allocators are easy. Random number are hard!

To: Mitch Bradley <wmb@sun.com> From: Don Hopkins <don@brillig.umd.edu> Date: Mon, Jul 28, 1986, 6:16 AM Subject: Dynamic storage manager

Here's a dynamic storage manager I wrote. It runs under cforth and the 68000 forth.

I'm sending two files: "records.f", and "dbuf.f". The first contains some words that are used to define records, like in pascal. The second is my dynamic storage manager.

I wrote an earlier version of this for my Apple ProDOS Forth system, as well as a set of string functions that used it. I'm going to port this much revised (and more portable) version back to the Apple, and update the string handling functions.

You may want to change the records.f file. I commented out some code in dopasrec that optomized field accessing words with zero offset, that was not completely portable.

I ran it through 100000 tests, and it worked fine. (I can't wait to port it to the Apple, and run the tests on the hires screen. Not as many, though. A Vax 8600 running cforth can do things a 1 MHZ 6502 is not allowed to think about.) Bet it would look cool run in the Sun screen memory.

Mike Gallaher and I will be arriving in Fornicalia around the evening of the 9th. We should be getting a rental car. When are you usualy around Sun and how do we find you? Mike's staying with Gosling. I may need a place to stay near Sun for a few days.

I'll bring a tape of interesting things. I've been doing some work with the X window system, enhancing the "uwm" window manager. Do you have a X interface for Forth written? If not, I'll write it. I would love to build an interactive forth window manager.

-Don

From: Mitch Bradley <wmb@sun.com> From: Don Hopkins <don@brillig.umd.edu> Date: Mon, Jul 28, 1986, 10:33 AM Subject: Storage allocator and stuff

Thanks for the code. I'll try it out within the next few days as time permits.

You can stay with us for 2 or 3 nights if that would help. My wife and I have a spare bedroom in our house.

Gosling knows where my office is. I'm usually in from 9:30 or 10:00 onwards. My leaving time is rather variable.

I haven't done much with X, because I've been doing some stuff with Gosling's new PostScript-based window system, SunDew. You'll see when you get here. X pales in comparison to SunDew.

I'd be happy for you to do any Forth interfaces that interest you. I haven't been doing much with Forth/Unix stuff lately, because I've been working on the Atari ST Forth.

Mitch

From: Mitch Bradley <wmb@sun.com> To: Don Hopkins <don@brillig.umd.edu> Date: Wed, Jul 30, 1986, 12:44 AM Subject: re: records.f

Here's how to make the "compile-time noop" stuff for the first field portable:

    : donothing ( -- )  does> drop  ;

    : dopasrec  ( -- ) does> @ create  inrecord @ if
                               walign over   ?dup if
                                  , dooffsetr   else
                            donothing immediate then
                                             +  else
                                          allot then ;
Now, here's how I do the same thing: The underlying semantics are similar, but I believe that my names are more in keeping with accepted Forth naming conventions.

    : struct 0 ;
    : field:  ( offset size -- offset' )
       create over , +
       does> @ +
    ;
    : byte:   /c field: ;
    : word:   /w field: ;
    : long:   /l field: ;
    : normal: /n field: ;
    : union:  0 field: ;
    : unused  +  ;

    : size:  constant  ;
    : ;struct drop ;

    \ Example:

    struct
       byte: >b1
       byte: >b2
       word: >w1
       long: >l1
    size: /foo

    struct
       byte: >c1
       7 unused
       /foo field: >foo
    ;struct
Note the use of ">" instead of "." to introduce field names. This is consistent with the conventional usage of ">" to denote the conversion of one address to another address (as in >NAME , >BODY , etc.). "." is a bad choice because it usually indicates printing or display.

A variation on this, which I hope to test one of these days, is to give the "struct" word a DOES> clause, which is used as the run-time action of the fields. This would allow you to easily define structures based at particular addresses, without having to explicitly mention the base address before each structure member.

Mitch

From: Don Hopkins <don@brillig.umd.edu> To: Mitch Bradley <wmb@sun.com> Date: Thu, 31 Jul 86 13:44:55 EDT Subject: Random number generator

The RND word included in the dbuf.f file is evil and should be flushed. I ran the test in dbuf.f in stand alone mode in the video memory of a Sun-3. I noticed that many of the buffers were just sitting there, never getting picked to be freed. I got pissed at the RND function and wrote a studly overkill, which gives much more pleasing results on the screen. I modified the test to fill each allocated block with a random nonzero byte, and to erase memory before freeing it. You can really tell what's going on. The Sun-3 is a FAST machine!

I did some tests with random bit-mashing ones I made up off the top of my head, and with the one in the 68000 forth (the same as the one you sent me), and was pretty unsuccessful in terms of non-repetitiveness, so I wrote this studly wastefull overkill version that gives pleasing effects.

    create rnd-state  255 /n + allot

    : pretty-fucking-random-as-far-as-I-can-tell-without-lots-of-heavy-theoretical-anaylsis ( --- n )
      rnd  dup 255 and rnd-state +
      dup c@ rnd-state + c@ seed +! 
      rnd swap +! ;
I wrote an Apple ProDOS Forth system, based on the FIG 6502 Forth. I'd like to adapt it to be metacompilable. This will make it a lot easier to port some of the more modern features that your system has. I'm writing a ProDOS interface for your file system right now in another emacs window. I want to add devices that know about reading other operating system formats, and other partitions on the hard disk.

-Don

From: Mitch Bradley <wmb@sun.com> To: Don Hopkins <don@brillig.umd.edu> Date: 31 Jul 86 10:34:13 PDT (Thu) Subject: Re: Random number generator

Try this one and let me know how it works. I copied this one out of the Unix library. Try dividing by two to get rid of the bogus lsb.

    \ 31-bit random number generator.  Uses a linear congruential generator.
    \ The result is a positive number in the range 0 <= n < 2^31
    \ The lsb of this number is no good.  It always toggles.
    \ The rest of the bits are pretty good.
    decimal
    lvariable seed
    : random  ( -- l )
       seed l@
       1103515245 *  12345 +
       th 7fff.ffff land
       ldup seed l!
    ;
From: Don Hopkins <don@brillig.umd.edu> To: Mitch Bradley <wmb@sun.com> Date: Fri, 30 Jan 87 17:15:25 EST Subject: multitasking and memory allocation

I'm having some problems with the multitasking package with forth. I've linked in a lot of C code, and that's using the unix malloc, and forth is using its own simple malloc, for multitasking. I'm afraid there are some weird interactions between multitasking and the linked in C code. Do you have some time for a phone call so I can describe what's going on? I'm wondering if forth should be using the unix malloc, but the details of the interactions between linking in C code, multitasking, how, when, and to what symbols are resolved, and the phase of the moon, have me quite confused as to what to do to make the damn machine Do What I Mean. I'd like to at lease replace the simple malloc with the nice malloc I wrote, but I think it would be a real pain to get the C programs to know about that. It seems it's easier to call C from forth than forth from C. Have you got any good tricks for doing the latter? Also, when you link in two programs that use the same unix libraries, do they each have their own copy? (i.e. two competing mallocs?!) If not, how in the world does it work? (What sort of monkey wrench do shared libraries throw into the works, and is a monkey wrench just the tool needed or what?)

-Do


Oh wow the covers are really similar! Now it totally makes sense that one would remember one for the other; compare the image of Tremblay–Sorenson on Amazon (https://images-na.ssl-images-amazon.com/images/I/31i7I+eER8L...) with that of TAOCP (https://i.ebayimg.com/images/g/ZKoAAOSwfzZgPA~D/s-l640.jpg or https://www.oldbookshopofbordentown.com/pictures/WE24020.jpg...) — sure the colour is technically different when you look at them side-by-side, but with many similarities in the jacket like the font with the small uppercase letters, there's a huge overlap.

(Thanks for that interesting email thread too.)


This reminds me of this video on a text processing example with Forth: https://www.youtube.com/watch?v=mvrE2ZGe-rs


Forth on Atari w/o mentioning https://atari8bit.net/db/?d=cd&g=Books&f=FORTH-on-the-Atari-... Yep, those were the times.

FORTH is all fun and such (well, the stack counting gets old after a while), but I have difficulties seeing it in larger, multi-person projects (were there any?). A good idea who's time came and went.


I associate Forth with astronomy.


There is a list of Forth projects used in NASA missions.

https://www.forth.com/resources/space-applications/


Don't you mean Fortran?


Surprise. Fortran was used as well but Forth was used quite a bit by NASA. In some ways it is extremely auditable. A handful of assembler functions sit at the bottom the rest is Forth. Minimal numbers of variables in memory if code is written to use the data stack for inputs and outputs to functions. Much easier to certify than GCC. :-)

There were a number of Shuttle experiments where the control software was written in Forth. It gives you a control interpreter out of the box and also bit level access to hardware so you can write custom drivers and then script the hardware control in the functions you wrote or compile them where speed is required, using an 8K..12K kernel.

The Cassini probe used Forth software running on a Forth 2-stack CPU.


No, and I didn't mean spaceflight either. Which Forth was apparently used for but I didn't recall until I googled it.

I associate Fortran with nuclear reactor engineers. I was told they were some of the earliest users of it, maybe even before it was out of "beta", so to speak.

Quote from 1955:

"Considerable effort is now being made by various groups to develop automatic coding techniques which would make the use of a large machine feasible on this kind of problem. The system of greatest interest is Fortran, which is being set up by IBM for the 704; there is some chance that it will be available by the end of the year. To use it, the engineer who is setting up the problem must formulate it in a somewhat modified algebraic notation, which is then entered directly in the machine. As a result, the task of stating detailed machine orders and assigning addresses, which is largely a clerical task and constitutes most of the work of coding, is completely eliminated. Some scepticism has been expressed by experienced programmers, on the grounds that the resulting code would be far less efficient than one designed by hand. Even if this is the case, however, a system of this kind will still have a wide range of applicability in reactor analysis and should be kept in mind in considering future machine work."


Forth was originally thought of as a "fourth generation" language when Chuck Moore designed it while doing astronomy research.


W. Richard Stevens wrote an early primer on Forth for Kitt Peak.

http://forth.org/tutorials.html


Two ways to "scale". Up or down. FORTH scales down.

Interesting how developers today impicitly assume the term "scale" mean "scale up".


"scale up" is usually used as a counter-argument to a solution that is perfectly fitted to the scale it is supposed to be used. This is nothing but a salesmen argument.

It is not only in the Forth community that one hears disaster stories like: we did the software for client X in 3 weeks with a team of 4 programmers, but after a few decades they lost the know-how of our language so they decided to rewrite in a "modern" language that "can scale". They hired a team of 40 developers, it took 3 years, and the result is less reliable than the original. The only thing that scaled up were the development and maintenance costs and the only thing that was modernized is HR policy (i.e. let's hire disposable developers).


I hope I'm not breaking any rules by saying that I enjoyed your blog immensely over the years


Weren’t some 7800 games written in Forth?


Looks like a few were: https://atariage.com/forums/topic/155951-games-written-in-fo...

These were written by third parties and published by Atari. Atari did not have any game programmers on the payroll at that point (all the software folks were working on the ST and related things). There might have been some contracting going on, but the days of an Atari employee coming up with a new game idea and shipping a new cartridge were gone.


Worms? was a FORTH jam, or at least that's what David Maynard said in a recent interview (since it seems this thread is becoming a list of FORTH games.)


Yes, the source is available here: https://github.com/savetz/worms


Seems painful. I liked this approach better:

https://github.com/nineties/planckforth

You start with a very tiny address or bytecode interpreter, much smaller than 512 bytes. Then you load several levels of bootstrap interpreters into it until you have a fairly featureful Forth. So that occupies ram but not program space on the target computer. You would load the non-initial stuff from a remote computer that didn't have tiny memory constraints.


I find it amusing how often trying to do complex tasks with simple constraints, always ends up back at stack based languages. Probably why the stack based turing machine is academically important.


> Probably why the stack based turing machine is academically important.

A Turing machine is not stack-based, but tape-based. A pushdown automaton (PDA) is stack-based.


Although turing machine tape emulatable via 2 stacks


Two stacks make an automaton Turing Complete. Which is an interesting result.


I suppose that prior art influences things. In my case, I went "a separate text editor won't fit, I need a REPL", and that basically filtered to just Forth and Lisp. It would be interesting to see what subset of Lisp would fit in a bootsector, but I wouldn't get my hopes up.


There's sectorlisp that appears to fit into two sectors but is yet shy of their single-sector goal: https://github.com/jart/sectorlisp


sectorlisp right now is 800 bytes but it implements LISP well enough that you can use its built-in LISP IDE to implement your own LISP engine on top of sectorlisp using high-level functional programming techniques. https://youtu.be/hvTHZ6E0Abo If we can trim down the code size another 250 bytes then all this will fit in the master boot record.


"To compile a kernel, you need a running kernel."

Why? I guess it depends on how you define "kernel." But to be able to do a batch process like compilation, you can get away with a little.


I'm guessing by kernel he meant any program that can host other programs in some way.

Can you clarify what batch process means in this case?

I /think/ the authors definition of a kernel is a program that can host other programs in some way.

A video game on a boot sector can't do that (even though it likely would have it's own kernel to function), but some kind of input loop to write memory could.


The old Apple II+ I had as a kid only did one process at a time. For an OS, it basically had what amounted to a REPL for Microsoft BASIC.

You don't need to host another program to compile something. All you need to do is to read data off disk, then write other data back.


And what would do that “read data off disk, then write other data back”, if not a program? Even if you use a monitor (https://en.wikipedia.org/wiki/Machine_code_monitor) to write a program, that’s still a program.

The only way to avoid that is to hand-punch a tape or something like that (with lots of practice and patience, it may be possible to program a small PROM or write an early floppy by hand)


I'm reminded of the manually-inputted Microsoft BASIC bootloader for the ALTAIR 8800: https://just8bits.blogspot.com/2017/03/doing-it-in-less-than...


I've compiled some simple C (microcontroller blinky) by hand. Then assembled and linked the output, and written the machine code into a HEX (Intel format) file. Then tested it on the micro. It's a rather fun exercise (Type 2 fun).


What is type 2 fun?


Fun with extra twists and turns you weren't expecting. Usually quite an adventure, some unintended learning and growth. Hopefully a story worth telling.

As opposed to the obvious doing an activity for fun.


Fun after the fact.

type 1 is fun at the time


On a Commodore 64, the kernal [sic] was similar to the BIOS on a PC, with display and keyboard handlers, IO routines, etc. When you first turned on the machine, something had to say "** COMMODORE 64 BASIC V2 **", and that something was the kernal.


I always kind of thought the kernel was, at its core, the thing that handles processes and hardware io. Even if you use one of those tiny micro kernels it has to handle processes and present an api through which to communicate with devices (even if it doesn’t handle the devices itself.) I think it’s true you need a kernel to compile a kernel in this sense.


That's precisely the joke. "Kernel" here means the minimal feature set that let you compile a kernel. The sentence is tautological, except perhaps for the definition of "minimal", because there are various flavors of "minimal" (e.g. absolute minimum, practical minimum, ergonomic minimum...).


As the article demonstrates, all you really need is a boot loader.

On early machines (large and small) with front panel hardware, you could use switches to manually enter the boot loader into memory and then start it.


...and the EDSAC crew decided that that sort of thing was a mug's game as far back as 1949, so they added what amounts to a machine to do that for you (the Initial Orders unit). (No, I don't really miss the Altair, no matter how ecstatic I was about it in high school. Toogle-booting is fun maybe twice.)


Workarounds exist, like for every other example I mentioned. I am mostly referring to the commonly used dependency graph.


There must be lots of ole-school fixed-width machines, which have stack machine built-in. Especially if you decide that top-of-stack means contents of the accumulator(s). To make it a "forth", you need separate string space, where machine words are linked to symbols. 1+ for example is just INC.


Many modern RISC architectures have pre-increment and post-decrement addressing modes for all their general-purpose registers, so it's very easy and convenient to maintain multiple stacks and implement "NEXT".


Will this run in Bochs.


people talking about how cool FORTH is. @cblum made his own sector forth. this is my tribe. i honestly thought there were no stack fans left in the world, thanks for proving me wrong.


Check out Bitcoin’s script language sometime.


Postscript, the printer page description language, is a general purpose programming language in common use, also as the basis of PDF.


One has to implement FORTH oneself to truly grok it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: