Show HN: Lisp with GC in 436 Bytes (justine.lol)
513 points by jart on Dec 20, 2021 | 131 comments



Every time I see this domain I know I'm in for a treat. I assume the dev's name is Justine, and they're one hell of a programmer. Pretty sure Actually Portable Executables are from this dev as well, which are also incredibly interesting.


Author here. Thanks! The other projects I'm known for here on HN are:

https://news.ycombinator.com/item?id=26271117 Redbean (2021) is a single-file distributable web server. Here's a video of me giving a talk about it https://youtu.be/1ZTRb-2DZGs

https://news.ycombinator.com/item?id=24256883 αcτµαlly pδrταblε εxεcµταblε (2020) lets you build c/c++/fortran/etc. code so that it runs on seven operating systems

https://news.ycombinator.com/item?id=13769727 Operation Rosehub (2017) where I helped save a lot of people from an Apache security bug similar to the Log4j issue today

I like doing these projects because they all fit together so well into a consistent story. For example, a lot of these past projects were what helped to make SectorLISP better. Thanks to APE you now have a 20kb quickly deployable and embeddable LISP interpreter. Thanks to Redbean, in a few days we're going to have an online pastebin service that'll let you publish SectorLISP gists, for example: https://lisp.pub/1

Thank you Hacker News community for being so supportive and encouraging.


Congratulations on all your progress, especially creating the world's tiniest programming language with GC!


Hey, I have a question regarding your projects! But before that: I am a huge fan of redbean, this is so cool! As long as the world has people like you building cool stuff like this, there's still some fun and hope in it.

But for the question: do you have a stable build platform for your projects (like NixOS, minimal Debian stable, or something like that) for which you publish your build instructions?


I used to do a lot of work on Bazel. I wrote its downloader code for example. https://github.com/bazelbuild/bazel/commit/ed7ced0018dc5c5eb... Bazel is nice, but these days I'm running a small scrappy operation, so I just use GNU Make. https://github.com/jart/cosmopolitan/blob/7064d736e3ded15087...

I like Make since it's able to build a repository with 17k .o files, 80 .a archives, and 661 .com executables from scratch in under a minute on one personal computer (if the kernel page cache is warm). I wrote a couple small helper commands to make the make config more manageable, like package.com, mkdeps.com, compile.com, ar.com, zipobj.com, and runit.com.

The reason Make works for me is that I think the root cause of needing things like Autoconf and CMake is that most projects have to depend on seven different C libraries. I decided that, rather than focusing on writing a better build config, I'd rather use an unfancy build system and instead devote my energy towards having a single C library that runs on all seven of the platforms I'm targeting. I owe a lot of thanks to projects like musl, dlmalloc, dtoa, llvm, etc. from whom I borrowed source code. That enabled me to abstract portability at the libc level, rather than punting to #ifdefs and configs.


Beautifully executed. I aspire to solve my problems with such grace. It's difficult and I fail often, but not without trying.


How do I get you interested in the Nix ecosystem? It's a great sandbox for all sorts of portability and bootstrapping experiments.


Awesome work, and very enjoyable to read, kudos!

> αcτµαlly pδrταblε εxεcµταblε

You mean ακτυαλλγ πορταβλε εχεκυταβλε?


Shouldn't it be: πορταμπλε (since β actually is a "v" sound so μπ is used for our "b")


Was going for visual resemblance, but certainly!


Very impressive.

> SectorLISP uses what we call an ABC garbage collector and it took only 40 bytes of assembly.

> [...]

> Fast immediate garbage collection with zero memory overhead and perfect heap defragmentation is as easy as ABC when your language guarantees data structures are acyclic.

Neat, but my understanding was that even pure functional languages can't generally make this guarantee, because things like letrec complicate matters. If SectorLISP can take this approach, why doesn't Haskell? (Or does it?)


Author here. Why would it? Let's have a look at Scheme letrec (or let* in Emacs Lisp) translated to pseudo-SectorLISP:

    (LETREC ((IS-ODD?  (QUOTE (LAMBDA (X) (CAR X))))
             (IS-EVEN? (QUOTE (LAMBDA (X) (NOT (IS-ODD X))))))
      (IS-EVEN? (QUOTE (NIL T NIL T))))
The oldskool LISP way of writing that is:

    ((LAMBDA (IS-ODD? IS-EVEN?)
       (IS-EVEN? (QUOTE (NIL T NIL T))))
     (QUOTE (LAMBDA (X) (CAR X)))
     (QUOTE (LAMBDA (X) (NOT (IS-ODD X)))))
As far as I can tell, letrec is just a nicer syntax for the same thing. So I don't know why it should require any mutability. It works probably because LISP is dynamically scoped. Most languages and modern LISP implementations prefer different scoping models to make the Assoc() equation faster. However that comes at a cost, since it can turn the difficulty of LISP up to 11. Next thing you know, you're discovering things like the Y combinator. So I have major respect for what languages like Haskell and Scheme are doing, because it's incredibly difficult. One of the reasons why I created SectorLISP is because I wanted to have a fun bastion of simplicity when I don't need the more mature tools.

SectorLISP doesn't have any kind of internal mutability last time I checked. But I have experimented with it. For example, the Peel() optimization described in the blog post is how we solve the "make Assoc() faster problem". As it's written in the blog post, it's 100% pure. But if you introduce a tiny bit of mutability, it can go even faster on benchmarks even though it doesn't change the complexity class. See https://justine.lol/sectorlisp2/fast.js


I think you’ve a trivial error in your function definitions but I don’t think it’s relevant.

I think you’re right about avoiding cycles. I think the parent was talking about a let rec construction for data, e.g. in Haskell:

  let funky_list = 1:2:3:4:funky_list
(Note that : is the cons operator).

If this is allowed then you may have issues. I think the problem is perhaps that the GC won’t work if you have pointers in the opposite of the normal direction. The fix though is relatively simple: add forwarding pointers (somehow), then after copying from the ‘B’ region to the ‘C’ region you leave behind a forwarding pointer and, if you see it again in your copying, you don’t make a duplicate copy of the data. Then I think you can proceed as before. But representing forwarding pointers efficiently is hard, I suspect.
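Roughly what I have in mind, as a sketch in plain Scheme rather than the real 40-byte collector (the table of already-copied cells plays the role of the forwarding pointers, and copy is just a made-up name for illustration):

    ; seen is an alist mapping old cells to their copies; threading it
    ; through the copy means shared (or even cyclic) structure is only
    ; copied once instead of being duplicated.
    (define (copy x seen)
      (cond ((not (pair? x)) (cons x seen))
            ((assq x seen) => (lambda (hit) (cons (cdr hit) seen)))
            (else
             (let* ((fresh (cons #f #f))
                    (seen1 (cons (cons x fresh) seen))
                    (a     (copy (car x) seen1))   ; returns (result . seen)
                    (d     (copy (cdr x) (cdr a))))
               (set-car! fresh (car a))
               (set-cdr! fresh (car d))
               (cons fresh (cdr d))))))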

Actually I think I don’t understand the GC sufficiently. Do you end up with a lot of memory usage after this or a little?

  ((LAMBDA (D Z) (D Z))
   (QUOTE (LAMBDA (X)
     (COND
      (X ((LAMBDA (Y) (CONS Y Y)) (D (CDR X))))
      ((QUOTE T) (QUOTE T)))))
   (QUOTE (A A A A A A A A)))
This should get you a binary tree about 8 levels deep with T on every leaf, and I think it should use linear memory (in the depth of the tree) but I don’t understand how the GC avoids ending up with an exponential amount of memory being used?


Ah yes, I missed that when posting my earlier comment.

As for your example, it uses exponential memory because your example produces exponential data. The only thing I'd need to show is that SectorLISP's ABC garbage collector doesn't grow at a large rate as the input size increases. According to the simulator's "heap" statistic, your example needs 449 doublewords of memory to compute a data structure that's 266 doublewords in size. That seems linear, but it's not, because if you add a few A's then the gap narrows.

The cool thing about your example is that even though it's generating exponential data, that data structure doesn't require an exponential amount of memory to be stored internally. Currently the SectorLISP simulator only does things like de-duplication for DEFINEs which the simulator reports as "code" size in doublewords. However that's a friendly branch feature and the rest of the time SectorLISP just does plain "what you see is what you get" evaluation. I'm sure much more advanced techniques have been discovered since McCarthy wrote his original paper.


To be clear, my understanding of the GC is that what I wrote would use a linear amount of storage before the gc and potentially an exponential amount after. But maybe the gc runs in more places than I’d realised and so it never gets to only be linear.


I've recorded a screencast of your algorithm running in Blinkenlights. That might help intuitively explain what the ABC Garbage Collector is doing: https://justine.lol/sectorlisp2/binarytree2.mp4


> I think the parent was talking about a let rec construction for data

You know more Haskell than I do ;-)

What I had in mind were recursive and mutually recursive functions, defined with letrec, as examples of reference-cycles in functional programming.


> a let rec construction for data

Is there actually any Lisp dialect where letrec works for data? I'm not aware of such, at least where it comes to major ones (CL and common implementations of Scheme).


It would need a lazy data constructor and the lisp ones are (usually) strict. If you make cons et al lazy then letrec on data works as it does in other languages. Alternatively, make letrec wrap the values in thunks. The resulting cyclic structures are then moderately annoying to work with.
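A minimal sketch of the thunk approach in ordinary Scheme (not any particular Lisp's letrec; funky-list is just a made-up name): the tail of the list is a zero-argument procedure rather than a pair, so no mutation or laziness is needed, but consumers have to force it by hand, which is the annoying part:

    (define (funky-list)
      (cons 1 (cons 2 (cons 3 (cons 4 funky-list)))))

    (car (funky-list))                             ; => 1
    (car ((cdr (cdr (cdr (cdr (funky-list)))))))   ; => 1 again, after forcing the tail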


Oh, I forgot lazy Lisps. But those would probably not be considered "major".

> Alternatively, make letrec wrap the values in thunks.

Aaaand of course I forgot that as well. My bad. So basically full lexical scoping and the presence of lambda should be enough to create cyclic structures, even in the absence of mutability.


Common Lisp lets you write e.g.

  (list-length '#1=(1 2 3 4 . #1#))
But I think that isn’t what you mean.


I don’t understand how to construct cyclic references using only letrec. Do you have a concrete example?


LETREC requires mutability in the presence of lexical scope. Otherwise it works fine in the way you describe it. Using lexical scope, IS-ODD and IS-EVEN would be bound in parallel and (LAMBDA (X) (NOT (IS-ODD X))) would close over IS-ODD before it is bound, so it would be undefined when applied inside of the outer LAMBDA.

Of course dynamic scoping will bite you earlier or later because of the upward and downward FUNARG problems. See my book (http://t3x.org/lfn/) for lots of entertaining details.
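For the curious, a rough sketch of how LETREC is usually explained to expand under lexical scope (standard Scheme, not SectorLISP); the hidden SET!s are exactly the mutability in question:

    (letrec ((is-even? (lambda (x) (if (null? x) #t (is-odd?  (cdr x)))))
             (is-odd?  (lambda (x) (if (null? x) #f (is-even? (cdr x))))))
      (is-even? '(a a a a)))               ; => #t

    ;; behaves roughly like:
    (let ((is-even? #f) (is-odd? #f))      ; placeholders for "unassigned"
      (set! is-even? (lambda (x) (if (null? x) #t (is-odd?  (cdr x)))))
      (set! is-odd?  (lambda (x) (if (null? x) #f (is-even? (cdr x)))))
      (is-even? '(a a a a)))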


Letrec requires either laziness (which implies some sort of internal mutability, as thunks get replaced by values at some point), or "internal" mutability that may not be visible to the programmer (like in OCaml, where in a `let rec` some variables will initially be null pointers, and get updated once other definitions have run).

In languages with no mutability at all (not even internal mutability like with these two features), you can't create cycles and refcounting becomes possible, but I'm not aware of non-academic general purpose languages that actually fall in this category.

Esoteric functional languages like Unlambda do make that guarantee, and implementations can use a ref counter as their GC.

I've written more about this here: https://terbium.io/2019/09/cyclic-immutable/


If data is immutable and you tie the lifetime of data to a single stack frame, you only need a single bit for reference counting: whether the current frame owns the reference. Calling a function just requires setting this bit to low (some asterisks here), and all objects tied to a given frame can be collected when the frame returns.


> I'm not aware of non-academic general purpose languages that actually fall in this category.

Erlang?


Erlang does have some mutability (the process dictionary, and the mailbox), but the fact that the language is largely immutable is leveraged by the GC: if I remember correctly the private heap GC is a generational semi-space copying GC, but because the language is immutable it doesn’t need to have write barriers; to perform a minor collection only a scan of the stack (and process dict) is necessary to know the valid roots of the young heap.


I believe Erlang makes this guarantee and exploits it in the GC:

...the terms are immutable and thus there can be no pointers modified on the old heap to point to the young heap

so old objects cannot refer to young objects. This means it doesn't have to deal with write barriers, etc.


Strict purely functional languages will sometimes provably introduce log n slowdown that can't be amortized away. This is certainly a trade-off.

This doesn't happen when you've got mutation or laziness.

One language that I'm aware of which makes this tradeoff is Erlang.


When you say log n slowdown, what is n here? The length of the program? Or something completely different?


I think the theme is ‘you can’t do arrays; the best approximation is made of trees, hence log n in the size of the array vs 1’.


Though I think Clojure has a linear data structure that's O(log32(n)) for all the common ops. (This is the one with all the memeing because someone called it "practically O(1)".)


Right, and updates are also O(32·log32(n)) rather than O(2·log2(n)) for a binary tree, versus O(1) for arrays. It’s a bit more complicated when you think about cache lines being updated, but mutable arrays massively win on cache.


Why do you have a constant before the log term? Isn’t that completely meaningless?


It is meaningless mathematically. It’s as meaningless as the parent talking about log32(n) because of course log32(n) = log n / log 32, so there is just a constant of 1 / log 32 being put at the front.

I think the subtext is really a statement about constant factors. The parent is saying “if trees are wide then lookups are cheap” and I am saying “if trees are wide then updates may be expensive” (to update an immutable binary tree of size n, you must allocate log2(n) nodes of size (say) 3. To update a 32-ary tree you need log2(n) / log2(32) = log2(n) / 5 nodes of size 33 so, as 33/5 = 6.6 > 3, the wider tree requires more allocation for updates.)


malloc isn't typically O(n) though, so I don't think that's reasonable math?


If you are talking about Pippenger-Fischer theorem, then that's about Turing machines, not machines with RAM. Or is there another theorem I don't know about?


It's a separate theorem. I'm not quite sure where I've seen it before, but https://www.cs.princeton.edu/courses/archive/fall03/cs528/ha... seems to be the right thing. Generally I've also heard it mentioned in the context of Okasaki's book about purely functional data structures.


Elm is also strict and purely functional.


Elm does not have its own runtime, so this discussion doesn’t really apply.


Ah, I wasn't referring to the point about garbage collection further up the thread. I was just noting that Elm is another example of a strict purely functional language (and so another example of a language that gets purity in exchange for losing access to certain O(n) algorithms).



That page is incorrect about GHC's memory management -- I verified this with a question to haskell-cafe around 2014, and I guess no one got around to updating it.

The linked paper at the bottom of the article explains how things work and have worked for a long time now. The source of cyclic references, and even references from old to new generations, comes from "thunks" (unevaluated lazy computations). Haskell guarantees that a thunk only ever gets evaluated once, and the way this is implemented is that the thunk is mutated with a pointer to the evaluation result.

So the garbage collector needs to deal with mutations, cyclic references, and references from old to new generations. (All that in a language with supposedly immutable data!)


I'm pretty sure that Haskell supports cyclic data structures and that GHC garbage collection is more complicated than that. See for example the presentation by Ben Gamari over here: https://www.youtube.com/watch?v=EQqBpVRA3wU. Haskell GC is a generational copying design. Optionally you can enable the low-latency GC in newer runtimes, which is still generational and copying for the youngest generation but uses a parallel mark-and-sweep algorithm for the later generations.


Potentially cyclic values are statically known

But yes, GCs have evolved in the past decade, so these things are subject to change


I mean even a doubly-linked list is a cyclic data structure...


Sure, but presumably they can't be expressed in SectorLISP.


Past related thread:

Show HN: SectorLISP now fits in one sector - https://news.ycombinator.com/item?id=29047584 - Oct 2021 (75 comments)


This is such an awesome project! Writing interpreters in assembly is something of a lost art. That's part of the reason I wrote a Wasm interpreter in asm. I hope this does not stay a lost art.


Convert this into a wasm lisp perhaps? Just a suggestion. Both these attempts are crazily great.


>It left plenty of room to add a 40 byte garbage collector.

Oh yeah. Who wouldn't splurge if you have a massive 40 bytes left. Beautiful.


> BrainF#*k not a true language

Curious why that might be, and TFA doesn't expound. What makes it not a "true" language?


Author here. You know it when you see it. LISP is a tool that people want to use. People write production software using it. For example, this website (Hacker News) was written in LISP. Languages like BrainFuck are usually only called programming languages if the word "esoteric" is attached. There's a real consensus of opinion around this being a generally held view.

There should ideally be a more compelling explanation of why Brainfuck is an esoteric toy and LISP isn't, than me simply saying "but that's what people believe". I considered going into more depth on this subject. I found a C to Brainfuck compiler written in OCaml and used it to compile a Brainfuck metacircular evaluator into Brainfuck. It ended up being 88,439 bytes of code, and it sadly didn't work, because the compiler isn't very polished.

Let's compare ratios. Brainfuck is 99 bytes, whereas Brainfuck written in Brainfuck is 88439 bytes. That's 893x larger. On the other hand, LISP is 436 bytes of systems code and LISP written in LISP is 1370 bytes of LISP. That's 3.14x larger. It's cool that the ratio between layers of simulation with LISP is pretty close to pi. It helps explain why LISP is viewed by human beings as being such a powerful tool, since it lets you accomplish more impact with less effort, without having the castles you build turn into a Leaning Tower of Pisa. Since after all, the purpose of tools since Prometheus has been to give people more power.


Would you consider binary lambda calculus [1] [2] a real language? The 650 byte interpreter written in obfuscated C also sports garbage collection and might fit in a bootsector when translated to x86 code. Its metacircular interpreter is only 29 bytes (232 bits), and looks somewhat readable with some syntactic sugar [4] (analogous to how x86 looks more readable as assembler source).

[1] https://www.ioccc.org/2012/tromp/hint.html

[2] https://tromp.github.io/cl/Binary_lambda_calculus.html

[3] https://github.com/tromp/AIT/blob/master/int.lam


But does either include a JIT? Can they JIT themselves?


Counter argument: ever seen the proof for simple addition in English? The verbosity and/or opacity of the definition of a system isn't a great proxy for value or ease of use. Principia Mathematica takes hundreds of pages to reach the proof that 1+1=2, formally defining and explaining everything needed along the way, establishing fundamentals and primitive types and so on.

Brainfuck isn't a great interpreter language. It is a great perspective language. If you can break your tasks down sufficiently that you can express your program in brainfuck, you can take that understanding back to a more performant and expressive language. Doing Rosetta code exercises in esoteric languages forces your brain to model programs with more nuance and detail. If raw machine level performance was the only thing that mattered, we'd be writing everything in assembly. Different languages can have different utility, though, and while brainfuck is less useful as software, it is an excellent tool, not just a toy.


Isn't the distinction as simple as "does the language let you define anything and use recursion?", where I'm going to be a bit handwavy with "define anything" but I think we all get what I mean.

You can't name functions in Brainfuck or recursively call them. You can do so in LISP and Forth. That is also the entire reason Brainfuck explodes in size: it lacks the "semiotic" capabilities required to compress source code (and therefore, to structure and refactor code, which is what humans need).
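For example, even something trivial stays compact once you can name things and recurse; a rough Scheme sketch (Scheme rather than SectorLISP, and repeat is just a made-up name for illustration):

    (define (repeat n x)
      (if (= n 0) '() (cons x (repeat (- n 1) x))))

    (repeat 8 'a)   ; => (a a a a a a a a)

In Brainfuck there's no way to name a fragment like that and call it from elsewhere, so every reuse is copy-paste.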


To me it's missing the point to describe that as Brainfuck versus Lisp. It just turns out that lambda functions and linked lists are higher level abstractions than arrays and pointers. I'd like to see some more variations on that concept and what the size is if using another esolang that has lambda functions.


I had this argument once -- I also believe Brainfuck is more of a machine than a language. Whereas Lisp is a language.

The main reason is that Brainfuck lacks composition, so what happens in practice is that people GENERATE Brainfuck code rather than writing it. When you write code, you rely heavily on your programming language's mechanisms for composition in order to make something that works.

Someone pointed me to people who DO actually program by hand in Brainfuck. There are some pretty big programs that are hand-written, not generated. Still I'm not entirely convinced, as they seem to have been constructed through mental feats, not compositional language.

It's a curious fact that human languages and programming languages compose in a way that can be leveraged to build very large things that "work". I guess there is some equivalence between building up a large body of knowledge through language and making a program that works. Brainfuck doesn't seem to have this property.

The demonstrations of arithmetic and calculus in the post are nice examples of this in Lisp, and from there you have the rest of the world at your disposal (which is well documented in many books). With Brainfuck you're left crawling on your hands and knees, occasionally finding something that works, so to speak.


Brainfuck, like assembly and forth, composes by concatenation.


Data structures have to compose too


From the beginning of SICP [0]:

> A powerful programming language is more than just a means for instructing a computer to perform tasks. The language also serves as a framework within which we organize our ideas about processes. Thus, when we describe a language, we should pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas. Every powerful language has three mechanisms for accomplishing this:

> * primitive expressions, which represent the simplest entities the language is concerned with,

> * means of combination, by which compound elements are built from simpler ones, and

> * means of abstraction, by which compound elements can be named and manipulated as units.

Brainfuck is missing any sort of means of abstraction. The primitives are instructions, and the only means of combination is writing instructions in sequence, but there's no way in the language to refer to any sequence of instructions --- no functions, no subroutines, not even labels. This makes it utterly useless as a way to organize your thoughts on how to compute something.

[0] http://sarabander.github.io/sicp/html/1_002e1.xhtml#g_t1_002...


It's almost the opposite of a language, in that it's probably harder to do anything with than just using whatever the native assembly instructions of the machine are. It's explicitly designed to be annoying, an example of a Turing tarpit:

https://en.wikipedia.org/wiki/Turing_tarpit


I would have to assume because it's neither a high-level language, nor a practical low-level one. True in the sense of a language being a labor-saving device.


Ok, that matches my understanding of it then - it is a language, just not one you'd actually want to do anything in. I was thinking that the author was saying it wasn't a programming language in the sense that HTML is not a programming language.


My "not so gracious" interpretation: "It would invalidate the claim that we wish to make, so... no true scotsman"


Author here. We anticipated this as a possible criticism of our "we built the world's tiniest programming language" claim. That's why we wrote the Brainfuck interpreter too. As far as I know, the 99 byte Brainfuck interpreter that's included in this blog post is the tiniest one to date. So even if that invalidates our tiny LISP victory, it'd still be a victory for the team since we built the competition too.


Lisp is a language that people actually use to get work done; Brainfuck isn't.


Well… I believe it was James Gosling who said that there are two types of computer languages: “ones people complain about and ones nobody uses”. So if you’re complaining about Brainfuck then it must be a language people use by process of elimination ; )


Stroustrup, not Gosling. And I like your logic. :)


This presumes that "ones people complain about" and "ones nobody uses" are mutually exclusive - and I'm reasonably sure that Brainfuck is clear evidence against such mutual exclusivity ;)


Unless your "work" is proving that a language is Turing complete. That is, to my knowledge, the only serious use of brainfuck.


I suppose the humor was lost in my message.


Impressive. Yet another disappointing reminder that, as a programmer, I'm not worth the salt of the food I eat daily.


Dude, this thing is a toy of sorts and the author spent quite a bit of time on this. No reason to knock yourself down.


As someone who’s spent their career gluing react libraries together, something like this does make me wonder how good some people really are with computers. It’s all the same assembly instructions in the end


Sometimes that kind of work has a lot of impact. For example, people at Google used to joke around about how they hire PhDs to write HTML, but it was important HTML. Just like I'm sure the React libraries you're gluing together are being used for important things. The work I'm doing right now is flashier, but I had to make a lot of personal sacrifices in order to be able to do it.


I think that's the rub these days. With such an impressive mountain of free and indeed proprietary software, the money is going to be mostly in how you glue it together to solve a problem for a business. Getting something in a low number of bytes is not rewarded when gigabytes of storage are so cheap.

To create something that doesn’t fit this mould has to be more like a “Recurse Center” project: a labour of love you have to finance yourself.


> storage is cheap

Sometimes. Sometimes not.

The physical HW, i.e. RAM and HDD storage, is cheap, but the price you pay for accidental complexity is high. E.g. when your non-tail-call recursive calculation eats away all available memory, and you need to persistently store the intermediary results (just a handful of long integers) in a database. And that database must be set up and maintained. That means dealing with access rights, usernames, passwords, replication, backups etc. In every environment: prod, tests, on every development branch (separately!), etc.

In another dimension, read / write operations mean yet more side effects to deal with. And on top of that, dealing with intermediary results means yet more accidental complexity in the source code.

So no, the workaround for "the memory limitation problem" can be in fact the most expensive part of a project. And now imagine, the data grows day by day...


There are quite a few outliers and 10x programmers when you consider that there are millions of programmers. Most of us are somewhere in those two quartiles in the middle. Be happy with what you can do and always try to improve. Don't compare yourself to the Einsteins of the world, you'll always be disappointed. Be happy that you're good enough to be a professional, but always be a little bit restless.


I have sent this article to some folks and plenty of them had the same reaction. A bit like math envy, it is normal to have when you lack a formal education and all you've done professionally is write CRUD apps. It does _not_ mean your work is in any way less meaningful though.

And yeah it is indeed impressive and maybe the coolest thing I've seen this year.


Overlapping functions... (saves 14 bytes...)

Crazy, DNA uses this trick!

"-Os" should search for possibilities here..

I'm wondering if such code breaks any CPUs. I could imagine that it could confuse the code translation that modern x86s do. I mean I bet they don't test this particular corner case...


That's cool. I didn't know. It's nice how having fun hacking on boot sectors can help us discover things mother nature figured out a long time ago. Do you have links you could share where I could learn more about DNA coding hacks? For example, I've been reading some of the stuff written by Bert Hubert (known for his work on PowerDNS) who got into biology recently and wrote blog posts like this https://berthub.eu/articles/posts/amazing-dna/

> These comments are fascinating in their own right. Like C comments they have a start marker, like /*, and a stop marker, like */. But they have some more structure. Remember that DNA is like a tape - the comments need to be snipped out physically! The start of a comment is almost always indicated by the letters ‘GT’, which thus corresponds to /*, the end is signaled by ‘AG’, which is then like */.

I don't know a whole lot about biology, but learning cool facts like that makes me wonder what would happen if those comment markers were used as parenthesis.


It is submissions like this that keep me coming back to HN.

Even though I understand little of the OP, I understand more than last time, and have the hope to be able understand more in the future.

Thanks for the inspiration OP!


Author here. I tend to learn best by doing, so I'd say the same thing. Each time I write one of these blog posts I feel like I understand LISP a bit more and it's my pleasure to share that knowledge. Thanks for your words of support!


It’s submissions like this that keep me coming back to HN 3 times per day, in order to see such a submission 3 times per quarter.


My great invention was that you don't have to collect all the garbage, only most of it. So I assumed all the numbers on the CPU stack were pointers to live cells. Never ran out of memory because of this heresy.

    > (ncompile '(cons 1 (cons 2 3)))
    $09CB:$7163:   MOV  AX,$04 ; S-object 1
    $09CB:$7166:   PUSH AX
    $09CB:$7167:   MOV  AX,$05 ; S-object 2
    $09CB:$716A:   PUSH AX
    $09CB:$716B:   MOV  AX,$06 ; S-object 3
    $09CB:$716E:   MOV  BX,AX
    $09CB:$7170:   POP  AX
    $09CB:$7171:   CALL $05AE   ; cons
    $09CB:$7174:   MOV  BX,AX
    $09CB:$7176:   POP  AX
    $09CB:$7177:   CALL $05AE   ; cons
    $09CB:$717A:   JMP  $1DA7
      (subru: eval=$7163, compile=$3B6F)


One thing you can do is `push $05` which only needs 2 bytes. The idea of using the x86 stack instructions to construct cons cells is potentially brilliant. I experimented with it a little bit. Couldn't make it work. For example, here's how it changes certain aspects of the implementation:

    Cons:   xchg    %sp,%cx            # Cons(m:di,a:ax):ax
            push    %di
            push    %ax
            xchg    %sp,%cx
            mov     %cx,%di
            xchg    %di,%ax
            ret

    Apply:  ...
            xchg    %cx,%sp
    Pairlis:test    %di,%di           # Pairlis(X:di,Y:si,a:dx):ax
            jz      1f                # for x,y in zip(X,Y)
            push    (%bx,%di)         #      (- . y)
            push    (%bx,%si)         #      (x . y)
            push    %sp               #     ((x . y))
            push    %dx               #     ((x . y) . a)
            mov     %sp,%dx           # a = ((x . y) . a)
            mov     (%si),%si
            mov     (%di),%di
            jmp     Pairlis
    1:      xchg    %cx,%sp
            ...


My "compiler" was totally context-free and made in assembler. Which means that it compiles every instruction in complete vacuum and assumes stuff comes in AX and BX registers (with rest-pointer in CX, I think). And result in AX.

But this proved not to be a bad start at all. Once you understand the limitations of the "compiler", you can modify the macros accordingly. One of the features of the compiler was that it assigned absolute memory locations to variables, so you could stop wasting stack and do early assignments to temporary variables.

Unfortunately the source is quite incomprehensible now because of insane use of nested macros: https://github.com/timonoko/nokolisp

But the example given above works, no doubt about it:

    > (setq test (ncompile '(cons 1 (cons 2 3))))
    > (test)
     
    (1 2 . 3)


Wow, when did the C-JavaScript polyglot thing happen? I don't recall that, and looking back on the October article, it doesn't mention that. What's the motivation for it?

https://news.ycombinator.com/item?id=29047584

I would expect that to add a constraint that makes the code bigger, not smaller? Or I'm probably misunderstanding how the different variants of the interpreter work.


SectorLISP has always had a C implementation for explainability. It started off as ugly C because it was actually used to generate the assembly code for the first ~900 byte version. See https://github.com/jart/sectorlisp/blob/a561e031aec03270459f... and https://github.com/jart/sectorlisp/blob/a561e031aec03270459f... Once we reached 512 bytes I started deleting a lot of the C code since things like assembly macros weren't needed anymore, since the assembly was now being written by hand. https://github.com/jart/sectorlisp/blob/main/lisp.c

Once I cleaned up the C code, I noticed that the entire program didn't use pointers at all! (Except of course to interop with Bestline, but that could be replaced with fgetwc() instead). That's when the idea occurred to me that, since it didn't use pointers, it was also technically valid JavaScript too. So I asked around on Twitter to see if anyone's done a C / JS polyglot before. I got some helpful tips from a code golfer in Estonia who experimented with the idea and he told me about the paragraph separator trick. https://twitter.com/angealbertini/status/1463755612345540611 So I updated the C code to hybridize the two: https://justine.lol/sectorlisp2/lisp.js It's a shell script and a C program and it runs in the browser unaltered without needing to be compiled.


For the sake of just-for-lulz purity, you could #define var auto instead, and then it wouldn't have any uses of int in the code at all.


Wow. I didn't notice that. A real C program without types, pointers, or macros. Who'd've thought such a thing would be possible!


It kinda makes sense when you remember that implicit int is there in K&R C for compatibility with B (which only had one type, the machine word).

B might be an interesting choice of language for an uber-minimalist compiler, by the way.


This blew my mind.


This inspires me to craft a language just for fun.

I absolutely LOVE projects that immediately build on themselves. I wrote a GameBoy emulator from scratch and without any prerequisite knowledge and it felt amazing because of how incremental it is: you can break it into thousands of sensibly sized pieces.

I’m looking for the next project that’s like that. At the moment it’s an ECS video game.

I think work is what makes me seek projects with really good feedback loops.


I love it, but I think the video from "The Oldskool PC" is a bit imprecise in claiming it brings everything necessary to implement features like reading off the disk. Now maybe I have overlooked something, but how are you going to put something into eax and then do int 80h? The only interrupts in the codebase are 10 and 16, so I assume it is a challenge for the reader to implement anything that has to do with hardware, though a universal API to do interrupts would be nice in the core language.


I think the Oldskool PC channel did a good job covering the topic. He started writing his own boot sector operating systems from scratch to make that video. When you do that, you start seeing something like SectorLISP more as a kit that lets you take these beautiful universal equations that John McCarthy discovered, and then you fill in the blanks yourself in terms of systems implementation details. SectorLISP isn't intended to be a shrink-wrapped solution (and The Oldskool PC channel has excellent videos about shrink-wrap too!) So I do think the video is accurate and I'm very pleased with how it turned out.


As I said, I love it. It's just that when he said everything was ready, I thought the function to trigger whichever interrupt you want was included in the 436 bytes. At least there is plenty of room to implement something. Again: I love it.

Edit: there is always Mr. Brown to help us out: http://ctyme.com/rbrown.htm


[deleted]


Author here. Yes, but you have to use the extended 509 byte "friendly branch" version, in order to get error messages on undefined behavior. For example, if I mistype a variable name as "FUDGE" then it'll longjmp() out of the evaluator and print "?FUDGE": https://justine.lol/sectorlisp2/debugging.png

If we were to do that in the main branch, it'd get caught in an infinite loop. You can always define a higher-level LISP evaluator on top of it which has undefined variable reporting. So that doesn't make it a toy. Doing nothing on undefined behavior may be hostile and dangerous, but it's legal as far as optimizations are concerned. If what you said were true, then Clang and GCC would be toys, since they'll often exploit undefined behaviors to gain an edge in benchmarks, by generating code so that the world might end if an integer overflows.

So one way you could think of it is the 436 byte version is your release branch and the 509 byte version is your dev branch. Both of which fit inside the 512 byte boot sector!


The short eval is standard amongst small lisps (see e.g. SICP), but the immutability which leads to the nice ABC GC (no RPLACD, no cycles, immutable strings) is practical. All other lisps liked one of our AutoLISP features: the very same immutability. Such a GC really is trivial and fast. Lots of temporary garbage, but only for a single op.


Can't you define FORTH (or a subset of it) for some architectures purely in assembly, then use those functions like normal? I remember some file floating around on GitHub that showed how to do that.

Anyway, amazing work as always!


That's how Forth is normally implemented (and what makes it so handy on a new platform that doesn't have any dev tools yet). Although there's a further difference between implementations that rely on an existing assembler to implement the native bits, and those that roll their own in a way that allows them to cross-compile themselves.

As for the file, I suspect that you're referring to JonesForth, which is a heavily commented implementation of the first variety:

https://github.com/nornagon/jonesforth/blob/master/jonesfort...


Yes; this Lisp is entirely in machine code and smaller than SectorForth.


> It's compatible with all PC models dating back to 1981 which have at least 64kb of RAM

So, is it compatible with C51 8080?

The differentiation function sounds really nifty, too!


This is awesome exposition (from someone who has implemented a small version of Scheme himself decades ago).


I wonder if there are botnet/rootkits circulating that use such compact programming.


Vintage computer viruses are definitely in this category of size optimisation. Not so much "modern" malware.


I'm ignorant of such optimizations. Is the idea just to be small enough that your presence isn't noticed when stuffed into an exe/pdf file, or are there other reasons a virus designer would want to create such small executables?


Yes, that's correct. Being able to infect without increasing the length of infected files is an important point. https://en.wikipedia.org/wiki/Code_cave


Some had to fit in a 512 byte bootsector. Size optimization was always a concern when dealing with floppy disks.


When is a Lisp a Lisp? I mean what functionality needs to be provided to call it such?


Paul Graham's Roots of Lisp is a pretty good introduction to the fundamentals http://www.paulgraham.com/rootsoflisp.html

(The actual document is a .ps at the bottom of the page)


Thanks. I need some time to parse this. So this 436-byte lisp implements the 7 primitive operators, so that one can write the eval function?


More or less. The read-eval-apply loop is the core concept of running a Lisp program. Structure and Interpretation of Computer Programs is an excellent reference on the details of how to put together such a system and how to use it.
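For reference, these are roughly the primitives from McCarthy's paper / Roots of Lisp that the 436 bytes supply, and in terms of which EVAL can then be written (spelled the old-school way as in the blog post; the ; annotations are just the expected results, not part of the syntax):

    (QUOTE A)                      ; => A
    (ATOM (QUOTE A))               ; => T
    (EQ (QUOTE A) (QUOTE A))       ; => T
    (CAR (QUOTE (A B)))            ; => A
    (CDR (QUOTE (A B)))            ; => (B)
    (CONS (QUOTE A) (QUOTE (B)))   ; => (A B)
    (COND ((QUOTE T) (QUOTE YES))) ; => YES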


Serious question: who uses Lisp these days? Been in software dev industry for over 20 years. Never encountered a single project employing the language.

There must be some reason Lisp is so popular on HN.


The author of this site and the founder of YC wrote Lisp books, and then a bunch of influential essays about Lisp (starting ~20 years ago):

http://www.paulgraham.com/lisp.html

This site is written in Arc, his first Lisp: https://news.ycombinator.com/item?id=28155134


Clojure is fairly popular on the JVM and the site you are on uses a lisp dialect named Arc.

Though Lisp is not popular as an industrial language, its ideas have spread into JavaScript and front-end dev.


- Professionally, much of my output is in Python--everyone else in the shop knows it--with a smattering of C when it's warranted. That said, I'm an Emacs user, and I do write libraries and custom modes in Elisp to do things I find useful. In the hopes that others will, too, I make them available to my colleagues.

- I use Common Lisp at home for personal projects when shell scripts no longer cut it. When the Python 3 storm hit my ~/bin, it occurred to me that one of CL's best features is that its specification has been stable for several years (even while the ecosystem has evolved considerably).


> its specification has been stable for several years

And by "several", you mean "at least fifteen at the time"? ;)


Grammarly uses it for many of their services: https://www.grammarly.com/blog/engineering/running-lisp-in-p...

Pathway Tools is a genetic database search/comparison tool (I think, it's a bit over my head) and powers the https://biocyc.org/ website.

Both are pretty typical use cases. Back-end services dealing with heavy symbolic manipulations. I think the most common use of Lisp is to implement Lisp though. ;)


Just to add an example, there is a Portuguese company called SISCOG which produces planning software in (Common) Lisp, mainly for railway timetables, powering many European railway systems.


You are, indirectly, by posting on hacker news...


I've seen some self reported “lisp pros” working every day on lisp projects, but they are pretty rare compared to c++/javascript/python/c# programmers. It usually sounds like they are senior people who get their choice of languages and/or consultants who write and maintain very custom stuff with long-term, loyal customers.


The scripting language for Halo 1/2/3/ODST/Reach was based on Lisp. Halo 4 kept a similar syntax but got rid of the parentheses to be more BASIC-like. Halo 5 and now Infinite moved on completely to Lua.

Halo 1 reference: https://andrew.gg/scripts/00reference.html

Decompiled campaign scripts: https://github.com/Nibre/blamscript/blob/master/blamscript/h...


> There must be some reason Lisp is so popular on HN.

sicp is a really good book, but also: https://marktarver.com/bipolar.html


> Been in software dev industry for over 20 years. Never encountered a single project employing the language.

Are you saying you've never seen AutoCAD or Emacs in your life? While possible, I find it somewhat difficult to believe.


This gets posted to HN every so often: https://common-lisp.net/lisp-companies


Mind blown. I feel blind and mown.


One of the biggest downfalls of lisp is that people just call anything with parenthesized prefix notation "lisp".


Did you not read far enough to see that it runs a program (proof assistant) written for LISP 1.5 (1961)?



