Continuous Unix commit history from 1970 until today

vandahm · on June 16, 2022

You don't see this every day:

https://github.com/dspinellis/unix-history-repo/blob/Researc...

Is this B, or is it BCPL? What would have compiled this code back in the day?

projektfu · on June 16, 2022

It's B. BCPL has "LET MAIN() BE $(..." instead of "main $(...".

Running B was a challenge on the PDP-7 but easier on the PDP-11, apparently, because of the increase of memory size. The linked document has an interesting history about compiling B to threaded code, a form of interpreted code, and then to machine language. B never really made the jump to a full-fledged citizen because it quickly got replaced by C, although BCPL was popular for a long time.

https://www.bell-labs.com/usr/dmr/www/chist.html

pm215 · on June 16, 2022

Wikipedia's article on B says that BCPL used := for assignment and = for equality tests, whereas B used = for assignment and == for equality. Assuming that's correct, this must be B code.

swatcoder · on June 16, 2022

I don’t know, but I love how clearly and concisely it expresses what would later become ubiquitous as do-while and continue.

That’s poetry. Nice find.

stingraycharles · on June 16, 2022

I love how thin the layer above assembly is: without knowing B, is my interpretation correct that this function effectively “inherits” the stack of the calling function? In other words, rather than passing function arguments and letting the compiler deal with it, you’re supposed to push the string you want to lcase onto the top of the stack?

Reminds me a lot of writing my own compiler/assembler in university, where it’s expected that all this happens automatically nowadays.

messe · on June 16, 2022

No, that's not correct. It reads the string from standard input. A C translation would look like this:

    main()
    {
        int ch;
        while ((ch = read()) != 4) {
            if (ch > 0100 && ch < 0133)
                ch = ch + 040;
            if (ch == 015) continue;
            if (ch == 014) continue;
            if (ch == 011) {
                ch = 040040;
                write(040040);
                write(040040);
            }
            write(ch);
        }
    }

A more modern C version would look like:

    #include <stdio.h>

    int
    main(void)
    {
        int ch;
        while ((ch = getchar()) != EOF) {
            if (ch > 0100 && ch < 0133)
                ch = ch + 040;
            if (ch == 015) continue;
            if (ch == 014) continue;
            // No need to handle tabstop specially
            putchar(ch);
        }
    }
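For readers puzzling over the octal constants in both versions: in ASCII, 0101 through 0132 are 'A' through 'Z', 040 is a space (and also the distance from an upper-case letter to its lower-case form), 015 is carriage return, 014 is form feed, 011 is tab, and 04 is EOT. A minimal sketch of the same arithmetic follows. The function names are made up for illustration, and the packing of two 9-bit characters into one 18-bit PDP-7 word is an assumption about why 040040 would mean "two spaces".

```c
/* Illustrative helpers only; these names do not appear in the original source. */

/* 0101..0132 octal are 'A'..'Z' in ASCII; adding 040 (decimal 32)
   maps them onto 'a'..'z'. Everything else passes through unchanged. */
int fold_lower(int ch)
{
    if (ch > 0100 && ch < 0133)
        ch = ch + 040;
    return ch;
}

/* Assumption: two 9-bit characters share one 18-bit word, high half first.
   Under that assumption, two spaces (040 each) pack to 040040. */
int pack_pair(int hi, int lo)
{
    return (hi << 9) | lo;
}
```

Under those assumptions, `fold_lower('A')` gives `'a'` and `pack_pair(' ', ' ')` gives `040040`.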

anyfoo · on June 16, 2022

Hmm, don’t think so. The function does not operate on a string, it seems to read a character using read() and write it back, transformed, using write(). Given that the function is named main, it’s probably the top level function anyway (from the programmer’s point of view, often the OS actually calls into a different function that is part of the language runtime, e.g. _start, which in turn calls main eventually, but that is usually hidden from the programmer).

jibal · on June 17, 2022

This is the main function ... there is no calling function. Nor is a string on the stack being accessed.

judge2020 · on June 16, 2022

Is that truly from 1970? For example, that commit's grandparent seems to have been specifically crafted to use "Date: Thu, 1 Jan 1970 00:00:00 +0000" https://github.com/dspinellis/unix-history-repo/commit/185f8....

anyfoo · on June 16, 2022

That’s 0 in Unix epoch time (guess why!), so seems more like a missing timestamp than a crafted one. The fact that the linked file does not have a 0 timestamp, but a slightly later one, suggests it's valid, or at least intended to be valid.
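To make the "0 in Unix epoch time" point concrete, here is a small sketch using the standard `gmtime` and `strftime` calls. `format_utc` is a hypothetical helper name, not anything from the repo or from Git.

```c
#include <stddef.h>
#include <time.h>

/* Format a Unix timestamp as an ISO 8601 UTC string.
   Returns the number of characters written, or 0 on failure.
   format_utc is an illustrative helper, not a library function. */
size_t format_utc(time_t t, char *buf, size_t len)
{
    struct tm *utc = gmtime(&t);   /* break the timestamp down in UTC */
    if (utc == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%dT%H:%M:%SZ", utc);
}
```

Calling `format_utc(0, buf, sizeof buf)` with a 32-byte buffer fills it with `1970-01-01T00:00:00Z`, the same instant as the "Thu, 1 Jan 1970 00:00:00 +0000" in that commit.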

Nition · on June 16, 2022

I recall that in A Deepness in the Sky by Vernor Vinge, a space sci-fi set in the far future, they're still using Unix time underneath many many layers of abstractions, and with their cultural context they guess that humanity must have set it to start with the moment mankind first travelled into space to land on the Moon.

anyfoo · on June 16, 2022

Hah, plausible. Not far off timewise, and yet totally wrong, but understandable how such a conclusion could be made.

marcodiego · on June 16, 2022

They had "auto" vars in 1970. WG14, the ISO work group that maintains the C programming language specification, has just recently discussed acceptance of __auto_type.

EDIT: oops, the "auto" here means automatic allocation.

mftb · on June 16, 2022

Yea, I have to say, to me, this is cool. Glad to see this sort of history being preserved.

Erlangen · on June 16, 2022

So auto is used as a keyword here. Maybe C inherits this never-used auto from B?

veltas · on June 16, 2022

auto stands for 'automatic', because such variables are automatically allocated for each function invocation. In C it became redundant because base types were added, and so the base type could start the definition (auto was still permitted with default base type of int until C99 I think). auto in B is a bit like 'let', it starts a declaration, along with 'extrn'.
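A rough illustration of "automatically allocated for each function invocation", contrasting `auto` with `static`. This is a sketch that assumes a compiler mode from C89 through C17, where `auto int` is a legal but redundant spelling of a plain local `int`; the function names are invented for the example.

```c
/* `auto` as a storage-class specifier: the variable is created afresh
   on every call, so nothing carries over between invocations. */
int count_auto(void)
{
    auto int n = 0;    /* same meaning as: int n = 0; */
    n = n + 1;
    return n;          /* 1 on every call */
}

/* For contrast, a static local has a single copy that outlives the call. */
int count_static(void)
{
    static int n = 0;  /* initialized once, kept between calls */
    n = n + 1;
    return n;          /* 1, then 2, then 3, ... */
}
```

Calling each function twice returns 1 and 1 from `count_auto`, but 1 and 2 from `count_static`.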

hoten · on June 16, 2022

very weird that two characters - $( and $) - were used before { and }

did old keyboards not have curly braces or what?

mseepgood · on June 17, 2022

In C you can still use the digraphs <% and %> as an alternative to curly braces:

    #include <stdio.h>

    int main() <%
        printf("hello, world\n");
        return 0;
    %>

kps · on June 16, 2022

{} were added to the 1967 revision of ASCII, along with `|~ and lower case. (EBCDIC never got them in the base character set, only in alternate ‘code pages’.)

usr1106 · on June 17, 2022

I remember in 1990 IBM sponsored a small 370 for our university. I fiddled for weeks to get curly braces to work correctly. We were all used to working with Sun workstations or at most VAXen at the time. It was unbelievable how complicated this was in the IBM world. They were still living in the world of full-time machine operators. My colleagues were glad I did it; my professor, who had not programmed for years and was moving in higher spheres, was not impressed that I had spent so much time on it when he learned about it later.

mprovost · on June 16, 2022

This repo has been super useful as I've been writing a book that teaches Rust by rewriting classic Unix utilities. I settled on using the 4.4 BSD source as a base but having the whole history available has been really interesting. Recently I came across a bug in the 4.4 version of cat that wasn't fixed until a few years later (in FreeBSD).

justsomeguy123 · on June 16, 2022

The Gource visualization video link, which points to https://www.youtube.com/watch?v=S7JB0mhrGCQ, does not work anymore.

> Video unavailable

> This video is no longer available because the YouTube account associated with this video has been terminated.

danuker · on June 16, 2022

We need to solve this problem.

YouTube is free to delete any account, even just to cut costs.

cmeacham98 · on June 16, 2022

I'm not sure what the problem to be solved here is. It doesn't seem reasonable to force YouTube (or any other free video host) to indefinitely store and host content.

If you want something to stay around on the internet it has to take up space on somebody's drive and bandwidth on somebody's network connection - and for sufficiently large content like video you're going to have to do that yourself or convince/pay someone you trust to do so on your behalf.

danuker · on June 17, 2022

I am thinking of everyone hosting their own videos, and being able to comment on each other's. Is there a federated YouTube?

Something like Mastodon/Pleroma.

parasti · on June 17, 2022

Peertube

njharman · on June 17, 2022

Are you sure it was YT and not the creator who deleted the account?

Also there is a solution already, it's called "The Internet". Upload your content far and wide.

justsomeguy123 · on June 17, 2022

> the YouTube account associated with this video has been terminated.

"Terminated" is pretty harsh wording for an account that was willingly deleted.

When you resign nobody says you got terminated. You are terminated when you are fired.

wolverine876 · on June 16, 2022

I assume Github, the host of the OP, can do the same. How many people have entrusted their life's work to it?

JonChesterfield · on June 17, 2022

I sincerely hope none given how easy git is to mirror and the risk of Microsoft killing accounts

atlacatl_sv · on June 17, 2022

Found a video showing the history of Python: https://youtu.be/cNBtDstOTmA

ChrisArchitect · on June 16, 2022

You don't see this every day.....

But you do see it every year for the last number of years

Some previous discussion from 3 years ago:

https://news.ycombinator.com/item?id=19429249

ninefathom · on June 16, 2022

Anybody feel brave enough to try merging in SVR4?

https://github.com/dspinellis/unix-history-repo/blob/Researc...

https://github.com/illumos/illumos-gate/blob/9ecd05bdc59e4a1...

dgrin91 · on June 16, 2022

I like how Github shows it as infinity commits

deathanatos · on June 16, 2022

What's up with that? There only seem to be 4, on HEAD?

caslon · on June 16, 2022

Check the other branches.

kevincox · on June 16, 2022

I always expected that the commit count was for that branch. I guess it is global?

deathanatos · on June 16, 2022

I saw the other branches when I made the comment.

The commit count is — usually — the commit count from the currently selected ref.

E.g., on a sample repo, "master" displays as 29,474 commits. "master^" displays as 29,473.

ollien · on June 16, 2022

Yeah, is that a bug? lol

mywittyname · on June 16, 2022

Sounds like an overflow bug prevention mechanism.

There are an infinite number of infinities, so surely one of them is the maximum possible commits in github.

kps · on June 16, 2022

Git runs into problems with more than 2¹⁶⁰ commits in a repository.

projektfu · on June 16, 2022

I love Spinellis' work on teaching reading of code.

PAPPPmAc · on June 16, 2022

Diomidis Spinellis' "Code Reading: The Open Source Perspective" is a thing I've wanted but didn't know existed, browsing it now to hopefully recommend, thanks for the pointer.

I work with computer engineering students and often tell them that reading more code would be good for them but have never had a great generic but concrete suggestion for how to get there.

The second best programming class I took in college was a graduate elective and the _only_ code-reading-based course I took or knew of being offered: a guided safari in the Linux kernel sources where we had to make targeted changes for the assignments. FTR, the best programming class was set up as "new language in a different paradigm every few weeks, write one small program that suits it and one small program that doesn't," not incidentally taught by the same person ( https://en.wikipedia.org/wiki/Raphael_Finkel ).

danschuller · on June 16, 2022

We have all this commit data at scale, it really feels like there are interesting stories or lessons that could be extracted from them.

There's kind of the obvious operational stuff like: What are the properties of commits that introduce bugs compared to those that don't? Which types of commits are rarely changed and which are more likely to be changed over time? But what I'd find even more interesting is some insight into how we solve problems and how well we're able to solve them. I guess part of the puzzle is missing - the external requirements / environment that give rise to some number of the commits.

DSpinellis · on June 16, 2022

There is a series of conferences MSR — Mining Software Repositories — with research papers looking at such questions. http://www.msrconf.org/ In fact, I presented this work in the 2015 MSR conference.

ChrisMarshallNY · on June 16, 2022

That's a lot of work!

A true labor of love.

Thanks!

roansh · on June 16, 2022

How would you feel if your commits become publicly available for everyone to see forever?

pavon · on June 16, 2022

That ship sailed nearly half a century ago. All of this source code was previously licensed to research universities starting in 1975. The earlier releases weren't under FLOSS licenses as we know them today, but they were distributed with the intent that researchers would be reading, learning from, and modifying the code. And they did, creating later BSD Unix releases whose code was shared more widely under more permissive licenses.

Finally, the people who created this repo are some of the primary authors of the code. They wanted this to be in the open.

bentley · on June 16, 2022

There was an interesting discussion in 2019 after a group of people started cracking the passwords of the original Unix developers that had been obtained from an old /etc/passwd file in this repo (https://github.com/dspinellis/unix-history-repo/blob/BSD-3-S...).

Rob Pike spoke out against the effort, calling it “distasteful.” https://inbox.vuxu.org/tuhs/CAKzdPgw0Vz8UFbK7c_Jr+RHGMssSxN=...

Nonetheless, in the end every password was cracked. Some highlights:

Steve Bourne: “bourne”

Dennis Ritchie: “dmac”

Kirk McKusick: “foobar”

Brian Kernighan: “/.,/.,”

Ken Thompson: “p/q2-q4!” (a chess move)

Bill Joy: cracked but not posted due to Rob Pike’s comments, but it contained a control character

jasinjames · on June 17, 2022

Kernighan's is my favorite. The keyboard layout could be different, but I'm imagining him rapping his fingers against the three adjacent keys as if the motion itself were a secret handshake.

e40 · on June 16, 2022

Isn't it cool? I mean, being in the history of a project like this... it could be around long after we are gone.

ARandomerDude · on June 16, 2022

This is the point of GitHub. Also Unix was(/is) a masterwork of craftsmanship. Struggling to see a problem here.

wbl · on June 16, 2022

Eh, I think the select and poll system calls are both kludges, the sockets API inferior to P9 dial, gethostbyname deeply problematic.

Then there are the ways threads interact badly with many classic functions, the way signal handlers play messily with everything else.

Don't even ask about X.

ARandomerDude · on June 17, 2022

An important question:

Could I have thought of something at that time, with all the same constraints and without the benefit of hindsight, that would have been better?

For the vast -- and I mean vast -- majority of us, the answer to that question is a resounding No!.

wbl · on June 19, 2022

At what time? Many of the issues are the result of evolution and bolting things on vs. redesigning things as conditions changed.

midislack · on June 17, 2022

X is fantastic and amazing, not just for the time, even now. Not switching.

jrochkind1 · on June 16, 2022

Really proud to be a part of history.

duxup · on June 16, 2022

I hope everyone is ok with cursing….

userbinator · on June 17, 2022

Proud.

alar44 · on June 16, 2022

Fine. You?

smartineng · on June 17, 2022

Is anyone able to fully bootstrap it now?

relaxing · on June 17, 2022

So what’s the oldest line of code currently active?

What’s the longest-lived line of code in the repo?

sydthrowaway · on June 16, 2022

Who holds the canonical unix repo?

kps · on June 16, 2022

There is no canonical Unix repository.

Unix (1969) predates source version control (1972).

throw0101a · on June 16, 2022

> IBM's OS/360 IEBUPDTE software update tool dates back to 1962, arguably a precursor to version control system tools. A full system designed for source code control was started in 1972, Source Code Control System for the same system (OS/360). Source Code Control System's introduction, having been published on December 4, 1975, historically implied it was the first deliberate revision control system.[4] RCS followed just after,[5] with its networked version Concurrent Versions System. The next generation after Concurrent Versions System was dominated by Subversion,[6] followed by the rise of distributed revision control tools such as Git.[7]

* https://en.wikipedia.org/wiki/Version_control#History

sydthrowaway · on June 16, 2022

Who owns the modern unix copyright?

haunter · on June 17, 2022

The Open Group (Intel, IBM, Fujitsu, Huawei, Philips etc)

https://www.opengroup.org/about-us/who-we-are

https://www.opengroup.org/trademarks

Together with IEEE they are the ones giving the POSIX certification http://get.posixcertified.ieee.org/certification_guide.html

pdw · on June 17, 2022

No, they own the trademark.

I think the copyright of the old Unix code ended up with Novell in the 90s, so it would now be owned by Micro Focus.