Hacker News
The original “Hello World”: B. W. Kernighan's intro to B (1973) (archive.org)
113 points by nanna on May 6, 2021 | hide | past | favorite | 55 comments



There’s a literary reference as well. Remember the delicate and understated lowercase beginning of the evergreen first program. Most other programming examples get this wrong and I miss it sorely:

    hello, world.
My guess is it refers to archy and mehitabel, very popular early 20th century free verse poetry by Don Marquis. His work was still very much in favor with intellectuals around the time of B’s creation. Or it might have been influenced by the poet e.e. cummings.


<pulls BCPL book from shelf>

Although the book was written in 1979, BCPL dates to 1967. There is a hello, world example on page 8. I'm wondering whether hello, world as a simple example program really began with B.


There is some discussion here: https://en.wikipedia.org/wiki/B_(programming_language)

BCPL -> B -> C (perhaps) but there is a lot of overlap. C came out around 1973.

As you say: Hello World is quite old and probably predates these Johnny-come-lately fly-by-nights like C.


I am currently in the process of learning C and having never seen any B code before I am rather surprised at their striking similarities. What could C do at the time that B couldn’t? Why did C eventually replace B?


You just caught me at a moment where I'm reading Tanenbaum's book on Modern Operating Systems. Tanenbaum simplifies a lot of things, but here is his contribution [1]:

> The second development concerned the language in which UNIX was written. By now it was becoming painfully obvious that having to rewrite the entire system for each new machine was no fun at all [0], so Thompson decided to rewrite UNIX in a high-level language of his own design, called B. B was a simplified form of BCPL (which itself was a simplified form of CPL, which, like PL/I, never worked). Due to weaknesses in B, primarily lack of structures, this attempt was not successful. Ritchie then designed a successor to B, (naturally) called C, and wrote an excellent compiler for it. Working together, Thompson and Ritchie rewrote UNIX in C. C was the right language at the right time and has dominated system programming ever since.

Tanenbaum doesn't say it outright, but it almost seems like B and C were designed for creating UNIX. I wonder to what extent their authors had UNIX in mind while designing the languages.

[0] In one of the previous paragraphs, Tanenbaum mentioned that the first version of UNIX was written in assembly.

[1] Modern Operating Systems (ed. 4, p. 715)


You could start here: https://en.wikipedia.org/wiki/B_(programming_language)

As always on HN there is a certain amount of ... discussion about some of the finer points, generally about typing. I suspect that if you follow the refs in the WP article most of the usual bikeshedding here will resolve itself satisfactorily.

For me (a 50-year-old bloke), I dimly recall C always being available https://en.wikipedia.org/wiki/C_(programming_language) - apparently 1972ish, so I was 2 or 3 when C came out and replaced B, which probably didn't do much ....

Oh look at this (wrt B): "However, it continues to see use on GCOS mainframes (as of 2014)"


I used that B compiler on a Honeywell GCOS mainframe in the late 1970s and early 1980s. It was a blindingly fast compiler written at University of Waterloo by a Brazilian named Reinaldo Braga. Still available from a Canadian company named Thinkage. The Honeywell 6000 series used 36 bit words.

C was needed because the byte-addressed PDP-11 was a bad fit for the word oriented B language. See https://archive.org/details/bstj57-6-1991 for details.


Thinkage is still around, here is the B compiler manual from their website: https://www.thinkage.ca/gcos/expl/b/manu/manu.html

Although it appears their main business nowadays is a Windows-based application to manage maintenance requests for landlords, etc.


I don't know the answer to your question, but it's worth mentioning as well that C itself was pretty different before C89.


B was interpreted while C wasn’t. C is basically compiled B which is why the letter was incremented.


The linked article says that B was compiled...


Well everything I’ve ever read says otherwise and it’s often referred to as an interpreter.

http://progopedia.com/language/b/

>B is an interpreted programming language for mini-computers, a direct descendant of BCPL and ancestor of C.

http://www.catb.org/~esr/writings/taoup/html/c_evolution.htm...

>based on Ken Thompson's earlier B interpreter which had in turn been modeled on BCPL,

IIRC (it's been a while since I've read the history of Unix), B started as an interpreted language but evolved into C as types were added, and it was switched to compiled form to better interact with UNIX operating system internals (Ritchie wanted to rewrite large portions of Unix in the new language).


That's not exactly correct.

Here is a snippet from an interview of B's creator, Ken Thompson in 1989, explaining this:

"MSM: Did you develop B?

Thompson: I did B.

MSM: As a subset of BCPL

Thompson: It wasn't a subset. It was almost exactly the same. It was a interpreter instead of a compiler. It had two passes. One went into intermediate language and which one was the interpreter of the intermediate language. Dennis wrote a compiler for B, that worked out of the intermediate language."

~ https://www.princeton.edu/~hos/mike/transcripts/thompson.htm


Ah okay I figured I was remembering incorrectly but was pretty sure it was, at least initially, interpreted.


The B version for the Honeywell 6070 was compiled. B on the PDP-7 and -11 was interpreted. In fact I have a writeup on the threaded code for these two: http://squoze.net/B/


Types and especially structured types. https://www.bell-labs.com/usr/dmr/www/chist.html


> All arithmetic in B is integer, unless special functions are written. There is no equivalent of the Fortran IJKLMN convention, no floating point, no data types, no type conversions, and no type checking. Users of double-precision complex will have to fend for themselves.

This seems like a biggie.


char. float. a.out.


> auto s[20] ;

> putstr(getstr(s)); putchar('*n');

What could possibly go wrong?


Here it is, pulled out of the linked page:

    main( ) {
     extrn a, b, c;
     putchar(a); putchar(b); putchar(c); putchar('!*n');
    }
    
    a 'hell';
    b 'o, w';
    c 'orld';
B uses single quotes to denote a character constant (like C), and double quotes to denote a string. Each character constant is a word (as are all values in B), which is 36 bits long on the Honeywell, so it can hold 4 characters (9 bits each on that machine). A character literal with fewer than 4 characters is zero padded. In fact an earlier snippet in that document just outputs "hi!" because that way you only need one character constant:

    main( ) {
      auto a;
      a= 'hi!';
      putchar(a);
      putchar('*n' );
    }


In C, the character constant is still of type int, not char like in C++.

Multi-character constants are possible, and have an implementation-defined value.

   int fourcc = 'abcd';
(This may be supported by C++ also, I'm not sure. So that is to say, a multi-character constant in C++ perhaps doesn't have type char, but an implementation-defined type.)

GCC on Ubuntu 18.04:

  $ cat hello.c
  #include <stdio.h>
  
  int main(void)
  {
    int hello[] = { 'lleH', 'w ,o', 'dlro', '!', 0 };
    puts((char *) hello);
    return 0;
  }

  $ gcc hello.c -o hello
  hello.c: In function ‘main’:
  hello.c:5:19: warning: multi-character character constant [-Wmultichar]
     int hello[] = { 'lleH', 'w ,o', 'dlro', '!', 0 };
                   ^~~~~~

  [ ... and similar warnings ... ]

                        
  $ ./hello 
  Hello, world!
It works if the program is compiled with g++ also, in spite of a single character constant like '!' being char.

We have to write the characters backwards because of little endian. In the source code, the leftmost character is the most significant byte, and on the little-endian system, that means it goes to the higher address. The first character H must be the rightmost part of the character constant so that it ends up in the least significant byte which goes to the lowest address in memory.

Endianness wouldn't be an issue in B because it doesn't exist; there is no "char *" pointer accessing the data as individual characters. B could be implemented on big or little endian and the string handling would work right.


   a "hello"; b "world'*;
   v[2] "now is the time", "for all good men",
       "to come to the aid of the party";


“Since B is often used for system programming and bit-manipulation, octal numbers are an important part of the language.”

I’m curious why octal fell out of style. Hexadecimal seems more useful in every way. Perhaps it relates to using 36-bit words?

Literally the only octal I use is for chmod.


> Perhaps it relates to using 36-bit words?

Yep, before ASCII was standardized it was common for machines to be built with word-addressable memory and words that were multiples of six bits. Two octal digits easily represent a six-bit byte, just as two hexadecimal digits easily represent an eight-bit byte.


> words that were multiples of six bits

Why was six bits chosen? The modern use of eight bits seems more natural to me, being a power of 2.


Six bits is the first that can easily encode standard text.


Though "easily" is relative as Baudot code used only five bits:

https://en.wikipedia.org/wiki/Baudot_code

(with control characters to shift to other character sets!).

Wikipedia says that the several six-bit character set standards for text were inspired by typewriters

https://en.wikipedia.org/wiki/BCD_(character_encoding)#Examp...

but they didn't represent lowercase (!). (2⁶ would have allowed you to represent lowercase but you would have to sacrifice a whole lot to do so -- as alphanumerics alone would use 62 positions, leaving you with maybe one position for a space and one punctuation mark, and no newline...)

Apparently EBCDIC derives from IBM's 6-bit BCD codes

https://en.wikipedia.org/wiki/EBCDIC

and is interesting because it uses 8 bits, successfully represents lowercase, and leaves a ton of (non-contiguous) positions unspecified.

Maybe our standards (no pun intended) are just shifting as we deal with more and more capable software, but I'd be inclined to say that seven bits "easily encode" standard English text, and six don't, on account of the lack of case distinction. (Although you could certainly choose to handle that with control characters, and I'm sure some 6-bit systems did so.)


Yeah, once you go to six bits (five is weird because it's an odd number) you really start looking at seven - and seven is also odd (which is why seven + parity took off for a while). But now you have eight, and that should be good enough for anyone.

Interestingly enough B talks about how the new computer can address a “char” and not just a whole word at a time.


Interestingly, "hello world" _isn't_ the first program or program fragment that gets introduced!


Seeing Hollerith[1] gives me Fortran flashbacks. Doing string manipulation in integers is a pain.

[1] https://en.wikipedia.org/wiki/Hollerith_constant


  main( ) {
   extrn a, b, c;
   putchar(a); putchar(b); putchar(c); putchar('!\*n');
  }

  a 'hell';
  b 'o, w';
  c 'orld';


You seem to have a stray backslash. Is your C leaking through?


I think he’s trying to escape the asterisk but doesn’t realize that’s not required as long as there isn’t another one following it.


I'm not a C programmer. I copy/pasted. It appears that it got munged.


Neat, it has an example of the kind of thing Dijkstra was talking about when he wrote Go To Statement Considered Harmful.

    main( ) {
      auto c;
    read:
      c= getchar();
      putchar(c);
      if(c != '*n') goto read;
    }               /* loop if not a newline */
People forget the world of ubiquitous goto enabled control flow with languages like this and Dartmouth BASIC.


But even then, in this tutorial, you can see that it was already considered something to try and avoid:

> (Good programming practice dictates using few labels, so later examples will strive to get rid of them.)

and then later...

> The ability to replace single statements by complex ones at will is one thing that makes B much more pleasant to use than Fortran. Logic which requires several GOTO's and labels in Fortran can be done in a simple clear natural way using the compound statements of B.


It's interesting that B doesn't require forward declarations, so global elements can be ordered with higher-level concerns first (e.g. main) referencing lower-level elements further down in the source file. C, and later C++, required the reverse ordering (or explicit forward declarations).


The masters are still schooling me:

  c = c+'A'-'a';
Though they too learn... C fixed this problematic syntax:

  x =- 10
So they are not that different from me - just smarter.


Interesting that it has `auto` decades before C/C++


The keyword was always there, but it meant something different. In modern C++, it tells the compiler to perform type inference. In B, it told the compiler to dynamically allocate a variable on the stack, as opposed to assigning it a static address, or a register.


Today, auto in C is still a storage class that denotes stack allocation of a variable. This is related to the other storage classes of register, static, and extern.

In C each variable and function has two attributes, type and storage class


C still has it around with the same meaning.


There does seem to be some type inference going on, though: "auto a; a= 'hi!';" -- or maybe it's just defaulting to an int/pointer type? If it is type inference, it would have been awful to have the variable declaration at the top but the type determined by assignments strewn through the function.

So it looks as though C switched from default static to default auto? I wonder if the programmers of the time sneered at the waste this added?


There are no types in B, only words. Auto denotes storage.


Yes. For example, in old C the type was implicitly int unless mentioned; "long" means "long int" ;)


This should be an NFT.


It has all the properties of an NFT already. It’s just a URL.


I'll buy it for 69 million schrute bucks.


Interesting that it started out as typeless but as C came from B, they introduced types only for Python to come along and make typeless popular again. Types definitely do have their usefulness but you can certainly get by without them.


Like someone else mentioned below, B isn't really typeless, so much as it's all just one type, which was the standard for BCPL -- everything is just a word! If memory serves, part of what made C itself was adding "char", which refers to a single byte, using the (newly released!) PDP-11's byte addressing mode (which didn't exist on the machines B was started on).

Nowadays we do byte addressing for everything, but in PDP-11 machine code a byte address and word address are different AFAIK, so char and int pointers would be incompatible.


If you look at the current trends, it seems that people are realizing the problems of typeless. Look at the effort to retrofit types onto Python.

I think strong typing with type inference hits the sweet spot of providing the safety of types, while also cutting down on boilerplate.


While C is a direct descendant of B, Python is a very different beast. It's a bit odd to frame Python as "C but without types". Maybe I misunderstand what you're saying.

Also I don't know much about B but given its age I seriously doubt that it's dynamically typed like Python. I suspect that it's more like assembly: you don't have types because most of everything is effectively an integer if you squint hard enough, and the way you decide to use the data lets the compiler/CPU know how to treat it.

For instance in this snippet from TFA:

  v[10]  'hi!', 1, 2, 3, 0777;
You may think that `v` is clearly dynamically typed, since it contains both ints and a character string, but I actually think it's a lot simpler than that: a pointer to a string is effectively an int, so you can store a pointer in an int array no problem. You can still do it (mostly non-portably) in modern C, you'll just need a cast or two at most.

Of course it means that the type info is not actually carried by the variable like in Python. If you write `'hi!' + 'oy'` in Python you get 'hi!oy'. If you write 'hi!' + 'oy' in B I suspect that you get a garbage pointer, if not straight up undefined behaviour.


B only had one type, "machine word", and a word could be interpreted as a pointer or an integer. The notion in C that an array is effectively a pointer to its first element is also in B.


Python is not typeless in the sense of B. In B, everything is an int, while in Python you still have floats and other stuff. Sure, Python doesn't need declarations, but that's called duck typing.


Python is a strongly typed, dynamically typed language.



