There’s a literary reference as well. Remember the delicate and understated lowercase beginning of the evergreen first program. Most other programming examples get this wrong and I miss it sorely:
hello, world.
My guess is it refers to archy and mehitabel, very popular early 20th century free verse poetry by Don Marquis. His work was still very much in favor with intellectuals around the time of B’s creation. Or it might have been influenced by the poet e.e. cummings.
Although the book was written in 1979, BCPL dates to 1967, and there is a hello, world example on page 8 of that book. So I wonder: is it actually likely that hello, world as a simple example program began with B?
I am currently in the process of learning C and having never seen any B code before I am rather surprised at their striking similarities. What could C do at the time that B couldn’t? Why did C eventually replace B?
You just caught me at a moment where I'm reading Tanenbaum's book on Modern Operating Systems. Tanenbaum simplifies a lot of things, but here is his contribution [1]:
> The second development concerned the language in which UNIX was written. By now it was becoming painfully obvious that having to rewrite the entire system for each new machine was no fun at all [0], so Thompson decided to rewrite UNIX in a high-level language of his own design, called B. B was a simplified form of BCPL (which itself was a simplified form of CPL, which, like PL/I, never worked). Due to weaknesses in B, primarily lack of structures, this attempt was not successful. Ritchie then designed a successor to B, (naturally) called C, and wrote an excellent compiler for it. Working together, Thompson and Ritchie rewrote UNIX in C. C was the right language at the right time and has dominated system programming ever since.
Tanenbaum doesn't say it outright, but it almost seems like B and C were designed specifically for creating UNIX. I wonder to what extent their authors actually had UNIX in mind when designing the languages.
[0] In one of the previous paragraphs, Tanenbaum mentioned that the first version of UNIX was written in assembly.
As always on HN there is a certain amount of ... discussion about some of the finer points, generally about typing. I suspect that if you follow the refs in the WP article most of the usual bikeshedding here will resolve itself satisfactorily.
For me (a 50-year-old bloke), I dimly recall C always being available https://en.wikipedia.org/wiki/C_(programming_language) - apparently 1972ish, so I was 2 or 3 when C came out and replaced B, which probably didn't do much ....
Oh look at this (wrt B): "However, it continues to see use on GCOS mainframes (as of 2014)"
I used that B compiler on a Honeywell GCOS mainframe in the late 1970s and early 1980s. It was a blindingly fast compiler, written at the University of Waterloo by a Brazilian named Reinaldo Braga, and it is still available from a Canadian company named Thinkage. The Honeywell 6000 series used 36-bit words.
>based on Ken Thompson's earlier B interpreter which had in turn been modeled on BCPL,
IIRC (it's been a while since I've read the history of Unix), B started as an interpreted language but evolved into C as types were added and it switched to being compiled, the better to interact with UNIX operating system internals (Ritchie wanted to rewrite large portions of Unix in the new language).
Here is a snippet from a 1989 interview with B's creator, Ken Thompson, explaining this:
"MSM: Did you develop B?
Thompson: I did B.
MSM: As a subset of BCPL?
Thompson: It wasn't a subset. It was almost exactly the same. It was an interpreter instead of a compiler. It had two passes. One went into an intermediate language, and one was the interpreter of the intermediate language. Dennis wrote a compiler for B that worked out of the intermediate language."
The B version for the Honeywell 6070 was compiled. B on the PDP-7 and -11 was interpreted. In fact I have a writeup on the threaded code for these two: http://squoze.net/B/
> All arithmetic in B is integer, unless special functions are written. There is no equivalent of the Fortran IJKLMN convention, no floating point, no data types, no type conversions, and no type checking. Users of double-precision complex will have to fend for themselves.
main( ) {
    extrn a, b, c;
    putchar(a); putchar(b); putchar(c); putchar('!*n');
}
a 'hell';
b 'o, w';
c 'orld';
B uses single quotes to denote a character constant (like C), and double quotes to denote a string. Each character constant is a word (as are all variables in B), which is 36 bits long, so it can hold 4 ASCII characters! A character literal with fewer than 4 characters is zero-padded (as are, presumably, any leftover bits). In fact an earlier snippet in that document just outputs "hi!", because that way you only need one character constant; going by the "auto a; a= 'hi!';" fragment quoted further down in this thread, it looks like this:
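main( ) {
    auto a;
    a= 'hi!';         /* one word packs all three characters */
    putchar(a);
    putchar('*n');    /* *n is B's escape sequence for newline */
}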
In C, the character constant is still of type int, not char like in C++.
Multi-character constants are possible, and have an implementation-defined value.
int fourcc = 'abcd';
(This is supported in C++ too, as a conditionally-supported feature: a multi-character constant there doesn't have type char but type int, with an implementation-defined value.)
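For instance, here is a sketch of what GCC happens to do with such a constant (other compilers may legitimately differ, since the value is implementation-defined; GCC packs the characters left-to-right):

#include <stdio.h>

int main(void)
{
    int fourcc = 'abcd';    /* implementation-defined value */

    /* With GCC, 'abcd' packs left-to-right into the int. */
    printf("%#x\n", fourcc);                                    /* 0x61626364 */
    printf("%#x\n", ('a' << 24) | ('b' << 16) | ('c' << 8) | 'd');
    return 0;
}

Both lines print 0x61626364 here, i.e. the ASCII codes for a, b, c, d from the most significant byte down.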
GCC on Ubuntu 18.04:
$ cat hello.c
#include <stdio.h>
int main(void)
{
int hello[] = { 'lleH', 'w ,o', 'dlro', '!', 0 };
puts((char *) hello);
return 0;
}
$ gcc hello.c -o hello
hello.c: In function ‘main’:
hello.c:5:19: warning: multi-character character constant [-Wmultichar]
int hello[] = { 'lleH', 'w ,o', 'dlro', '!', 0 };
^~~~~~
[ ... and similar warnings ...]
$ ./hello
Hello, world!
It also works if the program is compiled with g++, even though a single character constant like '!' has type char there; it's simply converted to int in the initializer.
We have to write the characters backwards because of little endian. In the source code, the leftmost character is the most significant byte, and on the little-endian system, that means it goes to the higher address. The first character H must be the rightmost part of the character constant so that it ends up in the least significant byte which goes to the lowest address in memory.
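A little sketch makes the layout visible (assuming GCC on little-endian x86-64; both the constant's value and the byte order are implementation/platform details):

#include <stdio.h>

int main(void)
{
    int w = 'lleH';                      /* multi-character constant */
    unsigned char *p = (unsigned char *) &w;

    /* Walk the int byte by byte, lowest address first. */
    for (size_t i = 0; i < sizeof w; i++)
        printf("byte %zu: '%c' (0x%02x)\n", i, p[i], p[i]);
    return 0;
}

On such a machine this prints H, e, l, l in address order, which is exactly why the source spells it 'lleH'.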
Endianness wouldn't be an issue in B because byte addressing doesn't exist there; there is no "char *" pointer accessing the data as individual characters. B could be implemented on a big- or little-endian machine and the string handling would work right either way.
Yep, before ASCII was standardized it was common for machines to be built with word-addressable memory and words that were multiples of six bits. Two octal digits easily represent a six-bit byte, just as two hexadecimal digits easily represent an eight-bit byte.
But six bits didn't represent lowercase (!). (2⁶ = 64 codes would have allowed you to represent lowercase, but you would have had to sacrifice a whole lot to do so: the alphanumerics alone would use 62 positions, leaving you with maybe one position for a space and one punctuation mark, and no newline...)
Apparently EBCDIC derives from IBM's 6-bit BCD codes, and is interesting because it uses 8 bits, successfully represents lowercase, and leaves a ton of (non-contiguous) positions unspecified.
Maybe our standards (no pun intended) are just shifting as we deal with more and more capable software, but I'd be inclined to say that seven bits "easily encode" standard English text, and six don't, on account of the lack of case distinction. (Although you could certainly choose to handle that with control characters, and I'm sure some 6-bit systems did so.)
Yeah, once you outgrow six bits (five is weird because it's an odd number) you really start looking at seven - and seven is also odd (which is why seven bits + parity took off for a while). But now you have eight, and that should be good enough for anyone.
Interestingly enough, B's documentation talks about how the new computer can address a "char" and not just a whole word at a time.
But even then, in this tutorial, you can see that it was already considered something to try to avoid:
> (Good programming practice dictates using few labels, so later examples will strive to get rid of them.)
and then later...
> The ability to replace single statements by complex ones at will is one thing that makes B much more pleasant to use than Fortran. Logic which requires several GOTO's and labels in Fortran can be done in a simple clear natural way using the compound statements of B.
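For a feel of the contrast the tutorial is drawing, here is a rough C sketch (the counting loop is made up for the example): first Fortran-style with labels and GOTOs, then the same logic as a compound statement:

#include <stdio.h>

int main(void)
{
    int i;

    /* Fortran-style: explicit labels and jumps */
    i = 0;
loop:
    if (i >= 3)
        goto done;
    printf("%d\n", i);
    i = i + 1;
    goto loop;
done:

    /* The same logic with a compound statement */
    for (i = 0; i < 3; i++)
        printf("%d\n", i);

    return 0;
}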
It's interesting that B doesn't require forward declarations, so global elements can be ordered with higher-level concerns first (e.g. main) referencing lower-level elements further down in the source file. C and C++ later required the reverse ordering (or explicit forward declarations).
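A minimal C sketch of that ordering, with the forward declaration C needs and B didn't (the helper name is made up):

#include <stdio.h>

int lower(void);            /* forward declaration; without it, modern C
                               won't cleanly compile the call below */

int main(void)              /* higher-level concern first... */
{
    printf("%d\n", lower());
    return 0;
}

int lower(void)             /* ...lower-level element further down */
{
    return 42;
}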
The auto keyword was always there, but it meant something different. In modern C++, it tells the compiler to perform type inference. In B, it told the compiler to dynamically allocate a variable on the stack, as opposed to assigning it a static address or a register.
Today, auto in C is still a storage class that denotes stack allocation of a variable. This is related to the other storage classes of register, static, and extern.
In C each variable and function has two attributes, type and storage class
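A minimal sketch of those storage classes in C (pre-C23, where auto is still purely a storage class and the default for block-scope variables):

#include <stdio.h>

static int s = 1;       /* static: one fixed address for the whole run */
extern int e;           /* extern: defined elsewhere (here, just below) */
int e = 2;

int main(void)
{
    auto int a = 3;     /* auto: stack-allocated; the default, so the
                           keyword is almost never spelled out */
    register int r = 4; /* register: hint to keep the value in a register */

    printf("%d %d %d %d\n", s, e, a, r);
    return 0;
}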
There does seem to be some type inference going on, though: "auto a; a= 'hi!';". Or maybe it's just defaulting to an int/pointer type? If it is type inference, it would have been awful to have the variable declaration at the top but the type determination strewn through the function.
So it looks as though C switched from default static to default auto? I wonder if the programmers of the time sneered at the waste this added?
Interesting that it started out typeless, but as C grew out of B they introduced types, only for Python to come along and make typeless-looking code popular again. Types definitely have their uses, but you can certainly get by without them.
Like someone else mentioned below, B isn't really typeless, so much as it's all just one type, which was the standard for BCPL -- everything is just a word! If memory serves, part of what made C itself was adding "char", which refers to a single byte, using the (newly released!) PDP-11's byte addressing mode (which didn't exist on the machines B was started on).
Nowadays we do byte addressing for everything, but B pointers were word indices; on the word-addressed machines B started on, a byte address and a word address were genuinely different things AFAIK, so char and int pointers would be incompatible.
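In C the distinction survives in pointer arithmetic, which scales by the pointed-to size; a small sketch:

#include <stdio.h>

int main(void)
{
    int words[2] = { 0, 0 };
    int  *wp = words;
    char *cp = (char *) words;

    /* wp + 1 advances by a whole int; cp + 1 advances by one byte. */
    printf("int step:  %td bytes\n", (char *) (wp + 1) - (char *) wp);
    printf("char step: %td bytes\n", (cp + 1) - cp);
    return 0;
}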
While C is a direct descendant of B, Python is a very different beast. It's a bit odd to frame Python as "C but without types". Maybe I misunderstand what you're saying.
Also I don't know much about B but given its age I seriously doubt that it's dynamically typed like Python. I suspect that it's more like assembly: you don't have types because most of everything is effectively an integer if you squint hard enough, and the way you decide to use the data lets the compiler/CPU know how to treat it.
For instance in this snippet from TFA:
v[10] 'hi!', 1, 2, 3, 0777;
You may think that `v` is clearly dynamically typed, since it seems to mix ints and text, but I actually think it's a lot simpler than that: 'hi!' in single quotes is a character constant, i.e. characters packed into a single word, and even a pointer to a real (double-quoted) string is effectively an int, so you can store either in an int array no problem. You can still do it (mostly non-portably) in modern C, you'll just need a cast or two at most.
Of course it means that the type info is not actually carried by the value like in Python. If you write 'hi!' + 'oy' in Python you get 'hi!oy'. If you write 'hi!' + 'oy' in B, you just get the integer sum of two packed words, a meaningless number; with double-quoted strings you'd get a garbage pointer instead.
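A rough modern-C version of that pointer-in-an-int-array trick (a sketch; intptr_t stands in for B's machine word so the pointer is guaranteed to fit):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* B-style: one array of "words" holding a string pointer
       and plain integers side by side. */
    intptr_t v[5] = { (intptr_t) "hi!", 1, 2, 3, 0777 };

    printf("%s %d %d %d %o\n",
           (const char *) v[0],
           (int) v[1], (int) v[2], (int) v[3],
           (unsigned) v[4]);
    return 0;
}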
B only had one type, "machine word", and a word could be interpreted as a pointer or an integer. The notion in C that an array is effectively a pointer to its first element is also in B.
Python is not typeless in the sense of B. In B everything is an int, while in Python you still have floats and other stuff. Sure, Python doesn't need declarations, but that's dynamic typing (duck typing), not typelessness.