The properties of the system, and their values: here be dragons
+
Lisp is the list processing language; that is what its name means. It processes data structures built of lists - which may be lists of lists, or lists of numbers, or lists of any other sort of data item provided for by the designers of the system.
But how is a list, in a computer, actually implemented?
-
They’re implemented as pairs, or, as the manual sometimes rather delightfully called them, ‘doublets’. Pairs of what? Pairs of pointers. Of the two pointers of a pair, the first points to the current entry of the list, and the second, by default, points to the remainder of the list, or, if the end of the list has been reached, to a special datum known as NIL
which among other things indicates that the end of the list has been reached. The pair itself is normally referred to as a ‘cons cell’ for reasons which are nerdy and not important just now (all right, because they are constructed using a function called cons
, which is in itself simply and abbreviation of ‘construct’).
+
They’re implemented as pairs, or, as the manual sometimes rather delightfully called them, ‘doublets’. Pairs of what? Pairs of pointers. Of the two pointers of a pair, the first points to the current entry of the list, and the second, by default, points to the remainder of the list, or, if the end of the list has been reached, to a special datum known as NIL
which among other things indicates that the end of the list has been reached. The pair itself is normally referred to as a ‘cons cell’ for reasons which are nerdy and not important just now (all right, because they are constructed using a function called cons
, which is in itself believed to be simply an abbreviation of ‘construct’).
Two functions are used to access the two pointers of the cell. In modern Lisps these functions are called first
and rest
, because a lot of people who aren’t greybeards find these names easier. But they aren’t the original names. The original names were CAR
and CDR
.
Why?
History
@@ -34,6 +34,18 @@ RLN | | oo 7 a |
O14 (read lines O and 1)
Of course, this isn’t proof. If CAR
and CDR
used here are standard IBM 704 assembler mnemonics – as I believe they are – then what is CONS
? It’s used in a syntactically identical way. If it also is an assembler mnemonic, then it’s hard to believe that, as legend relates, it is short for ‘construct’; on the other hand, if it’s a label representing an entry point into a subroutine, then why should CAR
and CDR
not also be labels?
+
+
Edited 3rd April to add: I’ve found a document, not related to Lisp (although John McCarthy is credited as one of the authors), which does confirm – or strictly, amend – the story. This is the CODING for the MIT-IBM 704 COMPUTER, dating from October 1957. The registers of the 704 were divided into four parts, named respectively the prefix part, the address part, the tag part, and the decrement part, of 3, 15, 3, and 15 bits respectively. The decrement part was not used in addressing; that part of the folklore I was taught isn’t right. But the names are correct. Consider this excerpt :
+
+ The address, tag and decrement parts of symbolic instructions are given in that order. In some cases the decrement, tag or address parts are not necessary; therefore the following combinations where OP represents the instruction abbreviation are permissible.
+
+
This doesn’t prove there were individual machine instructions with the mnemonics CAR
and CDR
; in fact, I’m going to say with some confidence that there were not, by reference to the table of instructions appended to the same document. The instructions do have three letter mnemonics, and they do use ‘A’ and ‘D’ as abbreviations for ‘address’ and ‘decrement’ respectively, but CAR
and CDR
are not included.
+
So it seems probable that CAR
and CDR
were labels for subroutines, as I hypothesised above. But they were quite likely pre-existing subroutines, in use before the instantiation of the Lisp project, because they would be generally useful; and the suggestion that they are contractions of ‘contents of the address part’ and ‘contents of the decrement part’, respectively, seem confirmed.
+
And, going further down the rabbit hole, there’s this. In 1957, before work on the Lisp project started, McCarthy was writing functions to add list processing to the then-new FORTRAN language, on the very same IBM 704 machine.
+
+ in this time any function that delivered integer values had to have a first letter X. Any function (as opposited to subroutines) had to have a last letter F in its name. Therefore the functions selecting parts of the IBM704 memory register (word) were introduced to be XCSRF, XCPRF, XCDRF, XCTRF and XCARF
+
+
I think that the answer has to be that if CAR
and CDR
had been named by the early Lisp team – John McCarthy and his immediate colleagues – they would not have been named as they were. If not FRST
and REST
, as in more modern Lisps, then something like P1
and P2
. CAR
and CDR
are distinctive and memorable (and therefore in my opinion worth preserving) because they very specifically name the parts of a cons cell and of nothing else.
Let’s be clear, here: when CAR
and CDR
are used in Lisp, they are returning pointers, certainly – but not in the sense that one points to a page and the other to a word. Each is an offset into a cell array, which is almost certainly an array of single 36 bit words held on a single page. So both are in effect being used as decrements. Their use in Lisp is an overload onto their original semantic meaning; they are no longer being used for the purpose for which they are named.
As far as I can tell, these names first appear in print in 1960, both in the Lisp 1 Programmer’s Manual referenced above, and in McCarthy’s paper Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I. The paper was published in April so was presumably written in 1959
@@ -91,7 +103,14 @@ O14 (read lines O and 1)
There’s a view in modern software theory – with which I strongly hold – that data should be immutable. Data that changes under you is the source of all sorts of bugs. And in modern multi threaded systems, the act of reading a datum whilst some other process is writing it, or worse, two processes attempting simultaneously to write the same datum, is a source of data corruption and even crashes. So I’m very wary of mutable data; and, in modern systems where we normally have a great deal of space and a lot of processor power, making fresh copies of data structures containing the change we wanted to make is a reasonable price to pay for avoiding a whole class of bugs.
But early software was not like that. It was always constrained by the limits of the hardware on which it ran, to a degree that we are not. And the experience that we now have of the problems caused by mutable data, they did not have. So it’s core to the design of Lisp 1.5 that its lists are mutable; and, indeed, one of the biggest challenges in writing Beowulf has been implementing mutable lists in Clojure, a language carefully designed to prevent them.
But, just because Lisp 1.5 lists can be mutable, should they be? And when should they be?
-
The problem here is that spine of the system I talked about earlier.
+
The problem here is that spine of the system I talked about earlier. If we build the execution stack on top of the oblist – as at present I do – then if we make a new version of the oblist with changes in it, the new changes will not be in the copy of the oblist that the stack is built on top of; and consequently, they’ll be invisible to the running program.
+
What I do at present, and what I think may be good enough, is that each time execution returns to the read-eval-print loop, the REPL, the user’s command line, I rebuild a new execution stack on the top of the oblist as it exists now. So, if the last operation modified the oblist, the next operation will see the new, modified version. But if someone tried to run some persistent program which was writing stuff to property values and hoping to read them back in the same computation, that wouldn’t work, and it would be a very hard bug to trace down.
+
So my options are:
+
+ - To implement
PUT
and GET
in Clojure, so that they can operate on the current copy of the object list, not the one at the base of the stack. I’m slightly unwilling to do that, because my objective is to make Beowulf ultimately as self-hosting as possible.
+ - To implement
PUT
and GET
in Lisp, and have them destructively modify the working copy of the object list.
+
+
Neither of these particularly appeal.
How property lists should work
I’m still not fully understanding how property lists in Lisp 1.5 are supposed to work.
List format
@@ -180,5 +199,64 @@ O14 (read lines O and 1)
(They wrote amazingly clean code, those old masters. I could tell you a story about Chris Burton, the train, and the printer driver, that software people of today simply would not believe. But it’s true. And I think that what taught them that discipline was the high cost of even small errors.)
Lisp 1.5 doesn’t have PUT
, PUTPROP
or DEFUN
because setting properties individually, defining functions individually one at a time, was not something they ever thought about doing. And in learning that, I’ve learned more than I ever expected to about the real nature of Lisp 1.5, and the (great) people who wrote it.
+
Deeper delving
+
After writing, and publishing, this essay, I went on procrastinating, which is what I do when I’m sure I’m missing something; and to procrastinate, I went on reading the earliest design documents of Lisp I could find. And so I came across the MIT AI team’s first ever memo, written by John McCarthy in September 1958. And in that, I find this:
+
+ 3.2.1. First we have those that extract parts of a 704 word and form a word from parts. We shall distinguish the following parts of a word and indicate each of them by a characteristic letter.
+
+
+
+ Letter |
+ Description |
+
+
+
+
+ w |
+ the whole word |
+
+
+ p |
+ the prefix (bits s, 1, 2) |
+
+
+ i |
+ the indicator (bits 1 and 2) |
+
+
+ s |
+ the sign bit |
+
+
+ d |
+ the decrement (bits 3-17) |
+
+
+ t |
+ the tag (bits 18-20) |
+
+
+ a |
+ the address (bits 21-35) |
+
+
+
+
+
In the discussion of functions which access properties on page 58 of the Lisp 1.5 programmer’s manual, the word ‘indicator’ is used in preference to ‘symbol’ for the name of a property: for example
+
+ The function deflist
is a more general defining function. Its first argument is a list of pairs as for define. Its second argument is the indicator that is to be used. After deflist
has been executed with (ui vi) among its first argument, the property list of ui will begin:
+ If deflist
or define
is used twice on the same object with the same indicator, the old value will be replaced by the new one.
+
+
(my emphasis).
+
That use of ‘indicator’ has been nagging at me for a week. It looks like a term of art. If it’s just an ordinary atomic symbol, why isn’t it called a symbol?
+
Is it an indicator in the special sense of the indicator part of the machine word? If it were, then the property list could just be a flat list of values. And what’s been worrying and surprising me is that property lists are shown in the manual as flat lists. Eureka? I don’t think so.
+
The reason I don’t think so is that there are only two bits in the indicator part of the word, so only four distinct values; whereas we know that Lisp 1.5 has (at least) five distinct indicator values, APVAL
, EXPR
, FEXPR
, SUBR
and FSUBR
.
+
Furthermore, on page 39, we have:
+
+ A property list is characterized by having the special constant 777778 (i. e., minus 1) as the first element of the list. The rest of the list contains various properties of the atomic symbol. Each property is preceded by an atomic symbol which is called its indicator.
+
+
(again, my emphasis)
+
But I’m going to hypothesise that the properties were originally intended to be discriminated by the indicator bits in the cons cell, that they were originally coded that way, and that there was some code which depended on property lists being flat lists; and that, when it was discovered that four indicators were not enough and that something else was going to have to be used, the new format of the property list using atomic symbols as indicators was bodged in.
+
So what this is about is I’ve spent most of a whole day procrastinating, because I’m not exactly sure how I’m going to make the change I’ve got to make. Versions of Beowulf up to and including 0.2.1 used the naive understanding of the architecture; version 0.3.0 should use the corrected version. But before it can, I need to be reasonably confident that I understand what the correct solution is.
-
I shall implement PUT
, even though it isn’t in the spec, because it’s a useful building block on which to build DEFINE
and DEFLIS
, both of which are. And also, because PUT
would have been very easy for the Lisp 1.5 implementers to implement, if it had been relevant to their working environment.