# grendel

A reimplementation of [Beowulf](https://git.journeyman.cc/simon/beowulf) bootstrapped in C, with a compiler following, basically, [Abdulaziz Ghuloum's recipe](https://bernsteinbear.com/assets/img/11-ghuloum.pdf).

## Memory model

I seem to obsess over how things are represented in memory. Although most of the people who build Ghuloum-style compilers treat memory as something of an afterthought, I'm starting with it.

### In the beginning was the Word

My intention is that memory will be considered as an array of 64 bit words. Each word may be considered as either

1. a cons cell: two instances of `object32`, each having one mark bit, three tag bits and 28 payload bits; or
2. a single `object64`, having one mark bit, seven tag bits, and 56 payload bits.

Note that, for any word, the first four bits comprise the mark and (part or all of) the tag, whether the cell is an `object64` or a cons of two `object32`s; for this reason, all `object64`s will have all of the first three bits of the tag set. So:

```
                                   3 3                                6
 0 1 3 4    8                      1 2                                3
+-+---+-----------------------------+-+---+----------------------------+
|M|tag| payload...                  |M|tag| payload...                 |
+-+---+----+------------------------+-+---+----------------------------+
|M|111 tag | payload...                                                |
+-+--------+-----------------------------------------------------------+

where `M` represents `mark`
```

I've tried to do this with C `struct`s, but I've failed to get the bit fields to pack properly, so I'm just going to be a barbarian and use bit masks and bit shifts.

### Tag! You're it!

Tags will be allocated as follows:

| 3-bit value | 7-bit value | (Hex) | Interpretation |
| ----------- | ----------- | ----- | ------------------------------------------------------------ |
| 0 | 0 | 0x0 | an error object, whose payload is a 3 character error code. |
| 1 | 1 | 0x1 | a pointer; an offset into the vector of words. |
| 2 | 2 | 0x2 | a signed 28 bit integer. |
| 3 | 3 | 0x3 | a character; possibly just a byte, or possibly a 16 bit wchar. |
| 4 | 4 | 0x4 | unassigned (possibly a floating point number, later.) |
| 5 | 5 | 0x5 | unassigned |
| 6 | 6 | 0x6 | unassigned |
| 7 | 7 | 0x7 | **never** used: see [Recognising a cons cell](#Recognising-a-cons-cell), below |
| 7 | 15 | 0xf | a symbol cell *(this implies a symbol can have only up to seven, or if compressed to five bits per character, eleven characters)* |
| 7 | 23 | 0x17 | a pointer to a compiled function *(there's a problem here; it means we can only allocate a function in the lower 72,057,594,037,927,936 bytes of memory; I **think** that's not going to byte us on the bum, pun intended)*. |
| 7 | 31 | 0x1f | a pointer to a compiled special form *(same problem as above)*. |
| 7 | 39 | 0x27 | unassigned ? a ratio cell ? |
| 7 | 47 | 0x2f | unassigned ? a big number ? |
| 7 | 55 | 0x37 | unassigned ? a string ? |
| 7 | 63 | 0x3f | unassigned |
| 7 | 71 | 0x47 | unassigned |
| 7 | 79 | 0x4f | unassigned |
| 7 | 87 | 0x57 | unassigned |
| 7 | 95 | 0x5f | unassigned |
| 7 | 103 | 0x67 | unassigned |
| 7 | 111 | 0x6f | unassigned |
| 7 | 119 | 0x77 | unassigned |
| 7 | 127 | 0x7f | a free cell |

### Recognising a cons cell

My original idea was to have a specific tag to mean a cons cell, and that tag was going to be 7, binary 111, all three lower-most bits set. This does not work. If we were to do that, there would be nowhere to put the tag of the `car` of the cell, because the `car`'s own tag occupies those same bits of the word.

So a cell is a cons cell if the value of the lower three bits of the tag is **less than** 7; all 64 bit objects other than cons cells will have all of the three lower-most bits of the tag set.

## Problems with building a Ghuloum-style compiler in Lisp 1.5

Ghuloum's compiler emits strings in the form of assembly language statements into a file, which is then run through a separate assembler to produce a binary, which is finally integrated with a launcher stub written in C using a linker. This makes it possible to write a Lisp largely in that Lisp itself (provided you have an existing Lisp fostermother image to run the initial compilation); but it does not directly enable you to compile a single function into the existing image at runtime, and then immediately use the newly compiled function; and as far as I'm concerned, until you have that you don't have a working Lisp compiler.

Furthermore, Lisp 1.5 does not have a concept of a string, and cannot manipulate strings. I *could* add strings as an extension, but that feels somewhat outwith the scope of this project.

I don't feel that any of this is insuperable. Lisp 1.5 supports the functions `PRINT`, `PRIN1`, and `TERPRI`, which respectively make it possible to print complete S-expressions, individual atoms, and linefeeds. It would be possible to create a symbol whose print name was a single blank space. It would be perverse, and annoying, and `READ` would not recognise such a symbol, so if it were included in a sysout file that sysout could not be read back in; but it would be possible.

Or else, we could assemble the assembly language statements as a list of individual S-expressions, print that, run it through an external preprocessor and the assembler, and then link the resulting binary. It does not seem to me that using an external preprocessor is any more 'cheating' than using an external assembler.

But finally, and this will be my preferred outcome, one could create that list of assembly language statements in memory; one could estimate from it the size of the compiled function; one could malloc a block of memory of that size on the heap; and one could then assemble the function by writing bytes into that block of memory as specified in the assembly language statements. If I'm going to use this side-project as an exercise to learn how to write the Post Scarcity compiler, the Post Scarcity compiler has got to be able to do that; so I should try.
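
Returning to the memory model for a moment: the mask-and-shift approach described under "In the beginning was the Word" might look something like the sketch below. Everything in it is an assumption made for illustration, not settled design: the names (`word_t`, `make_object32`, `is_cons` and so on) are hypothetical, bit 0 is taken to be the least significant bit (so the diagram reads from low bits to high), and the `car` is taken to be the low half of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout. Assumes bit 0 is the least significant
 * bit of the word and that the car of a cons cell is the low 32 bits. */
typedef uint64_t word_t;

#define MARK_MASK       0x1u
#define TAG_SHIFT       1           /* the tag starts just after the mark bit */
#define TAG3_MASK       0x7u
#define TAG7_MASK       0x7fu
#define PAYLOAD32_SHIFT 4
#define PAYLOAD32_MASK  0x0fffffffu /* 28 bits */
#define PAYLOAD64_SHIFT 8
#define PAYLOAD64_MASK  0x00ffffffffffffffull /* 56 bits */

/* A few tag values taken from the table of tags. */
enum { TAG_ERROR = 0x0, TAG_POINTER = 0x1, TAG_INTEGER = 0x2,
       TAG_CHARACTER = 0x3, TAG_SYMBOL = 0x0f, TAG_FREE = 0x7f };

/* Pack a 3 bit tag and a 28 bit payload into an object32, mark bit clear. */
static uint32_t make_object32(unsigned tag, uint32_t payload) {
    return ((payload & PAYLOAD32_MASK) << PAYLOAD32_SHIFT)
         | ((tag & TAG3_MASK) << TAG_SHIFT);
}

/* Pack a 7 bit tag and a 56 bit payload into an object64, mark bit clear. */
static word_t make_object64(unsigned tag, uint64_t payload) {
    return ((payload & PAYLOAD64_MASK) << PAYLOAD64_SHIFT)
         | ((word_t)(tag & TAG7_MASK) << TAG_SHIFT);
}

/* A cons cell is simply two object32s side by side in one word. */
static word_t make_cons(uint32_t car, uint32_t cdr) {
    return (word_t)car | ((word_t)cdr << 32);
}

static uint32_t car(word_t w) { return (uint32_t)(w & 0xffffffffu); }
static uint32_t cdr(word_t w) { return (uint32_t)(w >> 32); }

static int is_marked(word_t w) { return (int)(w & MARK_MASK); }
static unsigned tag3(word_t w) { return (unsigned)((w >> TAG_SHIFT) & TAG3_MASK); }
static unsigned tag7(word_t w) { return (unsigned)((w >> TAG_SHIFT) & TAG7_MASK); }

/* A word is a cons cell when the low three tag bits are less than 7. */
static int is_cons(word_t w) { return tag3(w) < 7; }

static uint32_t payload32(uint32_t obj) { return obj >> PAYLOAD32_SHIFT; }
static uint64_t payload64(word_t w) { return w >> PAYLOAD64_SHIFT; }

/* Recover a signed 28 bit integer by sign-extending the payload. */
static int32_t integer_value(uint32_t obj) {
    uint32_t p = payload32(obj);
    return (int32_t)(p ^ 0x08000000u) - 0x08000000;
}
```

With these helpers, `make_cons(make_object32(TAG_INTEGER, (uint32_t)-5), make_object32(TAG_POINTER, 1234))` yields a word for which `is_cons` is true, while `make_object64(TAG_SYMBOL, ...)` yields one for which it is false, because every 7 bit tag in the table has its low three bits set.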
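
As a rough illustration of that last option, here is a minimal sketch of sizing a list of instructions, allocating a block, and writing the encoded bytes into it. The two-instruction vocabulary, the `instruction_t` type and the function names are all hypothetical, and the encodings assume an x86-64 target (`B8` followed by a little-endian 32 bit immediate for `mov eax, imm32`, and `C3` for `ret`). One caveat the sketch does not address: on most modern operating systems, memory obtained from `malloc` is not executable, so actually calling the assembled function would probably need something like POSIX `mmap` or `mprotect` with `PROT_EXEC` rather than plain `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-instruction vocabulary: just enough x86-64 to
 * return a constant from a function. */
typedef enum { OP_MOV_EAX_IMM32, OP_RET } opcode_t;

typedef struct {
    opcode_t op;
    uint32_t operand; /* ignored for OP_RET */
} instruction_t;

/* Encoded size, in bytes, of a single instruction. */
static size_t instruction_size(const instruction_t *ins) {
    switch (ins->op) {
    case OP_MOV_EAX_IMM32: return 5; /* B8 + 4 byte immediate */
    case OP_RET:           return 1; /* C3 */
    }
    return 0;
}

/* Step one: work out the size of the compiled function from the list. */
static size_t code_size(const instruction_t *code, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += instruction_size(&code[i]);
    }
    return total;
}

/* Steps two and three: allocate a block of that size and write the
 * encoded bytes into it. Returns NULL on allocation failure; the
 * caller owns the block and must free() it. */
static uint8_t *assemble(const instruction_t *code, size_t count,
                         size_t *size_out) {
    size_t size = code_size(code, count);
    uint8_t *block = malloc(size > 0 ? size : 1);
    if (block == NULL) {
        return NULL;
    }
    uint8_t *p = block;
    for (size_t i = 0; i < count; i++) {
        switch (code[i].op) {
        case OP_MOV_EAX_IMM32:
            *p++ = 0xB8;
            for (int b = 0; b < 4; b++) { /* little-endian immediate */
                *p++ = (uint8_t)(code[i].operand >> (8 * b));
            }
            break;
        case OP_RET:
            *p++ = 0xC3;
            break;
        }
    }
    *size_out = size;
    return block;
}
```

Assembling the two-instruction list `{OP_MOV_EAX_IMM32, 42}, {OP_RET, 0}` this way produces six bytes. In the real system the instruction list would presumably be a Lisp list of S-expressions rather than a C array, but the three steps (measure, allocate, write) would be the same.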