81 lines
No EOL
7.1 KiB
Markdown
81 lines
No EOL
7.1 KiB
Markdown
# grendel
|
|
|
|
A reimplementation of [Beowulf](https://git.journeyman.cc/simon/beowulf) bootstrapped in C, with a compiler following, basically, [Abdulaziz Ghuloum's recipe](https://bernsteinbear.com/assets/img/11-ghuloum.pdf).
|
|
|
|
## Memory model
|
|
|
|
It seems I obsess with how things are represented in memory. Although most of the people who build Ghuloum-style compilers treat memory as something of an afterthought, I'm starting with it.
|
|
|
|
### In the beginning was the Word
|
|
|
|
My intention is that memory will be considered as an array of 64 bit words.
|
|
|
|
Each word may be considered as
|
|
|
|
1. a cons cell: two instances of object32, each having one mark bit, three tag bits and 28 payload bits;
|
|
2. a single object64, having one mark bit, seven tag bits, and 56 payload bits.
|
|
|
|
Note that, for any word, the first four bits comprise the mark and (part or all of) the tag, whether the cell is an `object64` or a cons of two `object32`s; for this reason, all `object64`s will have all of the first three bits of the tag set. So:
|
|
|
|
```
|
|
3 3 6
|
|
0 1 3 4 8 1 2 3
|
|
+-+---+-----------------------------+-+---+----------------------------+
|
|
|M|tag| payload... |M|tag| payload... |
|
|
+-+---+----+------------------------+-+---+----------------------------+
|
|
|M|111 tag | payload... |
|
|
+-+--------+-----------------------------------------------------------+
|
|
where `M` represents `mark`
|
|
```
|
|
|
|
I've tried to do this with C `structs` but I've failed to get the bit fields to pack properly so I'm just going to be a barbarian and use bit masks and bit shifts.
|
|
|
|
### Tag! You're it!
|
|
|
|
Tags will be allocated as follows:
|
|
|
|
| 3-bit value | 7-bit value | (Hex) | Interpretation |
|
|
| ----------- | ----------- | ----- | ------------------------------------------------------------ |
|
|
| 0 | 0 | 0x0 | an error object, whose payload is a 3 character error code. |
|
|
| 1 | 1 | 0x1 | a pointer; an offset into the vector of words. |
|
|
| 2 | 2 | 0x2 | a signed 28 bit integer. |
|
|
| 3 | 3 | 0x3 | a character; possibly just a byte, or possibly a 16 bit wchar. |
|
|
| 4 | 4 | 0x4 | unassigned (possibly a floating point number, later.) |
|
|
| 5 | 5 | 0x5 | unassigned |
|
|
| 6 | 6 | 0x6 | unassigned |
|
|
| 7 | 7 | 0x7 | **never** used: see [Recognising a cons cell](#Recognising-a-cons-cell), below |
|
|
| 7 | 15 | 0xf | a symbol cell *(this implies a symbol can have only up to seven, or if compressed to five bits per character, eleven characters)* |
|
|
| 7 | 23 | 0x17 | a pointer to a compiled function *(there's a problem here; it means we can only allocate a function in the lower 72,057,594,037,927,936 bytes of memory; I *think* that's not going to byte us on the bum, pun intended)*. |
|
|
| 7 | 31 | 0x1f | a pointer to a compiled special form *(same problem as above)*. |
|
|
| 7 | 39 | 0x27 | unassigned ? a ratio cell ? |
|
|
| 7 | 47 | 0x2f | unassigned ? a big number ? |
|
|
| 7 | 55 | 0x37 | unassigned ? a string ? |
|
|
| 7 | 63 | | unassigned |
|
|
| 7 | 71 | | unassigned |
|
|
| 7 | 79 | | unassigned |
|
|
| 7 | 87 | | unassigned |
|
|
| 7 | 95 | | unassigned |
|
|
| 7 | 103 | | unassigned |
|
|
| 7 | 111 | | unassigned |
|
|
| 7 | 119 | | unassigned |
|
|
| 7 | 127 | 0x7f | a free cell |
|
|
|
|
### Recognising a cons cell
|
|
|
|
My original idea was to have a specific tag to mean a cons cell, and that tag was going to be 7, binary 111, all three lower-most bits set.
|
|
|
|
This does not work. If we were to do that, there is nowhere to put the tag of the `car` of the cell. So a cell is a cons cell if the value of the lower three bits of the tag is **less than** 7; all 64 bit objects other than cons cells will have all of the three lower-most bits of the tag set.
|
|
|
|
## Problems with building a Ghuloum-style compiler in Lisp 1.5
|
|
|
|
Ghuloum's compiler emits strings in the form of assembly language statements into a file which is then run through a separate assembler to produce a binary which is finally integrated with a launcher stub written in C using a linker. This makes it possible to write a Lisp largely in that Lisp itself (provided you have an existing Lisp fostermother image to run the initial compilation); but it does not dirctly enable you to compile a single function into the existing image at runtime, and then immediately use the newly compiled function; and as far as I'm concerned, until you have that you don't have a working Lisp compiler.
|
|
|
|
Furthermore, Lisp 1.5 does not have a concept of a string, and cannot manipulate strings. I *could* add strings as an extension, but that feels somewhat outwith the scope of this project.
|
|
|
|
I don't feel that any of this is insuperable. Lisp 1.5 supports the functions `PRINT`, `PRIN1`, and `TERPRI` which respectively make it possible to print complete S-expressions, individual atoms, and linefeeds. It would be possible to create a symbol whose print name was a single blank space. It would be perverse, and annoying, and `READ` would not recognise such a symbol so if it were included in a sysout file that sysout could not be read back in; but it would be possible.
|
|
|
|
Or else, we could assemble the assembly language statements as a list of individual S-expressions, print that, run it through an external preprocessor and the assembler, and then link the resulting binary. It does not seem to me that using an external pre-processor is any more 'cheating' than using an external assembler.
|
|
|
|
But finally, and this will be my preferred outcome, one could create that list of assembly language statements in memory; one could estimate from it the size of the compiled function; one could malloc a block of memory of that size on the heap; and one could then assemble the function by writing bytes into that block of memory as specified in the assembly language statements.
|
|
|
|
If I'm going to use this side-project as an exercise to learn how to write the Post Scarcity compiler, the Post Scarcity compiler has got to be able to do that; so I should try. |