Reimplementation of Beowulf in C, with compiler

naegling

A compiler for Beowulf following, basically, Abdulaziz Ghuloum's recipe and Noah Zentzis' implementation thereof.

Memory model

I seem to obsess over how things are represented in memory. Although most people who build Ghuloum-style compilers treat memory as something of an afterthought, I'm starting with it.

In the beginning was the Word

My intention is that memory will be considered as an array of 64 bit words.

Each word may be considered as

  1. a cons cell: two instances of object32, each having one mark bit, three tag bits and 28 payload bits;
  2. a single object64, having one mark bit, seven tag bits, and 56 payload bits.

Note that, for any word, the first four bits comprise the mark and (part or all of) the tag, whether the word is an object64 or a cons of two object32s. To make the two cases distinguishable from those bits alone, all object64s will have all of the first three (lower-most) bits of the tag set:

                                   3 3                                6
 0 1 3 4    8                      1 2                                3
+-+---+-----------------------------+-+---+----------------------------+
|M|tag| payload...                  |M|tag| payload...                 |
+-+---+----+------------------------+-+---+----------------------------+
|M|111 tag | payload...                                                |
+-+--------+-----------------------------------------------------------+
where `M` represents `mark`

I've tried to do this with C structs, but I failed to get the bit fields to pack properly, so I'm just going to be a barbarian and use bit masks and bit shifts.
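A minimal sketch of what those masks and shifts might look like. It assumes that bit 0 in the diagram is the least significant bit of the word and that the car is the object32 in the low 32 bits; neither is settled above, and all the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 0 of the diagram is the least significant bit, and the
 * first (car) object32 occupies the low 32 bits of the word. */

/* Pack an object32: mark in bit 0, tag in bits 1..3, payload in bits 4..31. */
static inline uint32_t make_object32(unsigned mark, unsigned tag3, uint32_t payload)
{
    assert(mark <= 1 && tag3 <= 7 && payload <= 0x0fffffffu);
    return (payload << 4) | ((uint32_t)tag3 << 1) | (uint32_t)mark;
}

static inline unsigned object32_tag(uint32_t o)     { return (o >> 1) & 0x7u; }
static inline uint32_t object32_payload(uint32_t o) { return o >> 4; }

/* Pack an object64: mark in bit 0, tag in bits 1..7, payload in bits 8..63.
 * The three lower-most tag bits must all be set. */
static inline uint64_t make_object64(unsigned mark, unsigned tag7, uint64_t payload)
{
    assert(mark <= 1 && tag7 <= 0x7f && (tag7 & 0x7u) == 0x7u);
    assert(payload < (UINT64_C(1) << 56));
    return (payload << 8) | ((uint64_t)tag7 << 1) | (uint64_t)mark;
}

static inline unsigned object64_tag(uint64_t w)     { return (unsigned)((w >> 1) & 0x7fu); }
static inline uint64_t object64_payload(uint64_t w) { return w >> 8; }

/* Bit 0 of any word: the mark of an object64, or of the car of a cons. */
static inline unsigned word_mark(uint64_t w) { return (unsigned)(w & 1u); }

/* A cons cell is two object32s side by side in one word. */
static inline uint64_t make_cons(uint32_t car, uint32_t cdr)
{
    return ((uint64_t)cdr << 32) | (uint64_t)car;
}

static inline uint32_t cons_car(uint64_t w) { return (uint32_t)(w & 0xffffffffu); }
static inline uint32_t cons_cdr(uint64_t w) { return (uint32_t)(w >> 32); }
```

Under these assumptions, the 7-bit tag of an object64 and the 3-bit tag of a car start at the same bit position, which is what lets one test serve both layouts.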

Tag! You're it!

Tags will be allocated as follows:

| 3-bit value | 7-bit value | Hex | Interpretation |
|---|---|---|---|
| 0 | 0 | 0x0 | an error object, whose payload is a 3 character error code. |
| 1 | 1 | 0x1 | a pointer; an offset into the vector of words. |
| 2 | 2 | 0x2 | a signed 28 bit integer. |
| 3 | 3 | 0x3 | a character; possibly just a byte, or possibly a 16 bit wchar. |
| 4 | 4 | 0x4 | unassigned (possibly a floating point number, later.) |
| 5 | 5 | 0x5 | unassigned |
| 6 | 6 | 0x6 | unassigned |
| 7 | 7 | 0x7 | never used: see Recognising a cons cell, below |
| 7 | 15 | 0xf | a symbol cell (this implies a symbol can have only up to seven, or if compressed to five bits per character, eleven characters) |
| 7 | 23 | 0x17 | a pointer to a compiled function (there's a problem here; it means we can only allocate a function in the lower 72,057,594,037,927,936 bytes of memory; I think that's not going to byte us on the bum, pun intended). |
| 7 | 31 | 0x1f | a pointer to a compiled special form (same problem as above). |
| 7 | 39 | 0x27 | unassigned ? a ratio cell ? |
| 7 | 47 | 0x2f | unassigned ? a big number ? |
| 7 | 55 | 0x37 | unassigned ? a string ? |
| 7 | 63 | 0x3f | unassigned |
| 7 | 71 | 0x47 | unassigned |
| 7 | 79 | 0x4f | unassigned |
| 7 | 87 | 0x57 | unassigned |
| 7 | 95 | 0x5f | unassigned |
| 7 | 103 | 0x67 | unassigned |
| 7 | 111 | 0x6f | unassigned |
| 7 | 119 | 0x77 | unassigned |
| 7 | 127 | 0x7f | a free cell |
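The 7-bit values in the table follow a simple pattern: four upper bits, followed by the three lower bits all set. A sketch of that arithmetic is below, along with one possible way to decode the signed 28 bit integer payload. The names are hypothetical, and two's-complement encoding of the integer is an assumption; the table only says "signed".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the 3-bit tags assigned so far. */
enum { TAG_ERROR = 0, TAG_POINTER = 1, TAG_INTEGER = 2, TAG_CHARACTER = 3 };

/* Hypothetical names for the 7-bit tags assigned so far. */
enum { TAG_SYMBOL = 0x0f, TAG_FUNCTION = 0x17, TAG_SPECIAL_FORM = 0x1f, TAG_FREE = 0x7f };

/* Build a 7-bit tag from its four upper bits; the three lower bits are
 * always 111, which is what marks the word as an object64. */
static inline unsigned tag7_from_high4(unsigned high4)
{
    assert(high4 <= 0xf);
    return (high4 << 3) | 0x7u;
}

/* Assumption: integers are stored as 28-bit two's complement. */
static inline uint32_t int28_encode(int32_t v)
{
    assert(v >= -0x08000000 && v <= 0x07ffffff);
    return (uint32_t)v & 0x0fffffffu;
}

/* Sign-extend a 28-bit payload back to a 32-bit signed integer. */
static inline int32_t int28_decode(uint32_t payload)
{
    int32_t v = (int32_t)(payload & 0x0fffffffu);
    if (v & 0x08000000) {
        v -= 0x10000000;   /* bit 27 set: the value is negative */
    }
    return v;
}
```

For instance, `tag7_from_high4(1)` gives 15 (the symbol tag) and `tag7_from_high4(15)` gives 127 (the free cell tag), matching the table.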

Recognising a cons cell

My original idea was to have a specific tag to mean a cons cell, and that tag was going to be 7, binary 111, all three lower-most bits set.

This does not work. If we were to do that, there would be nowhere to put the tag of the car of the cell. So a word is a cons cell if the value of the lower three bits of the tag is less than 7; all 64 bit objects other than cons cells will have all three lower-most bits of the tag set.
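The test this implies is a single shift and mask. A sketch, assuming (as the diagram suggests but does not state) that the mark is the least significant bit of the word and the tag starts at bit 1; `is_cons`, `classify` and `word_kind` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

enum word_kind { WORD_CONS, WORD_OBJECT64 };

/* A word is a cons cell unless the three lower-most tag bits are all set.
 * Assumption: the mark is bit 0 (least significant), the tag starts at bit 1. */
static inline bool is_cons(uint64_t w)
{
    return ((w >> 1) & 0x7u) != 0x7u;
}

/* Decide how the rest of the word should be interpreted. */
static inline enum word_kind classify(uint64_t w)
{
    return is_cons(w) ? WORD_CONS : WORD_OBJECT64;
}
```

One consequence of this rule is that an all-zero word reads as a cons cell whose car and cdr are both error objects with a zero payload.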

Problems with building a Ghuloum-style compiler in Lisp 1.5

Ghuloum's compiler emits assembly language statements, as strings, into a file. That file is then run through a separate assembler to produce a binary, which is finally linked with a launcher stub written in C. This makes it possible to write a Lisp largely in that Lisp itself (provided you have an existing Lisp fostermother image to run the initial compilation), but it does not directly enable you to compile a single function into the existing image at runtime and then immediately use the newly compiled function. As far as I'm concerned, until you have that, you don't have a working Lisp compiler.

Furthermore, Lisp 1.5 does not have a concept of a string, and cannot manipulate strings. I could add strings as an extension, but that feels somewhat outwith the scope of this project.

I don't feel that any of this is insuperable. Lisp 1.5 supports the functions PRINT, PRIN1, and TERPRI which respectively make it possible to print complete S-expressions, individual atoms, and linefeeds. It would be possible to create a symbol whose print name was a single blank space. It would be perverse, and annoying, and READ would not recognise such a symbol, so if it were included in a sysout file, that sysout could not be read back in; but it would be possible.

Or else, we could assemble the assembly language statements as a list of individual S-expressions, print that, run it through an external preprocessor and the assembler, and then link the resulting binary. It does not seem to me that using an external preprocessor is any more 'cheating' than using an external assembler.

But finally, and this will be my preferred outcome, one could create that list of assembly language statements in memory; one could estimate from it the size of the compiled function; one could malloc a block of memory of that size on the heap; and one could then assemble the function by writing bytes into that block of memory as specified in the assembly language statements.
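A rough sketch of that last approach in C: a list of instructions in memory, a size estimate, a `malloc`, and then bytes written into the block. Everything here is hypothetical, including the `insn` representation and the choice of x86-64 as the target (the text above does not name one). Only two instructions are covered, `mov eax, imm32` (opcode `B8` followed by a little-endian 32-bit immediate) and `ret` (opcode `C3`).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory form of an assembly statement. */
enum opcode { OP_MOV_EAX_IMM32, OP_RET };

struct insn {
    enum opcode op;
    uint32_t imm;    /* only used by OP_MOV_EAX_IMM32 */
};

/* Encoded size in bytes of one instruction (x86-64 assumed). */
static size_t insn_size(const struct insn *i)
{
    switch (i->op) {
    case OP_MOV_EAX_IMM32: return 5;   /* B8 + 4 immediate bytes */
    case OP_RET:           return 1;   /* C3 */
    }
    return 0;
}

/* Estimate the size of the compiled function from its instruction list. */
static size_t estimate_size(const struct insn *prog, size_t n)
{
    size_t total = 0;
    for (size_t k = 0; k < n; k++) {
        total += insn_size(&prog[k]);
    }
    return total;
}

/* Allocate a block of the estimated size and write the machine code into
 * it. Returns NULL on allocation failure; the caller frees the block. */
static uint8_t *assemble(const struct insn *prog, size_t n, size_t *out_len)
{
    size_t len = estimate_size(prog, n);
    uint8_t *buf = malloc(len ? len : 1);
    if (buf == NULL) {
        return NULL;
    }
    uint8_t *p = buf;
    for (size_t k = 0; k < n; k++) {
        switch (prog[k].op) {
        case OP_MOV_EAX_IMM32:
            *p++ = 0xB8;
            for (int b = 0; b < 4; b++) {
                *p++ = (uint8_t)(prog[k].imm >> (8 * b));   /* little-endian */
            }
            break;
        case OP_RET:
            *p++ = 0xC3;
            break;
        }
    }
    *out_len = len;
    return buf;
}
```

Assembling `mov eax, 42` followed by `ret` this way yields the six bytes `B8 2A 00 00 00 C3`. The sketch only fills the block and never jumps into it: on most current systems memory returned by `malloc` is not executable, so actually calling the result would need an executable mapping, which is not shown here.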

If I'm going to use this side-project as an exercise to learn how to write the Post Scarcity compiler, the Post Scarcity compiler has got to be able to do that; so I should try.