Print is less badly broken. Read is less badly broken. GC is too aggressive.

Simon Brooke 2026-04-24 21:20:23 +01:00
parent 22b0160a26
commit 63906fe817
19 changed files with 489 additions and 303 deletions

@@ -1,5 +1,68 @@
# State of Play
## 20260424
### To have `c_` functions or not to have `c_` functions, revisited
Right, I was hugely pleased with my 'make everything a Lisp function, and then call it from C' idea. I wrote things like:
```c
print( make_frame( 2, base_of_stack,
                   eval( make_frame( 1, base_of_stack,
                                     read( make_frame( 1, base_of_stack, input_stream ) ) ) ),
                   output_stream ) );
```
Isn't it beautiful? Isn't it elegant? Isn't it clear? Yes, it is. Does it work? Yes, actually, it does. Is it a total crock? Unfortunately, dear reader, it is. In this pattern, we don't keep a handle on any of the stack frames made with `make_frame`, so we can't `dec_ref` them, so they never get garbage collected. And while during bootstrap it's inevitable that a little crud is left over, because it was created before we had enough infrastructure set up, what I'm seeing at present from a 'start up and shut down' run is:
| Size class | Allocated | Deallocated | Remaining |
| ------------ | ------------ | ------------ | ------------ |
| 2 | 453 | 1 | 452 |
| 3 | 1 | 0 | 1 |
| 4 | 49 | 4 | 45 |
| 5 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 |
The 452 unfreed objects in size class two are cons cells and string fragments, and they mostly represent the metadata on the streams `*in*`, `*out*`, `*log*` and `*sink*`, all of which are deliberately protected from garbage collection because, frankly, you don't want those things going away under you; so that's kind of OK. The one in size class three is an exception, and I'm quite pleased I'm only throwing one exception during bootstrap (although it would be nice if it got cleaned up).
But the 45 unfreed objects in size class four are stack frames, and the reason they're unfreed is the coding pattern you see above.
So, how to get around this?
The code snippet above could be rewritten:
```c
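// Stanza 1: frame the input stream, call read, then release the frame.
struct pso_pointer next = inc_ref( make_frame( 1, base_of_stack, input_stream ) );
struct pso_pointer read_value = inc_ref( read( next ) );
dec_ref( next );

// Stanza 2: frame the value read, call eval, then release frame and value.
next = inc_ref( make_frame( 1, base_of_stack, read_value ) );
struct pso_pointer eval_value = inc_ref( eval( next ) );
dec_ref( next );
dec_ref( read_value );

// Stanza 3: frame the evaluated value and the output stream, print, release.
next = inc_ref( make_frame( 2, base_of_stack, eval_value, output_stream ) );
print( next );
dec_ref( next );
dec_ref( eval_value );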
```
This is much more prolix and, to me, less elegant; but it does get the garbage collected. In each stanza we're first setting up a frame with the arguments for the function we're about to call, then calling that function with the frame we've set up, and then `dec_ref`ing the frame. We shouldn't need to `dec_ref` the value returned by `print`, since we don't use it and the only thing holding a reference to it is the frame in which it was created, which we do `dec_ref`.
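The difference between the two patterns can be reproduced in a toy model. What follows is a minimal sketch, not the project's code: every `toy_` name is hypothetical, frames take a single argument, new objects are assumed to start with a reference count of zero, and (following the description above) a value returned by a call is assumed to be held by the frame in which it was created.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 32

/* Hypothetical stand-in for a reference-counted object (frame or value). */
struct toy_obj {
    int refs;                /* current reference count */
    int live;                /* 1 until the count drops back to zero */
    struct toy_obj *arg;     /* the one argument a toy frame protects */
    struct toy_obj *result;  /* value created in this frame, if any */
};

static struct toy_obj toy_pool[TOY_POOL_SIZE];
static int toy_allocated = 0;
static int toy_freed = 0;

static struct toy_obj *toy_inc_ref( struct toy_obj *o ) {
    if ( o != NULL ) o->refs++;
    return o;
}

static void toy_dec_ref( struct toy_obj *o ) {
    if ( o == NULL ) return;
    assert( o->live && o->refs > 0 );
    if ( --o->refs == 0 ) {
        o->live = 0;
        toy_freed++;
        toy_dec_ref( o->arg );     /* freeing a frame releases what it held */
        toy_dec_ref( o->result );
    }
}

/* Assumption: a new object starts at zero references, so it can only be
 * freed if somebody first increments and later decrements its count. */
static struct toy_obj *toy_make( struct toy_obj *arg ) {
    assert( toy_allocated < TOY_POOL_SIZE );
    struct toy_obj *o = &toy_pool[toy_allocated++];
    o->refs = 0;
    o->live = 1;
    o->arg = toy_inc_ref( arg );
    o->result = NULL;
    return o;
}

/* Stand-in for read/eval/print: creates a value held by the calling frame. */
static struct toy_obj *toy_call( struct toy_obj *frame ) {
    struct toy_obj *value = toy_make( NULL );
    frame->result = toy_inc_ref( value );
    return value;
}

/* Nested pattern: returns how many objects are left uncollected,
 * not counting the deliberately protected input stream. */
int toy_nested_pattern( void ) {
    toy_allocated = toy_freed = 0;
    struct toy_obj *input = toy_inc_ref( toy_make( NULL ) );
    toy_call( toy_make( toy_call( toy_make( toy_call( toy_make( input ) ) ) ) ) );
    return toy_allocated - toy_freed - 1;
}

/* Explicit pattern: same calls, but every frame and value is released. */
int toy_sanitary_pattern( void ) {
    toy_allocated = toy_freed = 0;
    struct toy_obj *input = toy_inc_ref( toy_make( NULL ) );

    struct toy_obj *next = toy_inc_ref( toy_make( input ) );
    struct toy_obj *read_value = toy_inc_ref( toy_call( next ) );
    toy_dec_ref( next );

    next = toy_inc_ref( toy_make( read_value ) );
    struct toy_obj *eval_value = toy_inc_ref( toy_call( next ) );
    toy_dec_ref( next );
    toy_dec_ref( read_value );

    next = toy_inc_ref( toy_make( eval_value ) );
    (void) toy_call( next );
    toy_dec_ref( next );
    toy_dec_ref( eval_value );

    return toy_allocated - toy_freed - 1;
}
```

In this model the nested pattern strands all three frames and all three values, while the explicit pattern leaves nothing behind except the protected stream. The real allocator may well differ in how counts are initialised, so the numbers illustrate the mechanism only.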
I could `dec_ref` `read_value`, for instance, as soon as I've put it into the frame for `eval` rather than after `eval` has actually been invoked, since the frame is now protecting it from garbage collection; but I've delayed doing so until afterwards out of caution.
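Why the earlier `dec_ref` would be safe can be seen in a small sketch. This is an illustration under stated assumptions, not the project's code: the `mini_` types and helper names are hypothetical, and the sketch assumes that placing a value in a frame takes a reference on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature objects; the names are illustrative only. */
struct mini_value { int refs; bool live; };
struct mini_frame { int refs; bool live; struct mini_value *arg; };

static void value_inc( struct mini_value *v ) { v->refs++; }

static void value_dec( struct mini_value *v ) {
    assert( v->live && v->refs > 0 );
    if ( --v->refs == 0 ) v->live = false;   /* "collected" */
}

/* Assumption: putting a value into a frame takes a reference on it. */
static void frame_init( struct mini_frame *f, struct mini_value *arg ) {
    f->refs = 1;
    f->live = true;
    f->arg = arg;
    value_inc( arg );
}

/* Releasing the last reference to a frame releases its argument too. */
static void frame_dec( struct mini_frame *f ) {
    assert( f->live && f->refs > 0 );
    if ( --f->refs == 0 ) {
        f->live = false;
        value_dec( f->arg );
    }
}

/* True if the value stays alive for the call despite the early release,
 * and is collected once the frame goes away. */
bool early_release_is_safe( void ) {
    struct mini_value read_value = { 1, true };  /* caller's own reference */
    struct mini_frame frame;

    frame_init( &frame, &read_value );           /* count is now 2 */
    value_dec( &read_value );                    /* early release: count 1 */
    bool alive_during_call = read_value.live;    /* eval would run here */
    frame_dec( &frame );                         /* count 0: collected */

    return alive_during_call && !read_value.live;
}
```

The trade-off is that after the early release the caller no longer owns the value, so any later use of it outside the frame would be a bug; delaying the `dec_ref` until after the call avoids that class of mistake at the cost of one extra live reference.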
Once we have `eval`/`apply` working, we won't need to do all this bureaucratic incrementing and decrementing of reference counts explicitly, since `eval`/`apply` *should* take care of it automatically.
I'm still not 100% confident I can make the reference counting garbage collector work reliably, irrespective of whether it's actually efficient.
### To recode or not to recode?
There are 55 calls to `make_frame` in existing C code, and they're almost all written in the 'elegant but insanitary' pattern. Could they be rewritten more cleanly? Yes, they could. But my hope is most of this code will be replaced with code written in Lisp, once we have Lisp sufficiently bootstrapped to make that possible.
So I think I'm going to put up with the uncollected garbage until we get to that point, at which point I'll audit the C code to see what is actually still in use, sanitise that, and delete the rest.
However, any new C code (and there is going to have to be some) *must* be written in the sanitary but bureaucratic pattern.
## 20260421
### To have `c_` functions or not to have `c_` functions?