Careful debugging of the memory leak problem. At this stage,
stack frames for interpreted (but not primitive) functions appear not to be
reclaimed, and the oblist doesn't seem to be fully reclaimed.
Simon Brooke 2026-02-20 19:39:19 +00:00
parent 8629e33f92
commit 70376c6529
14 changed files with 156 additions and 50 deletions

@@ -1,5 +1,57 @@
# State of Play
## 20260220
### State of the build
The only unit tests that are failing now are the bignum tests, which I have
consciously parked as a future problem, and the memory leak, similarly. The
leak is a lot less bad than it was, but I'm worried that stack frames
are not being freed.
If you run
```
cat lisp/fact.lisp | target/psse -d 2>&1 |\
grep 'Vector space object of type' | sort | uniq -c | sort -rn
```
you get a huge number (currently 394) of stack frames in the memory dump; they
should all have been reclaimed. There's other stuff in the memory dump as well:
```
422 CONS ;; cons cells, obviously
394 VECP ;; pointers to vector space objects -- specifically, the stack frames
335 SYMB ;; symbols
149 INTR ;; integers
83 STRG ;; strings
46 FUNC ;; primitive (i.e. written in C) functions
25 KEYW ;; keywords
10 SPFM ;; primitive special forms
3 WRIT ;; write streams: `*out*`, `*log*`, `*sink*`
1 TRUE ;; t
1 READ ;; read stream: `*in*`
1 NIL ;; nil
1 LMDA ;; lambda function, specifically `fact`
```
Generally, for each character in a string, symbol or keyword there will be one
cell (`STRG`, `SYMB`, or `KEYW`), so the high number of `STRG` cells is not
especially surprising. It looks as though none of the symbols bound in the
oblist are being recovered on exit, which is undesirable but not catastrophic,
since it's a fixed burden of memory which isn't expanding.
But the fact that stack frames aren't being reclaimed is serious.
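
To make the one-cell-per-character point concrete, here is a minimal sketch in C. It is not the real PSSE cell layout: the `toy_char_cell` struct and the `toy_*` functions are invented for illustration, and assume only that text is stored as a chain of cells holding one character each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cell: one character plus a link to the rest of the text.
 * This is an illustration only, not PSSE's actual cell structure. */
struct toy_char_cell {
    char character;
    struct toy_char_cell *next;
};

/* Release every cell in the chain. */
static void toy_free_chain(struct toy_char_cell *head) {
    while (head != NULL) {
        struct toy_char_cell *next = head->next;
        free(head);
        head = next;
    }
}

/* Count the cells in the chain. */
static size_t toy_count_cells(const struct toy_char_cell *head) {
    size_t count = 0;
    for (; head != NULL; head = head->next) {
        count++;
    }
    return count;
}

/* Build a chain with one cell per character of `text`.
 * Returns NULL for the empty string or if allocation fails. */
static struct toy_char_cell *toy_make_chain(const char *text) {
    struct toy_char_cell *head = NULL;
    struct toy_char_cell **tail = &head;

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        struct toy_char_cell *cell = malloc(sizeof *cell);
        if (cell == NULL) {
            toy_free_chain(head);
            return NULL;
        }
        cell->character = *text;
        cell->next = NULL;
        *tail = cell;          /* append to the end of the chain */
        tail = &cell->next;
    }
    return head;
}
```

Under that assumption, `toy_count_cells(toy_make_chain("fact"))` gives 4: a four-character name costs four cells, so a few dozen short strings and symbols quickly add up to counts like those in the dump.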
### Update, 19:31
Right, investigating this more deeply, I found that `make_empty_frame` was doing
an `inc_ref` it should not have been. Having fixed that I'm down to 27 frames
left in the dump. That's very close to the number which will be generated by
running `(fact 25)`, so I expect it is now only stack frames for interpreted
functions which are not being reclaimed. This gives me something to work on!
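
As a rough illustration of why a single stray `inc_ref` is enough to keep a frame alive forever, here is a toy reference-counting sketch in C. Everything in it is hypothetical: `toy_object`, `toy_make_frame`, `toy_inc_ref`, `toy_dec_ref` and `toy_call_once` are made-up names, and the assumption that the caller takes exactly one reference of its own is mine, not a description of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified object header; not PSSE's real layout. */
struct toy_object {
    unsigned int count;   /* reference count */
    bool live;            /* false once the object has been reclaimed */
};

/* Number of toy objects allocated and not yet reclaimed. */
static int toy_live_objects = 0;

/* Take one more reference to the object. */
static void toy_inc_ref(struct toy_object *o) {
    assert(o != NULL);
    o->count++;
}

/* Drop one reference; reclaim the object once nothing refers to it. */
static void toy_dec_ref(struct toy_object *o) {
    assert(o != NULL);
    if (o->count > 0) {
        o->count--;
    }
    if (o->count == 0 && o->live) {
        o->live = false;
        toy_live_objects--;
    }
}

/* Set up a frame. With `extra_ref` true, the constructor itself takes a
 * reference which no caller knows to release, mimicking the bug. */
static void toy_make_frame(struct toy_object *o, bool extra_ref) {
    assert(o != NULL);
    o->count = 0;
    o->live = true;
    toy_live_objects++;
    if (extra_ref) {
        toy_inc_ref(o);
    }
}

/* Simulate one function call and return how many frames are still
 * live afterwards (0 means the frame was reclaimed). */
static int toy_call_once(bool extra_ref) {
    struct toy_object frame;

    toy_live_objects = 0;
    toy_make_frame(&frame, extra_ref);
    toy_inc_ref(&frame);   /* the caller's own, legitimate reference */
    toy_dec_ref(&frame);   /* the caller has finished with the frame */
    return toy_live_objects;
}
```

In this sketch `toy_call_once(false)` leaves no live frames, while `toy_call_once(true)` leaves one: the count never gets back to zero because nobody owns the extra reference. One leaked frame per call is the same pattern as a dump full of unreclaimed frames.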
## 20260215
Both of yesterday's regressions are fixed. Memory problem still in much the
@@ -14,8 +66,8 @@ It burned through 74 cons pages each of 1,024 cons cells, total 76,800 cells,
and 19,153 stack frames, before it got there; and then threw the exception back
up through each of those 19,153 stack frames. But the actual exception message
was `Unrecognised tag value 0 ( )`, which is not enormously helpful.
However, once I had recognised what the problem was, it was quickly fixed, with
the added bonus that the new solution will automatically work for bignum
fractions once bignums are working.