diff --git a/docs/Access-control.md b/docs/Access-control.md index 0f41a5d..07e4851 100644 --- a/docs/Access-control.md +++ b/docs/Access-control.md @@ -2,9 +2,9 @@ _Note that a number of details not yet finalised are used in examples in this note. There must be some mechanism for creating fully qualified and partially qualified hierarchical names, but I haven't finalised it yet. In this note I've assumed that the portions of an hierarchical name are separated by periods ('.'); that fully qualified names start with a quote mark; and that where a name doesn't start with a quote mark, the first portion of it is evaluated in the current environment and its value assumed to be a fully qualified equivalent. All of these details may change._ -In a multi-user environment, access control is necessary in order for a user to be able to protect an item of data from being seen by someone who isn't authorised to see it. But actually, in a world of immutable data, it's less necessary than you might think. As explained in my note on [[Memory, threads and communication]], if there's strict immutability, and all user processes spawn from a common root process, then no user can see into any other user's data space anyway. +In a multi-user environment, access control is necessary in order for a user to be able to protect an item of data from being seen by someone who isn't authorised to see it. But actually, in a world of immutable data, it's less necessary than you might think. As explained in my note on [Memory, threads and communication](https://www.journeyman.cc/blog/posts-output/2017-01-08-post-scarcity-memory-threads-and-communication/), if there's strict immutability, and all user processes spawn from a common root process, then no user can see into any other user's data space anyway. -But that makes collaboration and communication impossible, so I've proposed namespaces be mutable.
So the value of a name in a [[namespace]] will be a data item and inevitably that data item will be in some user's data space. So we do need an access control list on each data item. +But that makes collaboration and communication impossible, so I've proposed namespaces be mutable. So the value of a name in a [namespace](Namespace.html) will be a data item and inevitably that data item will be in some user's data space. So we do need an access control list on each data item. ## Initial thoughts @@ -24,7 +24,7 @@ As most data is immutable, there's no need for write access lists. If it exists, A sort-of minor exception to this is write streams. If you have normal access to a write stream, gatekept by the normal access lists, you can write to the stream; what you can't do is change where the stream points to. As you can't read from a write stream, there's still only one access list needed. -However, if (some) [[namespaces]] are mutable - and I believe some must be - then a namespace does need a write access list, in addition to its (normal) read access list. The structure of a write access list will be the same as of a read access list. +However, if (some) [namespaces](Namespace.html) are mutable - and I believe some must be - then a namespace does need a write access list, in addition to its (normal) read access list. The structure of a write access list will be the same as of a read access list. ### Modifying write access lists on mutable namespaces diff --git a/docs/Cons-space.md b/docs/Cons-space.md index 9e1005f..d954d06 100644 --- a/docs/Cons-space.md +++ b/docs/Cons-space.md @@ -36,11 +36,11 @@ A mark and sweep garbage collector actually only needs one mark bit, but for now ### Access control -Access control is a [[cons pointer]], see below; and is consequently the size of a cons pointer, which is presently 64 bits. 
An access control value of NIL means only system processes may access the cell; an access control value of TRUE means any user can access the cell; otherwise, the access control pointer points to the first cons cell of a list of allowed users/groups. The access control list is thus an ordinary list in ordinary cons space, and cells in an access control list can have access control lists of their own. As cons cells are immutable, infinite recursion is impossible; but it is nevertheless probably a good thing if access control list cells normally have an access control list of either TRUE or NIL. +Access control is a [cons pointer](Cons-pointer.html), see below; and is consequently the size of a cons pointer, which is presently 64 bits. An access control value of NIL means only system processes may access the cell; an access control value of TRUE means any user can access the cell; otherwise, the access control pointer points to the first cons cell of a list of allowed users/groups. The access control list is thus an ordinary list in ordinary cons space, and cells in an access control list can have access control lists of their own. As cons cells are immutable, infinite recursion is impossible; but it is nevertheless probably a good thing if access control list cells normally have an access control list of either TRUE or NIL. ### Car, Cdr: Cons pointers -A [[cons pointer]] is simply a pointer to a cons cell, and the simplest way to implement this is exactly as the memory address of the cons cell. +A [cons pointer](Cons-pointer.html) is simply a pointer to a cons cell, and the simplest way to implement this is exactly as the memory address of the cons cell. We have a fixed size vector of total memory, which we address in eight bit words (bytes) because that's the current convention. Our cons cell size is 32 bytes. So 31/32 of the possible values of a cons pointer are wasted - there cannot be a valid cons cell at that address.
Also, our total memory must be divided between cons space, vector space and stack (actually stack could be implemented in either cons space or vector space, and ultimately may end up being implemented in cons space, but that's a highly non-trivial detail which will be addressed much later). In practice it's likely that less than half of the total memory available will be devoted to cons space. So 63/64 of the possible values of a cons pointer are wasted. @@ -50,7 +50,7 @@ One of the things I absolutely hate about modern computers is their tendency to That was acceptable when the JVM was a special purpose platform for developing software for small embedded devices, which is what it was originally designed for. But it's one of the compromises the JVM makes in order to work well on small embedded devices which is completely unacceptable for post-scarcity computing. And we won't accept it. -But be that as it may, we don't know at system initialisation time how much memory to reserve for cons space, and how much for vector space ('the heap'). If we reserve too much for cons space, we may run out of vector space while there's still cons space free, and vice versa. So we'll reserve cons space in units: [[cons pages]]. If our cons pointers are absolute memory addresses, then it becomes very expensive to move a cons page in memory, because all the pointers in the whole system to any cell on the page need to be updated. +But be that as it may, we don't know at system initialisation time how much memory to reserve for cons space, and how much for vector space ('the heap'). If we reserve too much for cons space, we may run out of vector space while there's still cons space free, and vice versa. So we'll reserve cons space in units: [cons pages](Cons-pages.html). If our cons pointers are absolute memory addresses, then it becomes very expensive to move a cons page in memory, because all the pointers in the whole system to any cell on the page need to be updated.
(**NOTE**: As my thinking has developed, I'm now envisaging one cons page per compute node, which means that on each node the division between cons space and vector space will have to be fixed) @@ -79,13 +79,13 @@ A cons cell. The tag value of a CONS cell is that unsigned 32 bit integer which, ### FREE -An unassigned cons cell. The tag value of a FREE cell is that unsigned 32 bit integer which, when considered as an ASCII string, reads 'FREE'. The count of a FREE cell is always zero. The mark of a free cell is always zero. The access control value of a FREE cell is always NIL. The Car of a FREE cell is always NIL (address zero). The Cdr of a FREE cell is a cons-pointer to the next FREE cell (the [[free list]] pointer). +An unassigned cons cell. The tag value of a FREE cell is that unsigned 32 bit integer which, when considered as an ASCII string, reads 'FREE'. The count of a FREE cell is always zero. The mark of a free cell is always zero. The access control value of a FREE cell is always NIL. The Car of a FREE cell is always NIL (address zero). The Cdr of a FREE cell is a cons-pointer to the next FREE cell (the [free list](Free-list.html) pointer). ### INTR An integer; possibly an integer which isn't a big integer. The tag value of a INTR cell is that unsigned 32 bit integer which, when considered as an ASCII string, reads 'INTR'. The count of a INTR cell is always non-zero. The mark is up to the garbage collector. -There's fundamentally two ways to do this; one is we store up to 128 bit signed integers in the payload of an INTR cell, and have some other tag for an integer ('[[bignum]]') which overflows 128 bits and must thus be stored in another data structure; or else we treat one bit as a 'bignum' flag. If the bignum flag is clear we treat the remaining 127 bits as an unsigned 127 bit integer; if set, we treat the low 64 bits of the value as a cons pointer to the data structure which represents the bignum.
+There are fundamentally two ways to do this: one is that we store up to 128 bit signed integers in the payload of an INTR cell, and have some other tag for an integer ('[bignum](bignum.html)') which overflows 128 bits and must thus be stored in another data structure; or else we treat one bit as a 'bignum' flag. If the bignum flag is clear we treat the remaining 127 bits as an unsigned 127 bit integer; if set, we treat the low 64 bits of the value as a cons pointer to the data structure which represents the bignum. ### NIL @@ -105,13 +105,13 @@ A real number. The tag value of a REAL cell is that unsigned 32 bit integer whic A string. The tag value of a STRG cell is that unsigned 32 bit integer which, when considered as an ASCII string, reads 'STRG'. The count of a STRG cell is always non-zero. The mark is up to the garbage collector. The Car of an STRG cell contains a single UTF character. The Cdr of an STRG cell contains a cons-pointer to the remainder of the string, or NIL if this is the end of the string. -Note that in this definition a string is not an atom, which is probably right. But we also at this stage don't have an idea of a [[symbol]]. Very likely we'll end up with the idea that a string which is bound to a value in a namespace is for our purposes a symbol. +Note that in this definition a string is not an atom, which is probably right. But we also at this stage don't have an idea of a [symbol](Interning-strings.html). Very likely we'll end up with the idea that a string which is bound to a value in a namespace is for our purposes a symbol. Note, however, that there's a risk that we might have two instances of strings comprising identical characters in identical order, one of which was bound in a namespace and one of which wasn't; string equality is something to worry about.
### TIME -At nanosecond resolution (if I've done my arithmetic right), 128 bits will represent a span of 1 x 10²² years, or much longer than from the big bang to the [estimated date of fuel exhaustion of all stars](https://en.wikipedia.org/wiki/Timeline_of_the_far_future). So I think I'll arbitrarily set an epoch 14Bn years before the UNIX epoch and go with that. The time will be unsigned - there is no time before the big bang. +At nanosecond resolution (if I've done my arithmetic right), 128 bits will represent a span of 1 x 10²² years, or much longer than from the big bang to the [estimated date of fuel exhaustion of all stars](https://en.wikipedia.org/wiki/Timeline_of_the_far_future). So I think I'll arbitrarily set an epoch 14Bn years before the UNIX epoch and go with that. The time will be unsigned - there is no time before the big bang. ### TRUE @@ -121,7 +121,7 @@ The canonical true value. May not actually exist at all: the cell-pointer whose A pointer into vector space. The tag value of a VECP cell is that unsigned 32 bit integer which, when considered as an ASCII string, reads 'VECP'. The count of a VECP cell is always non-zero. The mark is up to the garbage collector. The payload is the a pointer to a vector space object. On systems with an address bus up to 128 bits wide, it's simply the address of the vector; on systems with an address bus wider than 128 bits, it's probably an offset into an indirection table, but that really is a problem for another day. -As an alternate implementation on hardware with a 64 bit address bus, it might be sensible to have the Car of the VECP cell simply the memory address of the vector, and the Cdr a pointer to the next VECP cell, maintained automatically in the same way that a [[free list]] is maintained. This way we automatically hold a list of all live vector space objects, which would help in garbage collecting vector space.
+As an alternate implementation on hardware with a 64 bit address bus, it might be sensible to have the Car of the VECP cell simply the memory address of the vector, and the Cdr a pointer to the next VECP cell, maintained automatically in the same way that a [free list](Free-list.html) is maintained. This way we automatically hold a list of all live vector space objects, which would help in garbage collecting vector space. Every object in vector space shall have exactly one VECP cell in cons space which refers to it. Every other object which wished to hold a reference to that object shall hold a cons pointer to VECP cell that points to the object. Each object in vector space shall hold a backpointer to the VECP cell which points to it. This means that if vector space needs to be shuffled in order to free memory, for each object which is moved only one pointer need be updated. @@ -136,4 +136,4 @@ I'm not yet certain what the payload of a WRIT cell is; it is implementation dep ## Cons pages -Cons cells will be initialised in cons pages. A cons page is a fixed size array of cons cells. Each cell is initialised as FREE, and each cell, as it is initialised, is linked onto the front of the system [[free list]]. Cons pages will exist in [[vector space]], and consequently each cons page will have a vector space header. \ No newline at end of file +Cons cells will be initialised in cons pages. A cons page is a fixed size array of cons cells. Each cell is initialised as FREE, and each cell, as it is initialised, is linked onto the front of the system [free list](Free-list.html). Cons pages will exist in [vector space](Vector-space.html), and consequently each cons page will have a vector space header. 
\ No newline at end of file diff --git a/docs/Core-functions.md b/docs/Core-functions.md index 772fd32..7f3cea5 100644 --- a/docs/Core-functions.md +++ b/docs/Core-functions.md @@ -1,6 +1,6 @@ # Core functions -In the specifications that follow, a word in all upper case refers to a tag value, defined on either the [[cons space]] or the [[vector space]] page. +In the specifications that follow, a word in all upper case refers to a tag value, defined on either the [cons space](Cons-space.html) or the [vector space](Vector-space.html) page. # (and args...) @@ -16,7 +16,7 @@ Public. Takes an arbitrary number of arguments, which should either all be CONS # (assoc key store) -Public. Takes two arguments, a key and a store. The store may either be a CONS forming the head of a list formatted as an [[assoc list]], or else a VECP pointing to a HASH. If the key is readable by the current user, returns the value associated with that key in the store, if it exists and is readable by the current user, else NIL. +Public. Takes two arguments, a key and a store. The store may either be a CONS forming the head of a list formatted as an [assoc list](Assoc-list.html), or else a VECP pointing to a HASH. If the key is readable by the current user, returns the value associated with that key in the store, if it exists and is readable by the current user, else NIL. # (car arg) @@ -57,7 +57,7 @@ _Note: I'm not sure what happens if the STRG is already bound in the HASH. A nor # (lambda args forms...) -Public. Takes an arbitrary number of arguments. Considers the first argument ('args') as a set of formal parameters, and returns a function composed of the forms with those parameters bound. Where I say 'returns a function', this is in initial prototyping probably an interpreted function (i.e. a code tree implemented as an S-expression), but in a usable version will mean a VECP (see [[cons space#VECP]]) pointing to an EXEC (see [[vector space#EXEC]]) vector. +Public. 
Takes an arbitrary number of arguments. Considers the first argument ('args') as a set of formal parameters, and returns a function composed of the forms with those parameters bound. Where I say 'returns a function', this is in initial prototyping probably an interpreted function (i.e. a code tree implemented as an S-expression), but in a usable version will mean a VECP (see [cons space](Cons-space.html#VECP)) pointing to an EXEC (see [vector space](Vector-space.html#EXEC)) vector. # (nil? arg) @@ -89,5 +89,5 @@ Public. Takes one argument. If that argument is either an STRG or a READ, parses # (type arg) -Public. Takes one argument. If that argument is readable by the current user, returns a string interned in the *core.types* namespace representing the tag value of the argument, unless the argument is a VECP in which case the value returned represents the tag value of the [[vector space]] object indicated by the VECP. +Public. Takes one argument. If that argument is readable by the current user, returns a string interned in the *core.types* namespace representing the tag value of the argument, unless the argument is a VECP in which case the value returned represents the tag value of the [vector space](Vector-space.html) object indicated by the VECP. diff --git a/docs/Free-list.md b/docs/Free-list.md index 2b1fc13..645983b 100644 --- a/docs/Free-list.md +++ b/docs/Free-list.md @@ -1,4 +1,5 @@ +# Free list A free list is a list of FREE cells consed together. When a cell is deallocated, it is consed onto the front of the free list, and the system free-list pointer is updated to point to it. A cell is allocated by popping the front cell off the free list.
diff --git a/docs/Hashing-structure-writ-large.md b/docs/Hashing-structure-writ-large.md index 05e698a..fb59df7 100644 --- a/docs/Hashing-structure-writ-large.md +++ b/docs/Hashing-structure-writ-large.md @@ -1,3 +1,5 @@ +# Hashing structure writ large + In Lisp, there's an expectation that any object may act as a key in a hash table. What that means, in practice, is that if a list ```lisp diff --git a/docs/Home.md b/docs/Home.md index 9efd688..e045ef1 100644 --- a/docs/Home.md +++ b/docs/Home.md @@ -1,6 +1,6 @@ # Post Scarcity Software Environment: general documentation -Work towards the implementation of a software system like that described in [Post Scarcity Software](http://blog.journeyman.cc/2006/02/post-scarcity-software.html). +Work towards the implementation of a software system like that described in [Post Scarcity Software](https://www.journeyman.cc/blog/posts-output/2006-02-20-postscarcity-software/). ## Note on canonicity @@ -26,5 +26,5 @@ When Linus Torvalds sat down in his bedroom to write Linux, he had something usa ## AWFUL WARNING 2 -This project is necessarily experimental and exploratory. I write code, it reveals new problems, I think about them, and I mutate the design. The documentation in this wiki does not always keep up with the developing source code. +This project is necessarily experimental and exploratory. I write code, it reveals new problems, I think about them, and I mutate the design. This documentation does not always keep up with the developing source code. diff --git a/docs/Homogeneity.md b/docs/Homogeneity.md index 064585a..6349406 100644 --- a/docs/Homogeneity.md +++ b/docs/Homogeneity.md @@ -1,4 +1,6 @@ -A homogeneity is a [[regularity]] which has a validation funtion associated with each key. A member can only be added to a homogeneity if not only does it have all the required keys, but the value of each key in the candidate member satisfies the validation function for that key. 
For example, the validation function for the age of a person might be something like +# Homogeneity + +A homogeneity is a [regularity](Regularity.html) which has a validation function associated with each key. A member can only be added to a homogeneity if not only does it have all the required keys, but the value of each key in the candidate member satisfies the validation function for that key. For example, the validation function for the age of a person might be something like ``` (fn [value] diff --git a/docs/Interning-strings.md b/docs/Interning-strings.md index c03516d..b92ded5 100644 --- a/docs/Interning-strings.md +++ b/docs/Interning-strings.md @@ -12,7 +12,7 @@ causes an unbound variable exception to be thrown, while returns the value **"froboz"**. This begs the question of whether there's any difference between **"froboz"** and **'froboz**, and the answer is that at this point I don't know. -There will be a concept of a root [[namespace]], in which other namespaces may be bound recursively to form a directed graph. Because at least some namespaces are mutable, the graph is not necessarily acyclic. There will be a concept of a current namespace, that is to say the namespace in which the user is currently working. +There will be a concept of a root [namespace](Namespace.html), in which other namespaces may be bound recursively to form a directed graph. Because at least some namespaces are mutable, the graph is not necessarily acyclic. There will be a concept of a current namespace, that is to say the namespace in which the user is currently working. There must be some notation to say distinguish a request for the value of a name in the root namespace and the value of a name in the current namespace.
For now I'm proposing that: @@ -34,7 +34,7 @@ will return the value that **froboz** is bound to in the environment of the user The exact path separator syntax may change, but the principal that when interning a symbol it is broken down into a path of tokens, and that the value of each token is sought in a namespace bound to the previous token, is likely to remain. -Obviously if **froboz** is interned in one namespace it is not necessarily interned in another, and vice versa. There's a potentially nasty problem here that two lexically identical strings might be bound in different namespaces, so that there is not one canonical interned **froboz**; if this turns out to cause problems in practice there will need to be a separate canonical [[hashtable]] of individual path elements. +Obviously if **froboz** is interned in one namespace it is not necessarily interned in another, and vice versa. There's a potentially nasty problem here that two lexically identical strings might be bound in different namespaces, so that there is not one canonical interned **froboz**; if this turns out to cause problems in practice there will need to be a separate canonical [hashtable](Hashtable.html) of individual path elements. Obviously this means there may be arbitrarily many paths which reference the same data item. This is intended. @@ -46,11 +46,11 @@ Binds *string*, considered as a path, to **NIL**. If some namespace along the pa ### (intern! string T) -Binds *string*, considered as a path, to **NIL**. If some namespace along the path doesn't exist, create it as the current user with both read and write [[access control]] lists taken from the current binding of **friends** in the current environment. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. +Binds *string*, considered as a path, to **NIL**. 
If some namespace along the path doesn't exist, create it as the current user with both read and write [access control](Access-control.html) lists taken from the current binding of **friends** in the current environment. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. ### (intern! string T write-access-list) -Binds *string*, considered as a path, to **NIL**. If some namespace along the path doesn't exist, create it as the current user with the read [[access control]] list taken from the current binding of **friends** in the current environment, and the write access control list taken from the value of *write-access-list*. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. +Binds *string*, considered as a path, to **NIL**. If some namespace along the path doesn't exist, create it as the current user with the read [access control](Access-control.html) list taken from the current binding of **friends** in the current environment, and the write access control list taken from the value of *write-access-list*. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. ### (set! string value) @@ -58,11 +58,11 @@ Binds *string*, considered as a path, to *value*. If some namespace along the pa ### (set! string value T) -Binds *string*, considered as a path, to *value*. If some namespace along the path doesn't exist, create it as the current user with both read and write [[access control]] lists taken from the current binding of **friends** in the current environment. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. +Binds *string*, considered as a path, to *value*.
If some namespace along the path doesn't exist, create it as the current user with both read and write [access control](Access-control.html) lists taken from the current binding of **friends** in the current environment. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. ### (set! string value T write-access-list) -Binds *string*, considered as a path, to *value*. If some namespace along the path doesn't exist, create it as the current user with the read [[access control]] list taken from the current binding of **friends** in the current environment, and the write access control list taken from the value of *write-access-list*. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. +Binds *string*, considered as a path, to *value*. If some namespace along the path doesn't exist, create it as the current user with the read [access control](Access-control.html) list taken from the current binding of **friends** in the current environment, and the write access control list taken from the value of *write-access-list*. Obviously if the current user is not entitled to write to the last pre-existing namespace, throws an exception. ### (put! string token value) diff --git a/docs/Lazy-Collections.md b/docs/Lazy-Collections.md index 22a7f5b..56d2725 100644 --- a/docs/Lazy-Collections.md +++ b/docs/Lazy-Collections.md @@ -22,7 +22,7 @@ I acknowledge that, given that keywords and symbols are also sequences of charac ## How do we compute with lazy sequences in practice? -Consider the note [[parallelism]]. Briefly, this proposes that a compile time judgement is made at the probable cost of evaluating each argument; that the one deemed most expensive to evaluate is reserved to be evaluated on the local node, and for the rest, a judgement is made as to whether it would be cheaper to hand them off to peers or to evaluate them locally. 
Well, for functions which return lazies –– and the compiler should certainly be able to infer whether a function will return a lazy — it will always make sense to hand them off, if there is an available idle peer to which to hand off. In fact, lazy-producers are probably the most beneficial class of function calls to hand off, since, if handed off to a peer, the output of the function can be consumed without any fancy scheduling on the local node. Indeed, if all lazy-producers can be reliably handed off, we probably don't need a scheduler at all. +Consider the note [parallelism](Parallelism.html). Briefly, this proposes that a compile time judgement is made of the probable cost of evaluating each argument; that the one deemed most expensive to evaluate is reserved to be evaluated on the local node, and for the rest, a judgement is made as to whether it would be cheaper to hand them off to peers or to evaluate them locally. Well, for functions which return lazies (and the compiler should certainly be able to infer whether a function will return a lazy), it will always make sense to hand them off, if there is an available idle peer to which to hand off. In fact, lazy-producers are probably the most beneficial class of function calls to hand off, since, if handed off to a peer, the output of the function can be consumed without any fancy scheduling on the local node. Indeed, if all lazy-producers can be reliably handed off, we probably don't need a scheduler at all. ## How do lazy sequences actually work?
diff --git a/docs/Memory-management.md b/docs/Memory-management.md
index c8da27b..8eb3726 100644
--- a/docs/Memory-management.md
+++ b/docs/Memory-management.md
@@ -15,15 +15,15 @@ I became interested in reference counting garbage collectors, because it seemed
 
 ## Separating cons space from vector space
 
-Lisps generate lots and lots of very small, equal sized objects: cons cells and other things which are either the same size as or even smaller than cons cells and which fit into the same memory footprint. Furthermore, most of the volatility is in cons cells - they are often extremely short lived. Larger objects are allocated much more infrequently and tend to live considerably longer.
+Lisps generate lots and lots of very small, equal sized objects: cons cells and other things which are either the same size as or even smaller than cons cells and which fit into the same memory footprint. Furthermore, most of the volatility is in cons cells — they are often extremely short lived. Larger objects are allocated much more infrequently and tend to live considerably longer.
 
-Because cons cells are all the same size, and because integers and doubles fit into the memory footprint of a cons cell, if we maintain an array of memory units of this size then we can allocate them very efficiently because we never have to move them - we can always allocate a new object in memory vacated by deallocating an old one. Deallocation is simply a matter of pushing the deallocated cell onto the front of the free list; allocation is simply a matter of popping a cell off the free list.
+Because cons cells are all the same size, and because integers and doubles fit into the memory footprint of a cons cell, if we maintain an array of memory units of this size then we can allocate them very efficiently because we never have to move them — we can always allocate a new object in memory vacated by deallocating an old one. Deallocation is simply a matter of pushing the deallocated cell onto the front of the free list; allocation is simply a matter of popping a cell off the free list.
 
 By contrast, a conventional software heap fragments exactly because we allocate variable sized objects into it. When an object is deallocated, it leaves a hole in the heap, into which we can only allocate objects of the same size or smaller. And because objects are heterogeneously sized, it's probable that the next object we get to allocate in it will be smaller, leaving even smaller unused holes.
 
-Consequently we end up with a memory like a swiss cheese - by no means fully occupied, but with holes which are too small to fit anything useful in. In order to make memory in this state useful, you have to mark and sweep it.
+Consequently we end up with a memory like a swiss cheese — by no means fully occupied, but with holes which are too small to fit anything useful in. In order to make memory in this state useful, you have to mark and sweep it.
 
-So my first observation is that [[cons space]] and what I call [[vector space]] - that is, the heap into which objects which won't fit into the memory footprint of a cons cell are allocated - are systematically different and require different garbage collection strategies.
+So my first observation is that [cons space](Cons-space.html) and what I call [vector space](Vector-space.html) — that is, the heap into which objects which won't fit into the memory footprint of a cons cell are allocated — are systematically different and require different garbage collection strategies.
 
 ## Reference counting: the objections
 
@@ -39,7 +39,7 @@ The other 'fault' of older reference counting Lisps is that in older Lisps, cons
 
 So badly designed programs on reference counting Lisps could leak memory badly and consequently silt up and run out of allocatable store.
 
-But modern Lisps - like Clojure - use immutable data structures. The nature of immutable data structures is that an older node can never point to a newer node. So circular data structures cannot be constructed.
+But modern Lisps — like Clojure — use immutable data structures. The nature of immutable data structures is that an older node can never point to a newer node. So circular data structures cannot be constructed.
 
 ### Performance
 
diff --git a/docs/Names-of-things.md b/docs/Names-of-things.md
index 7ac9ab2..0269f43 100644
--- a/docs/Names-of-things.md
+++ b/docs/Names-of-things.md
@@ -1,10 +1,10 @@
 * **assoc list** An assoc list is any list all of whose elements are cons-cells.
 * **association** Anything which associates names with values. An *assoc list* is an association, but so it a *map*, a *namespace*, a *regularity* and a *homogeneity*.
-* **homogeneity** A [[homogeneity]] is a *regularity* which has a validation funtion associated with each key.
-* **keyword** A [[keyword]] is a token whose denotation starts with a colon and which has a limited range of allowed characters, not including punctuation or spaces, which evaluates to itself irrespective of the current binding environment.
+* **homogeneity** A [homogeneity](Homogeneity.html) is a *regularity* which has a validation function associated with each key.
+* **keyword** A [keyword](Keyword.html) is a token whose denotation starts with a colon and which has a limited range of allowed characters, not including punctuation or spaces, which evaluates to itself irrespective of the current binding environment.
 * **map** A map in the sense of a Clojure map; immutable, adding a name/value results in a new map being created. A map may be treated as a function on *keywords*, exactly as in Clojure.
 * **namespace** A namespace is a mutable map. Generally, if a namespace is shared, there will be a path from the oblist to that namespace.
 * **oblist** The oblist is a privileged namespace which forms the root of all canonical paths. It is accessed at present by the function `(oblist)`, but it can be denoted in paths by the empty keyword.
-* **path** A [[path]] is a list of keywords, with special notation and semantics.
-* **regularity** A [[regularity]] is a map whose values are maps, all of whose members share the same keys. A map may be added to a regularity only if it has all the keys the regularity expects, although it may optionally have more. It is legitimate for the same map to be a member of two different regularities, if it has a union of their keys. Keys in a regularity must be keywords. Regularities are roughly the same sort of thing as objects in object oriented programming or tables in databases, but the values of the keys are not policed (see `homogeneity`).
+* **path** A [path](How-do-we-notate-paths.html) is a list of keywords, with special notation and semantics.
+* **regularity** A [regularity](Regularity.html) is a map whose values are maps, all of whose members share the same keys. A map may be added to a regularity only if it has all the keys the regularity expects, although it may optionally have more. It is legitimate for the same map to be a member of two different regularities, if it has a union of their keys. Keys in a regularity must be keywords. Regularities are roughly the same sort of thing as objects in object oriented programming or tables in databases, but the values of the keys are not policed (see `homogeneity`).
\ No newline at end of file
diff --git a/docs/Parallelism.md b/docs/Parallelism.md
index eef803a..3757197 100644
--- a/docs/Parallelism.md
+++ b/docs/Parallelism.md
@@ -1,6 +1,6 @@
 # Parallelism
 
-If this system doesn't make reasonably efficient use of massively parallel processors, it's failed. The sketch hardware for which it's designed is [[Post Scarcity Hardware]]; that system probably won't ever exist but systems somewhat like it almost certainly will, because we're up against the physical limits on the performance of a von Neumann machine, and the only way we can increase performance now is by going increasingly parallel.
+If this system doesn't make reasonably efficient use of massively parallel processors, it's failed. The sketch hardware for which it's designed is [Post Scarcity Hardware](Post-scarcity-hardware.html); that system probably won't ever exist but systems somewhat like it almost certainly will, because we're up against the physical limits on the performance of a von Neumann machine, and the only way we can increase performance now is by going increasingly parallel.
 
 So on such a system, every function invocation may normally delegate every argument to a different processor, if there is another processor free (which there normally will be). Only special forms, like *cond*, which implement explicit flow control, should serialise evaluation.
 
diff --git a/docs/Paths.md b/docs/Paths.md
index 8cb1c6c..2ff3475 100644
--- a/docs/Paths.md
+++ b/docs/Paths.md
@@ -1,6 +1,6 @@
 # Paths
 
-*See also [[How do we notate paths?]], which in part supercedes this.*
+*See also [How do we notate paths?](How-do-we-notate-paths.html), which in part supersedes this.*
 
 A path is essentially a list of keywords.
 
diff --git a/docs/Post-scarcity-hardware.md b/docs/Post-scarcity-hardware.md
index 74bfb3b..cea036e 100644
--- a/docs/Post-scarcity-hardware.md
+++ b/docs/Post-scarcity-hardware.md
@@ -1,6 +1,8 @@
-_I wrote this essay in 2014; it was previously published on my blog, [here](http://blog.journeyman.cc/2014/10/post-scarcity-hardware.html)_
+# Implementing post scarcity hardware
 
-Eight years ago, I wrote an essay which I called [[Post Scarcity Software]]. It's a good essay; there's a little I'd change about it now - I'd talk more about the benefits of immutability - but on the whole it's the nearest thing to a technical manifesto I have. I've been thinking about it a lot the last few weeks. The axiom on which that essay stands is that modern computers - modern hardware - are tremendously more advanced than modern software systems, and would support much better software systems than we yet seem to have the ambition to create.
+_I wrote this essay in 2014; it was previously published on my blog, [here](https://www.journeyman.cc/blog/posts-output/2017-09-19-implementing-postscarcity-hardware/)_
+
+Eight years ago, I wrote an essay which I called [Post Scarcity Software](Post-scarcity-software.html). It's a good essay; there's a little I'd change about it now - I'd talk more about the benefits of immutability - but on the whole it's the nearest thing to a technical manifesto I have. I've been thinking about it a lot the last few weeks. The axiom on which that essay stands is that modern computers - modern hardware - are tremendously more advanced than modern software systems, and would support much better software systems than we yet seem to have the ambition to create.
 
 That's still true, of course. In fact it's more true now than it was then, because although the pace of hardware change is slowing, the pace of software change is still glacial. So nothing I'm thinking of in terms of post-scarcity computing actually needs new hardware.
 
@@ -22,7 +24,7 @@ It turns out that Clojure's default *map* function simply serialises iterations
 
 Except...
 
-Performance doesn't actually improve very much. Consider this function, which is the core function of the [MicroWorld](http://blog.journeyman.cc/2014/08/modelling-settlement-with-cellular.html) engine:
+Performance doesn't actually improve very much. Consider this function, which is the core function of the [MicroWorld](https://www.journeyman.cc/blog/posts-output/2014-08-26-modelling-settlement-with-a-cellular-automaton/) engine:
 
     (defn map-world
diff --git a/docs/Post-scarcity-software.md b/docs/Post-scarcity-software.md
index 8f31771..07fcbf9 100644
--- a/docs/Post-scarcity-software.md
+++ b/docs/Post-scarcity-software.md
@@ -1,4 +1,6 @@
-_This is the text of my essay Post-scarcity Software, originally published in 2006 on my blog [here](http://blog.journeyman.cc/2006/02/post-scarcity-software.html)._
+# Post Scarcity Software
+
+_This is the text of my essay Post-scarcity Software, originally published in 2006 on my blog [here](https://www.journeyman.cc/blog/posts-output/2006-02-20-postscarcity-software/)._
 
 For years we've said that our computers were Turing equivalent, equivalent to Turing's machine U. That they could compute any function which could be computed. They aren't, of course, and they can't, for one very important reason. U had infinite store, and our machines don't. We have always been store-poor. We've been mill-poor, too: our processors have been slow, running at hundreds, then a few thousands, of cycles per second. We haven't been able to afford the cycles to do any sophisticated munging of our data. What we stored - in the most store intensive format we had - was what we got, and what we delivered to our users. It was a compromise, but a compromise forced on us by the inadequacy of our machines.
 
diff --git a/docs/Stack.md b/docs/Stack.md
index 3cdf0c8..04ebe72 100644
--- a/docs/Stack.md
+++ b/docs/Stack.md
@@ -30,7 +30,7 @@ Past Lisps have implemented stack as lists and as vectors. Both work. My own gue
     | prior frame ptr | 552...615       | cons-pointer to preceding stack frame VECP        |
     +-----------------+-----------------+---------------------------------------------------+
 
-Note that every argument to a Lisp function must be a cons space object passed by reference (i.e., a cons pointer). If the actual argument is actually a [[vector space]] object, then what we pass is a reference to the VECP object which references that vector.
+Note that every argument to a Lisp function must be a [cons space object](Cons-space.html) passed by reference (i.e., a cons pointer). If the actual argument is actually a [vector space](Vector-space.html) object, then what we pass is a reference to the VECP object which references that vector.
 
 I'm not certain we need a prior frame pointer; if we don't, we may not need a VECP pointing to a stack frame, since nothing can point to a stack frame other than the next stack frame(s) up the stack (if we parallelise *map*, *and* and so on) which to implement a multi-thread system we essentially must have, there may  be two or more successor frames to any frame. In fact to use a massively multiprocessor machine efficiently we must normally evaluate each parameter in a separate thread, with only special forms such as *cond* which impose explicit control flow evaluating their clauses serially in a single thread.
 
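A minimal C sketch of the indirection described above, in which a vector-space argument is passed as a reference to the VECP cell that points at it. The struct layouts and the names `vecp_cell`, `vector_header`, `deref`, `relocate` and `demo` are hypothetical simplifications, not the project's actual definitions. The sketch only illustrates that moving the object during compaction requires updating a single pointer, and that a reference taken before the move still works afterwards.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct vecp_cell;

/* Simplified vector-space header (hypothetical layout). */
struct vector_header {
    uint32_t tag;              /* four ASCII characters, e.g. "HASH" */
    struct vecp_cell *vecp;    /* back pointer to the owning VECP cell */
    uint64_t size;             /* payload bytes following the header */
};

/* A VECP object in cons space: the only forward reference to a
   vector-space object. */
struct vecp_cell {
    struct vector_header *address;
};

/* Callers (stack frames, list cells) hold the VECP cell, never the object. */
static struct vector_header *deref(const struct vecp_cell *cell) {
    return cell->address;
}

/* Move a live object to `dest` during compaction; because of the
   indirection, only the VECP cell's pointer has to be updated. */
static struct vector_header *relocate(struct vector_header *obj, void *dest) {
    size_t total = sizeof *obj + (size_t) obj->size;
    memmove(dest, obj, total);
    struct vector_header *moved = dest;
    moved->vecp->address = moved;
    return moved;
}

/* Build a small object, move it, and check that a reference taken before
   the move still resolves to the object at its new address. */
static int demo(void) {
    void *old_heap = malloc(64);
    void *new_heap = malloc(64);
    int ok = 0;
    if (old_heap != NULL && new_heap != NULL) {
        struct vecp_cell cell;
        struct vector_header *obj = old_heap;
        memcpy(&obj->tag, "HASH", 4);
        obj->vecp = &cell;
        obj->size = 8;
        memset((char *) old_heap + sizeof *obj, 0, 8);   /* 8-byte payload */
        cell.address = obj;

        const struct vecp_cell *arg = &cell;   /* e.g. an argument in a stack frame */
        relocate(obj, new_heap);
        ok = deref(arg) == (struct vector_header *) new_heap
             && memcmp(&deref(arg)->tag, "HASH", 4) == 0;
    }
    free(old_heap);
    free(new_heap);
    return ok;
}
```

The cost is one extra pointer hop on every access, which is the "small overhead on access to vector space objects" the Vector-space notes accept in exchange for cheap compaction.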
diff --git a/docs/Sysout-and-sysin.md b/docs/Sysout-and-sysin.md
index 9a3ae7a..cabd2e3 100644
--- a/docs/Sysout-and-sysin.md
+++ b/docs/Sysout-and-sysin.md
@@ -6,7 +6,7 @@ This might, actually, turn out not to be terribly hard, but is potentially horre
 
 If we use paged memory, as many UNIX systems do, then memory pages periodically get written to disk and the sum total of the memory pages on disk represent an image of the state of system memory. The problem with this is that the state of system memory is changing all the time, and if some pages are out of date with respect to others you don't have a consistent image.
 
-However, the most volatile area of memory is at the outer end of [[cons space]], since that is where cons cells are most likely to die and consequently where new cons cells are most likely to be allocated. We could conceivably take advantage of this by maintaining a per-page [[free list]], and preferentially allocating from the currently busiest page. Volatility in [[vector space]] is likely to be significantly lower, but significantly more distributed. However, if we stick to the general rule that objects aren't mutable, volatility happens only by allocating new objects or deallocating old ones. So it may be the case that if we make a practice of flushing vector space pages when that page is written to, and flushing the active cons space pages regularly, we may at any time achieve a consistent memory image on disk even if it misses the last few seconds worth of changes in cons space. 
+However, the most volatile area of memory is at the outer end of [cons space](Cons-space.html), since that is where cons cells are most likely to die and consequently where new cons cells are most likely to be allocated. We could conceivably take advantage of this by maintaining a per-page [free list](Free-list.html), and preferentially allocating from the currently busiest page. Volatility in [vector space](Vector-space.html) is likely to be significantly lower, but significantly more distributed. However, if we stick to the general rule that objects aren't mutable, volatility happens only by allocating new objects or deallocating old ones. So it may be the case that if we make a practice of flushing vector space pages when that page is written to, and flushing the active cons space pages regularly, we may at any time achieve a consistent memory image on disk even if it misses the last few seconds' worth of changes in cons space.
 
 Otherwise it's worth looking at whether we could journal changes between page flushes. This may be reasonably inexpensive.
 
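The per-page free list idea above can be sketched in C as follows. The text does not define "busiest". This sketch assumes it means the page with the most allocations and frees so far, which is only one possible reading. The page and cell counts and all names (`struct page`, `init_pages`, `allocate_cell`, `free_cell`) are hypothetical.

```c
#include <stddef.h>

#define PAGES 4
#define CELLS_PER_PAGE 8

/* Hypothetical cell: only the free-list link and owning page matter here. */
struct cell {
    struct cell *next_free;
    int page;                   /* index of the page this cell lives in */
};

struct page {
    struct cell cells[CELLS_PER_PAGE];
    struct cell *free_list;     /* this page's own free list */
    unsigned long activity;     /* allocations + frees: assumed "busyness" measure */
};

static struct page pages[PAGES];

/* Thread every cell of every page onto that page's free list. */
static void init_pages(void) {
    for (int p = 0; p < PAGES; p++) {
        pages[p].free_list = NULL;
        pages[p].activity = 0;
        for (int c = CELLS_PER_PAGE - 1; c >= 0; c--) {
            pages[p].cells[c].page = p;
            pages[p].cells[c].next_free = pages[p].free_list;
            pages[p].free_list = &pages[p].cells[c];
        }
    }
}

/* Pop a cell from the busiest page that still has a free cell (ties go to
   the lowest-numbered page). Returns NULL if every page is full. */
static struct cell *allocate_cell(int *page_out) {
    int best = -1;
    for (int p = 0; p < PAGES; p++) {
        if (pages[p].free_list != NULL &&
            (best < 0 || pages[p].activity > pages[best].activity)) {
            best = p;
        }
    }
    if (best < 0) {
        return NULL;
    }
    struct cell *c = pages[best].free_list;
    pages[best].free_list = c->next_free;
    pages[best].activity++;
    if (page_out != NULL) {
        *page_out = best;
    }
    return c;
}

/* Push a cell back onto the free list of the page that owns it. */
static void free_cell(struct cell *c) {
    struct page *owner = &pages[c->page];
    c->next_free = owner->free_list;
    owner->free_list = c;
    owner->activity++;
}
```

Allocation keeps returning to the most active page that has room, so churn concentrates in a few pages. Those are then the pages that would need frequent flushing to keep the on-disk image close to current.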
diff --git a/docs/System-private-functions.md b/docs/System-private-functions.md
index 304c0bc..c0b3eea 100644
--- a/docs/System-private-functions.md
+++ b/docs/System-private-functions.md
@@ -2,7 +2,7 @@
 
 **actually, I think this is a bad idea — or at least needs significantly more thought!**
 
-System-private functions are functions private to the system, which no normal user is entitled to access; these functions normally have an [[access control]] value of NIL.
+System-private functions are functions private to the system, which no normal user is entitled to access; these functions normally have an [access control](Access-control.html) value of NIL.
 
 # (sys-access-control arg)
 
diff --git a/docs/Topology-of-the-hardware-of-the-deep-future.md b/docs/Topology-of-the-hardware-of-the-deep-future.md
index 811a8d2..c7af777 100644
--- a/docs/Topology-of-the-hardware-of-the-deep-future.md
+++ b/docs/Topology-of-the-hardware-of-the-deep-future.md
@@ -1,6 +1,6 @@
 ![HAL 9000 - a vision of the hardware of the deep future](https://vignette4.wikia.nocookie.net/2001/images/5/59/Hal_console.jpg/revision/latest?cb=20090823025755)In thinking about how to write a software architecture that won't quickly become obsolescent, I find that I'm thinking increasingly about the hardware on which it will run.
 
-In [[Post Scarcity Hardware]] I envisaged a single privileged node which managed main memory. Since then I've come to thing that this is a brittle design which will lead to bottle necks, and that each cons page will be managed by a separate node. So there needs to be a hardware architecture which provides the shortest possible paths between nodes.
+In [Post Scarcity Hardware](Post-scarcity-hardware.html) I envisaged a single privileged node which managed main memory. Since then I've come to think that this is a brittle design which will lead to bottlenecks, and that each cons page will be managed by a separate node. So there needs to be a hardware architecture which provides the shortest possible paths between nodes.
 
 Well, actually... from a software point of view it doesn't matter. From a software point of view, provided it's possible for any node to request a memory item from any other node, that's enough, and, for the software to run (slowly), a linear serial bus would do. But part of the point of this thinking is to design hardware which is orders of magnitude faster than the [von Neumann architecture](https://en.wikipedia.org/wiki/Von_Neumann_architecture) allows. So for performance, cutting the number of hops to a minimum is important.
 
diff --git a/docs/Users.md b/docs/Users.md
index e7ca536..a6bd5ad 100644
--- a/docs/Users.md
+++ b/docs/Users.md
@@ -1,9 +1,9 @@
 # Users
 
-I'm not yet sure what sort of objects users are. They may just be lists, interned in a special namespace such as *system.users*. They may be special purpose [[vector space]] objects (although I don't see why, apart from to get a special tag, which might be useful).
+I'm not yet sure what sort of objects users are. They may just be lists, interned in a special namespace such as *system.users*. They may be special purpose [vector space](Vector-space.html) objects (although I don't see why, apart from to get a special tag, which might be useful).
 
-Every user object must contain credentials, and the credentials must be readable by system only; the credentials are either a hashed password or a cryptographic public key. The user object must also have an identifying name, and probably other identifying information. But it's not necessarily the case that every user on the system needs to be able to see the names of every other user on the system, so the identifying information (or the user object itself) may have [[access control]] lists.
+Every user object must contain credentials, and the credentials must be readable by system only; the credentials are either a hashed password or a cryptographic public key. The user object must also have an identifying name, and probably other identifying information. But it's not necessarily the case that every user on the system needs to be able to see the names of every other user on the system, so the identifying information (or the user object itself) may have [access control](Access-control.html) lists.
 
-There is a problem here with the principle of [[immutability]]; if an access control list on an object _foo_ contains a pointer to my user object so that I can read _foo_, and I change my password, then the immutability rule says that a new copy of the *system.users* namespace is created with a new copy of my user object. This new user object isn't on any access control list so by changing my password I actually can't read anything.
+There is a problem here with the principle of [immutability](Immutability.html); if an access control list on an object _foo_ contains a pointer to my user object so that I can read _foo_, and I change my password, then the immutability rule says that a new copy of the *system.users* namespace is created with a new copy of my user object. This new user object isn't on any access control list so by changing my password I actually can't read anything.
 
 This means that what we put on access control lists is not user objects, but symbols (usernames) which are bound in *system.users* to user objects; the user object then needs a back-pointer to that username. A user group then becomes a list not of user objects but of interned user names.
\ No newline at end of file
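A minimal C sketch of the scheme described above. It assumes a small fixed-size table stands in for the *system.users* namespace. It also assumes a user may read an object when some name on its access control list is currently bound to that user's object. All names here (`struct user`, `bind_user`, `lookup_user`, `can_read`) are hypothetical. Because the list holds names rather than user objects, rebinding a name to a new user object after a password change keeps access working.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical immutable user object: changing credentials means building
   a new object rather than modifying this one. */
struct user {
    const char *username;     /* back-pointer to the name this user is bound to */
    const char *credential;   /* stand-in for a hashed password or public key */
};

/* Hypothetical stand-in for the mutable *system.users* namespace. */
#define MAX_USERS 8
struct binding {
    const char *name;
    const struct user *value;
};
static struct binding system_users[MAX_USERS];
static size_t n_users = 0;

/* Bind (or rebind) a user object under its username. */
static bool bind_user(const struct user *u) {
    for (size_t i = 0; i < n_users; i++) {
        if (strcmp(system_users[i].name, u->username) == 0) {
            system_users[i].value = u;
            return true;
        }
    }
    if (n_users == MAX_USERS) {
        return false;
    }
    system_users[n_users].name = u->username;
    system_users[n_users].value = u;
    n_users++;
    return true;
}

/* Resolve a username to the user object currently bound to it, or NULL. */
static const struct user *lookup_user(const char *name) {
    for (size_t i = 0; i < n_users; i++) {
        if (strcmp(system_users[i].name, name) == 0) {
            return system_users[i].value;
        }
    }
    return NULL;
}

/* The access control list holds usernames; each is resolved at check time. */
static bool can_read(const char *const *acl, size_t acl_len, const struct user *who) {
    for (size_t i = 0; i < acl_len; i++) {
        if (who != NULL && lookup_user(acl[i]) == who) {
            return true;
        }
    }
    return false;
}
```

The text above does not spell out one consequence of this particular check: after a password change, the superseded user object no longer passes it, because its name now resolves to the new object.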
diff --git a/docs/Vector-space.md b/docs/Vector-space.md
index 7456674..528b04c 100644
--- a/docs/Vector-space.md
+++ b/docs/Vector-space.md
@@ -2,7 +2,7 @@
 
 Vector space is what in conventional computer languages is known as 'the heap'. Because objects allocated in vector space are of variable size, vector space will fragment over time. Objects in vector space will become unreferenced, making them available for garbage collection and reallocation; but ultimately you will arrive at the situation where there are a number of small free spaces in vector space but you need a large one. Therefore there must ultimately be a mark-and-sweep garbage collector for vector space.
 
-To facilitate this, reference to every vector space object will be indirected through exactly one VECP object in [[cons space]]. If a live vector space object has to be moved in memory in order to compact the heap and to allocate a new object, only one pointer need be updated. This saves enormously on mark-and-sweep time, at the expense of a small overhead on access to vector space objects.
+To facilitate this, reference to every vector space object will be indirected through exactly one VECP object in [cons space](Cons-space.html). If a live vector space object has to be moved in memory in order to compact the heap and to allocate a new object, only one pointer need be updated. This saves enormously on mark-and-sweep time, at the expense of a small overhead on access to vector space objects.
 
 Every vector space object must have a header, indicating that it is a vector space object and what sort of a vector space object it is. Each vector space object must have a fixed size, which is declared in its header. Beyond the header, the payload of a vector space object is undetermined.
 
@@ -22,11 +22,11 @@ So the header looks like this
     
 ### Tag
 
-The tag will be a 32 bit unsigned integer in the same way and for the same reasons that it is in [[cons space]]: i.e., because it will be alternately readable as a four character ASCII string, which will aid memory debugging.
+The tag will be a 32 bit unsigned integer in the same way and for the same reasons that it is in [cons space](Cons-space.html): i.e., because it will be alternately readable as a four character ASCII string, which will aid memory debugging.
 
 ### Vecp-pointer
 
-The vecp pointer is a back pointer to the VECP object in cons space which points to this vector space object. It is, therefore, obviously, the size of a [[cons pointer]], which is to say 64 bits.
+The vecp pointer is a back pointer to the VECP object in cons space which points to this vector space object. It is, therefore, obviously, the size of a [cons pointer](consspaceobject_8h.html#structcons__pointer), which is to say 64 bits.
 
 ### Size
 
@@ -61,7 +61,7 @@ We definitely need chunks of executable code - compiled functions.
 
 ### HASH
 
-We definitely need hashtables. A hashtable is implemented as a pointer to a hashing function, and an array of N cons-pointers each of which points to an [[assoc list]] acting as a hash bucket. A hashtable is immutable. Any function which 'adds a new key/value pair to' a hashtable in fact returns a new hashtable containing all the key value bindings from the old one, with the new one added. Any function which 'changes a key/value pair' in a hashtable in fact returns a new value with the same bindings of all the keys except the one which has changed as the old one.
+We definitely need hashtables. A hashtable is implemented as a pointer to a hashing function, and an array of N cons-pointers each of which points to an [assoc list](Hybrid-assoc-lists.html) acting as a hash bucket. A hashtable is immutable. Any function which 'adds a new key/value pair to' a hashtable in fact returns a new hashtable containing all the key/value bindings from the old one, with the new one added. Any function which 'changes a key/value pair' in a hashtable in fact returns a new hashtable with the same bindings as the old one for every key except the one which has changed.
 
 In either case, anything which held a pointer to the old version still sees the old version, which continues to exist until everything which pointed to it has been deallocated. Only things which access the hashtable via a binding in a current namespace will see the new version.