Calculating the size of a fully populated machine on my current software specification.

Simon Brooke 2021-08-09 13:00:30 +01:00
parent 4a6fb5273f
commit f90c11a598

The address space hinted at by using 64 bit cons space and a 64 bit vector space containing objects each of whose length may be up to 1.4e20 bytes (2<sup>64</sup> words of 64 bits) is so large that a completely populated post-scarcity hardware machine can probably never be built. But that doesn't mean I'm wrong to specify such an address space: if we can make this architecture work for machines that can't (yet, anyway) be built, it will work for machines that can; and changing the size of the pointers, which one might wish to do for storage economy, can be done with a few edits to consspaceobject.h.
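As a back-of-envelope check on those figures (an illustrative sketch only - it assumes nothing beyond 8 byte words, and doesn't touch the real consspaceobject.h):

```c
/* Sanity check of the sizes quoted above: vector space objects of up to
 * 2^64 words of 64 bits (8 bytes) each.  long double is used because
 * 2^64 * 8 overflows a 64 bit integer. */
#include <stdio.h>

int main(void) {
    long double words = 0x1p64L;      /* 2^64 words in the largest object */
    long double bytes = words * 8.0L; /* 8 bytes per 64 bit word */
    printf("largest vector object: %.3Le words = %.3Le bytes\n", words, bytes);
    /* prints ~1.845e+19 words = ~1.476e+20 bytes, the ~1.4e20 quoted above */
    return 0;
}
```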
But, for the moment, let's discuss a potential 32 bit psh machine, and how it might be built.
## Pass one: a literal implementation
Let's say a processing node comprises a two core 32 bit processor, such as an ARM, 4GB of RAM, and a custom router chip.
On each node, core zero is the actual processing core, and core one handles communications. We arrange these on a printed circuit board that is 4 nodes by 4 nodes. Each node is connected to the nodes in front, behind, left and right by tracks on the board, and by pins to the nodes on the boards above and below. On the edges of the board, the tracks which have no 'next neighbour' lead to some sort of reasonably high speed bidirectional serial connection - I'm imagining optical fibre (or possibly a pair of optical fibres, one for each direction). These boards are assembled in stacks of four, and the 'up' pins on the top board and the 'down' pins (or sockets) on the bottom board connect to similar high speed serial connectors.
This unit of 4 boards - 64 compute nodes - now forms both a logical and a physical cube. Let's call this cube module a crystal. Connect left to right, top to bottom and back to front, and you have a hypercube. But take another identical crystal, place it alongside, connect the right of crystal A to the left of crystal B and the right of B to the left of A, leaving the tops and bottoms and fronts and backs of those crystals still connected to themselves, and you have a larger cuboid with more compute power and address space but slightly lower path efficiency. Continue in this manner until you have four layers each of four by four crystals, and you have a compute unit of 4096 nodes. So the basic 4x4x4 building block - the 'crystal' - is a good place to start, and it is in some measure affordable to build - in the low thousands of pounds, even for a prototype.
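Purely as an illustration of the wrap-around wiring described above (a sketch only - no addressing scheme is fixed anywhere in the specification, and the (x, y, z) coordinates here are an assumption for the example):

```c
/* Illustrative sketch: nodes within one 4x4x4 crystal identified by
 * (x, y, z) coordinates in the range 0..3, with opposite faces of the
 * crystal joined (left-right, front-back, top-bottom) as described. */
#include <stdio.h>

#define CRYSTAL_EDGE 4   /* nodes along each edge of a crystal */

typedef struct { int x, y, z; } node_coord;

/* Wrap a coordinate onto the range 0..CRYSTAL_EDGE-1 (torus-style links). */
static int wrap(int c) {
    return ((c % CRYSTAL_EDGE) + CRYSTAL_EDGE) % CRYSTAL_EDGE;
}

/* Return the neighbour of n one step along the given axis deltas. */
static node_coord neighbour(node_coord n, int dx, int dy, int dz) {
    node_coord r = { wrap(n.x + dx), wrap(n.y + dy), wrap(n.z + dz) };
    return r;
}

int main(void) {
    node_coord n = { 3, 0, 2 };            /* a node on the 'right' face */
    node_coord r = neighbour(n, 1, 0, 0);  /* its 'right' neighbour */
    /* The link wraps round to the 'left' face: x comes back as 0. */
    printf("(%d,%d,%d) -> (%d,%d,%d)\n", n.x, n.y, n.z, r.x, r.y, r.z);
    return 0;
}
```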
There are downsides to this single-chip virtual crystal, too. While communication inside the crystal is easier and quicker, communication between crystals becomes a lot more complex, and I don't yet even have an idea of how it might work. Also, contention on the main address bus, with 64 processors all trying to write to and read from the same memory at the same time, is likely to be horrendous, leading to much lower speed than the solution where each node has its own memory.
On the cost side, you could probably fit this all onto one printed circuit board, as against the four of the 'literal' design; the single processor chip is likely to cost around £400; the memory will probably be a little cheaper than on the literal design; and you don't need the custom routers, the connection hardware, or the optical transceivers. So the cost probably looks more like £5,000.
## Size of a fully populated machine
### Memory size
To fully implement the software specification as currently written, each node would need 128GB of RAM for its curated cons space alone (since we can have 2<sup>32</sup> cons cells, each of 32 bytes); an amount of memory for vector space; a substantial cache of objects being processed by the node but curated by other nodes; and scratchpad space. How much memory for vector space? The current software specification allows for vectors up to 32 times the total address space of currently available 64 bit processors. But not only could such objects not easily be stored with current generation technology, they could also not be copied across the hypercube lattice in any useful sort of time. So functions which operate on large vector space objects would necessarily have to migrate to the node where the object is curated, rather than have the object migrate. However, it is obviously unaffordable to build, as a first prototype, a machine which can explore problems like that, so this is at present academic.
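The cons space figure is easy to check; the following sketch (illustrative only, using 'GB' in the loose binary sense above) just does the arithmetic:

```c
/* Check of the per-node cons space figure: 2^32 cons cells of 32 bytes each. */
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint64_t cells = UINT64_C(1) << 32;   /* 2^32 cons cells per node */
    uint64_t bytes = cells * 32;          /* 32 bytes per cons cell */
    printf("cons space per node: %" PRIu64 " bytes = %" PRIu64 "GB\n",
           bytes, bytes >> 30);           /* 137438953472 bytes = 128GB */
    return 0;
}
```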
### Lattice size
If we hold to the doctrine of one cons page per node, which has the advantage of making addressing reasonably simple, then there can be up to 2<sup>32</sup>, or 4,294,967,296, nodes, forming a hypercube of roughly 1625 x 1625 x 1625 nodes. The total address space of this machine would be of the order of 79,228,162,514,264,337,593,543,950,336 bytes - that is, 2<sup>96</sup>, or about 7.9 x 10<sup>28</sup>, bytes. Taking a brontobyte as 10<sup>27</sup> bytes, that is roughly 79 brontobytes - far beyond the zettabytes of my original sketch.
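And, as a sketch only, the lattice arithmetic - assuming, to match the figure quoted above, that each of the 2<sup>32</sup> nodes curates a full 2<sup>64</sup> bytes:

```c
/* Check of the lattice figures: 2^32 nodes arranged as a cube, and the
 * total address space if each node curates 2^64 bytes. */
#include <stdio.h>
#include <math.h>    /* for cbrtl(); link with -lm */

int main(void) {
    long double nodes = 0x1p32L;       /* 2^32 = 4,294,967,296 nodes */
    long double edge  = cbrtl(nodes);  /* ~1625.5 nodes along each edge */
    long double total = 0x1p96L;       /* 2^32 nodes * 2^64 bytes = 2^96 bytes */
    printf("edge: %.1Lf nodes, total: %.3Le bytes\n", edge, total);
    /* prints edge: 1625.5 nodes, total: 7.923e+28 bytes */
    return 0;
}
```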