Updated Post Scarcity Hardware (markdown)

Simon Brooke 2017-01-02 23:34:33 +00:00
parent b763e0417e
commit b3cb880564

@@ -1,6 +1,6 @@
_I wrote this essay in 2014; it was previously published on my blog, [here](http://blog.journeyman.cc/2014/10/post-scarcity-hardware.html)_
Eight years ago, I wrote an essay which I called Post Scarcity Software. It's a good essay; there's a little I'd change about it now - I'd talk more about the benefits of immutability - but on the whole it's the nearest thing to a technical manifesto I have. I've been thinking about it a lot the last few weeks. The axiom on which that essay stands is that modern computers - modern hardware - are tremendously more advanced than modern software systems, and would support much better software systems than we yet seem to have the ambition to create.
Eight years ago, I wrote an essay which I called [[Post Scarcity Software]]. It's a good essay; there's a little I'd change about it now - I'd talk more about the benefits of immutability - but on the whole it's the nearest thing to a technical manifesto I have. I've been thinking about it a lot the last few weeks. The axiom on which that essay stands is that modern computers - modern hardware - are tremendously more advanced than modern software systems, and would support much better software systems than we yet seem to have the ambition to create.
That's still true, of course. In fact it's more true now than it was then, because although the pace of hardware change is slowing, the pace of software change is still glacial. So nothing I'm thinking of in terms of post-scarcity computing actually needs new hardware.
@@ -18,7 +18,7 @@ Mapping, in a language with immutable data, is inherently parallelisable. There
What?
- It turns out that Clojure's default map function simply serialises iterations in a single process. Why? Well, one finds out when one investigates a bit. Clojure provides two different versions of parallel mapping functions, pmap and clojure.core.reducers/map. So what happens when you swap map for pmap? Why, performance improves, and all your available cores get used!
+ It turns out that Clojure's default *map* function simply serialises iterations in a single process. Why? Well, one finds out when one investigates a bit. Clojure provides two different versions of parallel mapping functions, *pmap* and *clojure.core.reducers/map*. So what happens when you swap *map* for *pmap*? Why, performance improves, and all your available cores get used!
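(The diff above doesn't show the benchmark code itself; as a toy illustration of the swap being described - with a hypothetical `expensive-square` standing in for the real per-cell work - *pmap* really is a drop-in replacement for *map*:)

```clojure
;; Toy sketch, not the essay's actual benchmark: pmap has the same
;; shape as map, but evaluates each call on its own thread.
(defn expensive-square
  "Simulate an expensive per-element computation."
  [x]
  (Thread/sleep 100)
  (* x x))

;; Serial: roughly (count coll) x 100ms, one core busy.
(time (doall (map expensive-square (range 16))))

;; Parallel: wall-clock time drops roughly with the core count.
(time (doall (pmap expensive-square (range 16))))
```

(The `doall` calls matter: both *map* and *pmap* return lazy sequences, so without forcing them `time` would measure almost nothing.)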
Except...
@@ -66,7 +66,7 @@ Runs at about 690% processor loading - almost fully using seven cores. But, as y
"Elapsed time: 36762.382725 msecs"
#'mw-explore.optimise/x2
- (For completeness, the clojure.core.reducers/map is even slower, so is not discussed in any further detail)
+ (For completeness, the *clojure.core.reducers/map* is even slower, so is not discussed in any further detail)
## Non-parallel version
@@ -78,7 +78,7 @@ Maxes out one single core, takes about 3.6 times as long as the hybrid version.
"Elapsed time: 88412.883849 msecs"
#'mw-explore.optimise/x2
- Now, I need to say a little more about this. It's obvious that there's a considerable set-up/tear-down cost for threads. The reason I'm using pmap for the outer mapping but serial map for the inner mapping rather than the other way round is to do more work in each thread.
+ Now, I need to say a little more about this. It's obvious that there's a considerable set-up/tear-down cost for threads. The reason I'm using *pmap* for the outer mapping but serial *map* for the inner mapping rather than the other way round is to do more work in each thread.
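(The shape being described - parallel over rows, serial within each row - might be sketched like this; the function and parameter names here are hypothetical, not taken from the essay's source:)

```clojure
;; Sketch: parallelise the outer mapping over rows, keep the inner
;; mapping over cells serial, so each thread gets a whole row of work.
(defn map-world
  "Apply f to every cell of world, a vector of rows of cells."
  [f world]
  (doall
   (pmap (fn [row] (doall (map f row)))
         world)))
```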
However, I'm still simple-mindedly parallelising the whole of one map operation and serialising the whole of the other. This particular array is 2048 cells square - so over four million cells in total. But, by parallelising the outer map operation, I'm actually asking the operating system for 2048 threads - far more than there are cores. I have tried writing a version of map which uses Runtime.getRuntime().availableProcessors() to find the number of processors available, partitions the outer array into that many partitions, and runs the parallel map function over that partitioning:
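(The diff cuts off before that code; a sketch of what such a partitioned version might look like, under the assumption that the world is a vector of rows as above - again, all names are hypothetical:)

```clojure
;; Sketch: split the rows into roughly one chunk per available
;; processor, pmap over the chunks, and process each chunk serially.
(defn map-world-partitioned
  [f world]
  (let [n          (.availableProcessors (Runtime/getRuntime))
        ;; ceiling division, so no rows are dropped
        chunk-size (max 1 (quot (+ (count world) n -1) n))]
    (doall
     (apply concat
            (pmap (fn [chunk]
                    (doall (map (fn [row] (doall (map f row))) chunk)))
                  (partition-all chunk-size world))))))
```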