Started to look at whether MongoDB would make a useful knowledge store.

2020-04-24 11:31:18 +01:00 · 2020-04-24 11:31:18 +01:00 · 2d395251b5
parent 72f486bc27
commit 2d395251b5
34 changed files with 152 additions and 55 deletions
--- a/doc/Analysis.md
+++ b/doc/Analysis.md
@ -775,7 +775,7 @@ believe'); and, implicit in the qualifier, the possibility of a rebuttal:
 ![Argument schama after Toulmin, p 104](../img/toulmin-argument-schema.svg)

 In conversation, Toulmin argues, it may be natural simply to say
-'&lt;data&gt; so &lt;c|aim&gt;' ; to say '&lt;c|aim&gt; because &lt;warrant&gt; because
+'&lt;data&gt; so &lt;claim&gt;' ; to say '&lt;claim&gt; because &lt;warrant&gt; because
 &lt;data&gt;' "...strikes us as cumbrous and artificial, for it puts in an extra step which is trivial
 and unnecessary".

--- a/doc/Arboretum.md
+++ b/doc/Arboretum.md
@ -102,7 +102,7 @@ is true":

 ![Simplest possible DTree](../img/simplest-possible-dtree.svg)

-fig 3: simplest possible rule Conjunctions are represented by columns of
+fig 1: simplest possible rule Conjunctions are represented by columns of
 nodes, only the last of which has the colour to be returned if all are
 true and disjunctions by branches, each of which terminates in the
 colour to be returned if any are true. These can be combined in any
@ -111,7 +111,7 @@ individual rule structures small. This is shown in the figure below:

 ![Example DTree](../img/example-dtree.svg)

-fig 4: example rule, showing syntax The rule would read: "(rootnode) is
+fig 2: example rule, showing syntax The rule would read: "(rootnode) is
 false unless (first conjunct) is true and (second conjunct) is true, in
 which case it is true unless either (first disjunct) or (second
 disjunct) is true".
@ -262,11 +262,11 @@ our knowledge base contains the following rules:

 ![DTree for 'Entitled to Widows' Allowance](../img/dtree-widows-allowance.svg)

-fig 1: Rule for "Entitled to Widow's Allowance"
+fig 3: Rule for "Entitled to Widow's Allowance"

 ![DTree for Living with Partner](../img/dtree-live-with-partner.svg)

-fig 2: rule for "Living with Partner"
+fig 4: rule for "Living with Partner"

 which, together, partially encode
 the following legislation fragment, from the Social Security Act 1975
--- a/doc/Bialowieza.md
+++ b/doc/Bialowieza.md
@ -48,7 +48,7 @@ So we shall say that a proposition will be represented as a Clojure map with at

 Thus

-    {:verb :killed :subject :brutus :object :caesar}
+    {:verb :kill :subject :brutus :object :caesar}

 is a proposition which asserts that Brutus killed Caesar.

@ -61,36 +61,72 @@ There may be many other privileged keys, such as
 * `:data` - an argument structure...!
 * `:authority` - id of agent from whom, or rule from which, I know this;

-and so on. The exact set of privileged keys is probably actually a matter for particular advocates rather than for the engine itself, although if the advocates in the game don't broadly share the same set of privileged keys then it won't work very well.
+and so on. The exact set of privileged keys is probably actually a matter for
+particular advocates rather than for the engine itself, although if the advocates
+in the game don't broadly share the same set of privileged keys then it won't
+work very well.

 *However...*

-The attentive reader will note that some of the proposed privileged keys map closely onto the [Toulmin schema](Analysis.html#the-toulmin-schema). Thus we can say:
+The attentive reader will note that some of the proposed privileged keys map
+closely onto the [Toulmin schema](Analysis.html#the-toulmin-schema). Thus we can say:

 * that the proposition itself is a `claim` in the sense of the **C** term;
 * that `:data` above is precisely `data` in the sense of the **D** term in Toulmin's schema, but may (is likely to) also provide a `warrant` in the sense of the **W** term;
 * that `:truth` and `:confidence` are both `qualifiers` of the claim in the sense of the **Q** term;
 * that `:authority` is a form of `backing` in the sense of the **B** term.

-So what, then, is an 'argument structure', as described above? It seems to me that it may be exactly a proposition, with the special feature that the value of the `:data` key is not minimised.
+So what, then, is an 'argument structure', as described above? It seems to me
+that it may be exactly a proposition, with the special feature that the value
+of the `:data` key is not minimised.
+
+Recall that in the chapter on Arboretum I observed that [the working of the DTree decision algorithm caused precisely those nodes to be collected whose fragments provided the most relevant explanation](Arboretum.html#relevance-filtering) to support the decision, in a natural sequence from the general to the particular. I believe that precisely the same fortuitous alchemy will supply the argument structure for Toulmin's **D** - our `:data` term. The DTree itself then becomes the **W** - the `:warrant`; and the author of the DTree becomes the `:authority`.

 #### Proposition minimisation

-How are the values of `:subject`, `:object` and so on to be passed? If we pass rich knowledge structures around, then we lose the insight that different advocates may know different things about given objects. Thus, while internally within each advocate's knowledge base objects may be stored with rich data, when they're passed around in propositions they should be minimised - that is to say, the value should just be a unique identifier, such that, for every object in the domain, if an advocate knows anything at all about that object, it knows its unique identifier and knows the object by that unique identifier.
+How are the values of `:subject`, `:object` and so on to be passed? If we pass
+rich knowledge structures around, then we lose the insight that different
+advocates may know different things about given objects. Thus, while internally
+within each advocate's knowledge base objects may be stored with rich data, when
+they're passed around in propositions they should be minimised - that is to say,
+the value should just be a unique identifier, such that, for every object in the
+domain, if an advocate knows anything at all about that object, it knows its
+unique identifier and knows the object by that unique identifier.

-Thus the unique identifier has something of the nature of a 'true name', in the magical sense. A given true name, a given unique identifier, refers to precisely one thing in the world, and provided that two advocates both know the same true name, they can debats propositions which refer to the object with that true name.
+Thus the unique identifier has something of the nature of a 'true name', in the
+magical sense. A given true name, a given unique identifier, refers to precisely
+one thing in the world, and provided that two advocates both know the same true
+name, they can debate propositions which refer to the object with that true name.

-Generally, a true name shall be a Clojure keyword. That keyword, passed to any advocate in the game, shall identify either `nil` (the advocate knows nothing of the object), or a map representing everything the advocate knows about the object, and within that map, the value of the key `:id` shall be that true name.
+Generally, a true name shall be a Clojure keyword. That keyword, passed to any
+advocate in the game, shall identify either `nil` (the advocate knows nothing
+of the object), or a map representing everything the advocate knows about the
+object, and within that map, the value of the key `:id` shall be that true name.

-But in saying 'the advocate knows', actually, the advocate knows nothing. The advocate has access to a knowledge base, and it is in the knowledge base that the knowledge is stored. It may be an individual knowledge base, in which case we can implement that idea that different advocates may have the different knowledge about the same object, or it may be a shared consensual knowledge base.
+But in saying 'the advocate knows', actually, the advocate knows nothing. The
+advocate has access to a knowledge base, and it is in the knowledge base that
+the knowledge is stored. It may be an individual knowledge base, in which case
+we can implement the idea that different advocates may have different
+knowledge about the same object, or it may be a shared consensual knowledge
+base.

-A proposition is represented as a map. So to minimise a proposition, for every value in that map, if the value is itself a map it shall be replaced by the value of the key `:id` in that map.
+A proposition is represented as a map. So to minimise a proposition, for every
+value in that map, if the value is itself a map it shall be replaced by the
+value of the key `:id` in that map.

-This means that every implementation of the `wildwood.knowledge-accessor/Accessor` protocol must transduce whatever token its backing store uses as the primary key for an object to `:id` when it performs a `fetch` operation.
+This means that every implementation of the `wildwood.knowledge-accessor/Accessor`
+protocol must transduce whatever token its backing store uses as the primary key
+for an object to `:id` when it performs a `fetch` operation.

 ## Thoughts on the shape of a knowledge base

-The object of building Bialowieza as a library is that we should not constrain how applications which use the library store their knowledge. Rather, knowledge accessors must transduce between the representation used by the particular storage implementation and that defined in `wildwood.schema`. However, what we've described above suggests that a hierarchical database would be a very natural fit for knowlege base data - more natural, in this case, than a relational database.
+The object of building Bialowieza as a library is that we should not constrain
+how applications which use the library store their knowledge. Rather, knowledge
+accessors must transduce between the representation used by the particular
+storage implementation and that defined in `wildwood.schema`. However, what
+we've described above suggests that a hierarchical database would be a very
+natural fit for knowledge base data - more natural, in this case, than a
+relational database.

 ## Prejudice, and defaults

--- a/docs/codox/AgainstTruth.html
+++ b/docs/codox/AgainstTruth.html
--- a/docs/codox/Analysis.html
+++ b/docs/codox/Analysis.html
--- a/docs/codox/Arboretum.html
+++ b/docs/codox/Arboretum.html
--- a/docs/codox/Arden.html
+++ b/docs/codox/Arden.html
--- a/docs/codox/BatesonKammerer.html
+++ b/docs/codox/BatesonKammerer.html
--- a/docs/codox/Bialowieza.html
+++ b/docs/codox/Bialowieza.html
--- a/docs/codox/Errata.html
+++ b/docs/codox/Errata.html
--- a/docs/codox/Experience.html
+++ b/docs/codox/Experience.html
--- a/docs/codox/HegemonicArgument.html
+++ b/docs/codox/HegemonicArgument.html
--- a/docs/codox/History.html
+++ b/docs/codox/History.html
--- a/docs/codox/HuxleyKropotkin.html
+++ b/docs/codox/HuxleyKropotkin.html
--- a/docs/codox/Implementing.html
+++ b/docs/codox/Implementing.html
--- a/docs/codox/JAccuse.html
+++ b/docs/codox/JAccuse.html
--- a/docs/codox/KnacqTools.html
+++ b/docs/codox/KnacqTools.html
--- a/docs/codox/Manifesto.html
+++ b/docs/codox/Manifesto.html
--- a/docs/codox/OnHylasAndPhilonus.html
+++ b/docs/codox/OnHylasAndPhilonus.html
--- a/docs/codox/PredicateSubtext.html
+++ b/docs/codox/PredicateSubtext.html
--- a/docs/codox/TheProblem.html
+++ b/docs/codox/TheProblem.html
--- a/docs/codox/index.html
+++ b/docs/codox/index.html
--- a/docs/codox/intro.html
+++ b/docs/codox/intro.html
--- a/docs/codox/wildwood.advocate.html
+++ b/docs/codox/wildwood.advocate.html
--- a/docs/codox/wildwood.bialowieza.html
+++ b/docs/codox/wildwood.bialowieza.html
--- a/docs/codox/wildwood.caesar.html
+++ b/docs/codox/wildwood.caesar.html
--- a/docs/codox/wildwood.dengine.engine.html
+++ b/docs/codox/wildwood.dengine.engine.html
--- a/docs/codox/wildwood.dengine.node.html
+++ b/docs/codox/wildwood.dengine.node.html
--- a/docs/codox/wildwood.knowledge-accessor.html
+++ b/docs/codox/wildwood.knowledge-accessor.html
--- a/docs/codox/wildwood.mongo-ka.html
+++ b/docs/codox/wildwood.mongo-ka.html
--- a/docs/codox/wildwood.schema.html
+++ b/docs/codox/wildwood.schema.html
--- a/project.clj
+++ b/project.clj
@ -5,7 +5,9 @@
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.clojure/math.numeric-tower "0.0.4"]
-                 [com.taoensso/timbre "4.10.0"]]
+                 [com.taoensso/timbre "4.10.0"]
+                 [com.novemberain/monger "3.1.0"]
+                 [prismatic/schema "1.1.12"]]
  :codox {:metadata {:doc "**TODO**: write docs"
                     :doc/format :markdown}
          :output-path "docs/codox"
--- a/src/wildwood/mongo_ka.clj
+++ b/src/wildwood/mongo_ka.clj
@ -0,0 +1,44 @@
+(ns wildwood.mongo-ka
+  "A knowledge accessor fetching from and storing to Mongo DB.
+
+  Hierarchical databases seem a very natural fit for how we're storing
+  knowledge. Mongo DB seems a particularly natural fit since its
+  documents are JSON-like (stored as BSON) and can be transformed to EDN
+  extremely naturally."
+  (:require [monger.core :as mg]
+            [monger.collection :as mc]
+            [wildwood.knowledge-accessor :refer [Accessor]])
+  ;; Only ObjectId is referenced in this namespace, so the unused
+  ;; com.mongodb imports have been dropped.
+  (:import [org.bson.types ObjectId]))
+
+;; MongoDB data items are identified by ObjectId objects. In a record
+;; retrieved from MongoDB, the key value is the value of the keyword `:_id`. I
+;; don't think there's any *in principle* reason why we should not use these
+;; objects as key values - they're presumably designed to be globally unique.
+;;
+;; In which case, on the way down we have to set `:_id` to the value of `:id`
+;; and vice versa on the way back up.
+
+(defrecord MongoKA
+  ;; It's not clear to me whether we need to pass both the connection and the
+  ;; database in - it's possible that the connected database handle is
+  ;; sufficient. The value of `:collection` is the name of the collection
+  ;; within the database to which this accessor writes.
+  [connection db ^String collection]
+  Accessor
+  (fetch
+    [_ id]
+    (let [oid (cond
+                (instance? ObjectId id) id
+                (string? id) (ObjectId. id)
+                (keyword? id) (ObjectId. (name id)))
+          record (mc/find-by-id db collection oid)]
+      (when record
+        (assoc
+          (dissoc record :_id)
+          :id id))))
+  (store [_ id proposition]
+         ;; don't really know how to do this and am too tired just now.
+         ))
+
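The comment in `mongo_ka.clj` says that `:_id` must be set from `:id` on the way down and vice versa on the way back up, and `store` is still a stub. A minimal sketch of that key swap follows, as pure functions with no database involved. The names `->stored` and `->fetched` are hypothetical and do not appear in the commit, and the ids are left as-is rather than converted to `ObjectId`, which a real accessor would presumably need to do.

```clojure
(ns id-transduction-sketch)

(defn ->stored
  "On the way down: move the value of :id to :_id.
  Hypothetical helper, not part of the commit."
  [record]
  (if (contains? record :id)
    (-> record
        (assoc :_id (:id record))
        (dissoc :id))
    record))

(defn ->fetched
  "On the way back up: move the value of :_id to :id.
  Returns nil for a nil record, as for a failed lookup."
  [record]
  (when record
    (if (contains? record :_id)
      (-> record
          (assoc :id (:_id record))
          (dissoc :_id))
      record)))

;; Round trip: the record that comes back up equals the one sent down.
(->fetched (->stored {:id :brutus :name "Brutus"}))
```

Keeping the swap in two small functions would let `fetch` and `store` share one definition of the mapping, whatever write call the backing store ends up needing.
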
--- a/src/wildwood/schema.clj
+++ b/src/wildwood/schema.clj
@ -29,6 +29,11 @@
    :authority  ;; id of agent from whom, or rule from which, I know this.
    })

+(def preserved-keys
+  "Keys whose values should not be minimised during proposition minimisation"
+  ;; TODO: actually, this may end up being just :data
+  (set (cons :data argument-keys)))
+
 (defn proposition?
  "True if `o` qualifies as a proposition. A proposition is probably a map
  with some privileged keys, and may look something like a minimised
@ -92,6 +97,8 @@
    (number? (:confidence o))
    (<= -1 (:confidence o) 1)))

+(set (cons :data argument-keys))
+
 (defn minimise
  "Expecting that `o` is a (potentially rich) proposition, return a map identical
  to `o` save that for each value `v` of key `k` in `o`, if `v` is a map and `k`
@ -110,7 +117,7 @@
          {k
           (let [v (k o)]
             (if
-               (and (not (argument-keys k)) (map? v))
+               (and (not (preserved-keys k)) (map? v))
               (:id v)
               v))})
        (keys o)))
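
The minimisation rule described in `doc/Bialowieza.md` (replace each map value by the value of its `:id`, except under preserved keys) can be illustrated in isolation. This is a sketch only, not the `wildwood.schema/minimise` function from the commit: it assumes `preserved-keys` is just `#{:data}`, as the TODO suggests it may become, and it lives in a hypothetical namespace.

```clojure
(ns minimise-sketch)

;; Assumption: only :data keeps its rich value. The commit itself builds
;; this set from :data plus argument-keys.
(def preserved-keys #{:data})

(defn minimise
  "Return a copy of proposition `o` in which every map-valued entry is
  replaced by that map's :id, unless its key is in `preserved-keys`."
  [o]
  (into {}
        (map (fn [[k v]]
               [k (if (and (map? v) (not (preserved-keys k)))
                    (:id v)
                    v)]))
        o))

(minimise {:verb :kill
           :subject {:id :brutus :name "Marcus Junius Brutus"}
           :object :caesar})
;; => {:verb :kill, :subject :brutus, :object :caesar}
```

Values that are already identifiers pass through unchanged, so minimising an already minimised proposition returns an equal map.
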