Started to look at whether MongoDB would make a useful knowledge store.

Simon Brooke 2020-04-24 11:31:18 +01:00
parent 72f486bc27
commit 2d395251b5
No known key found for this signature in database
GPG key ID: A7A4F18D1D4DF987
34 changed files with 152 additions and 55 deletions

View file

@@ -775,7 +775,7 @@ believe'); and, implicit in the qualifier, the possibility of a rebuttal:
 ![Argument schama after Toulmin, p 104](../img/toulmin-argument-schema.svg)
 In conversation, Toulmin argues, it may be natural simply to say
-'<data> so <c|aim>' ; to say '<c|aim> because <warrant> because
+'<data> so <claim>' ; to say '<claim> because <warrant> because
 <data>' "...strikes us as cumbrous and artificial, for it puts in an extra step which is trivial
 and unnecessary".

View file

@@ -102,7 +102,7 @@ is true":
 ![Simplest possible DTree](../img/simplest-possible-dtree.svg)
-fig 3: simplest possible rule Conjunctions are represented by columns of
+fig 1: simplest possible rule Conjunctions are represented by columns of
 nodes, only the last of which has the colour to be returned if all are
 true and disjunctions by branches, each of which terminates in the
 colour to be returned if any are true. These can be combined in any
@@ -111,7 +111,7 @@ individual rule structures small. This is shown in the figure below:
 ![Example DTree](../img/example-dtree.svg)
-fig 4: example rule, showing syntax The rule would read: "(rootnode) is
+fig 2: example rule, showing syntax The rule would read: "(rootnode) is
 false unless (first conjunct) is true and (second conjunct) is true, in
 which case it is true unless either (first disjunct) or (second
 disjunct) is true".
@@ -262,11 +262,11 @@ our knowledge base contains the following rules:
 ![DTree for 'Entitled to Widows' Allowance](../img/dtree-widows-allowance.svg)
-fig 1: Rule for "Entitled to Widow's Allowance"
+fig 3: Rule for "Entitled to Widow's Allowance"
 ![DTree for Living with Partner](../img/dtree-live-with-partner.svg)
-fig 2: rule for "Living with Partner"
+fig 4: rule for "Living with Partner"
 which, together, partially encode
 the following legislation fragment, from the Social Security Act 1975

View file

@@ -48,7 +48,7 @@ So we shall say that a proposition will be represented as a Clojure map with at
 Thus
-{:verb :killed :subject :brutus :object :caesar}
+{:verb :kill :subject :brutus :object :caesar}
 is a proposition which asserts that Brutus killed Caesar.
@@ -61,36 +61,72 @@ There may be many other privileged keys, such as
 * `:data` - an argument structure...!
 * `:authority` - id of agent from whom, or rule from which, I know this;
-and so on. The exact set of privileged keys is probably actually a matter for particular advocates rather than for the engine itself, although if the advocates in the game don't broadly share the same set of privileged keys then it won't work very well.
+and so on. The exact set of privileged keys is probably actually a matter for
+particular advocates rather than for the engine itself, although if the advocates
+in the game don't broadly share the same set of privileged keys then it won't
+work very well.
 *However...*
-The attentive reader will note that some of the proposed privileged keys map closely onto the [Toulmin schema](Analysis.html#the-toulmin-schema). Thus we can say:
+The attentive reader will note that some of the proposed privileged keys map
+closely onto the [Toulmin schema](Analysis.html#the-toulmin-schema). Thus we can say:
 * that the proposition itself is a `claim` in the sense of the **C** term;
 * that `:data` above is precisely `data` in the sense of the **D** term in Toulmin's schema, but may (is likely to) also provide a `warrant` in the sense of the **W** term;
 * that `:truth` and `:confidence` are both `qualifiers` of the claim in the sense of the **Q** term;
 * that `:authority` is a form of `backing` in the sense of the **B** term.
-So what, then, is an 'argument structure', as described above? It seems to me that it may be exactly a proposition, with the special feature that the value of the `:data` key is not minimised.
+So what, then, is an 'argument structure', as described above? It seems to me
+that it may be exactly a proposition, with the special feature that the value
+of the `:data` key is not minimised.
+Recall that in the chapter on Arboretum I observed that [the working of the DTree decision algorithm caused precisely those nodes to be collected whose fragments which provided the most relevant explanation](Arboretum.html#relevance-filtering) to support the decision, in a natural sequence from the general to the particular. I believe that precisely the same fortuitous alchemy will provide the argument structure to provide Toulmin's **D** - out `:data` term. The DTree itself then becomes the **W** - the `:warrant`; and the author of the DTree becomes the `:authority`.
 #### Proposition minimisation
-How are the values of `:subject`, `:object` and so on to be passed? If we pass rich knowledge structures around, then we lose the insight that different advocates may know different things about given objects. Thus, while internally within each advocate's knowledge base objects may be stored with rich data, when they're passed around in propositions they should be minimised - that is to say, the value should just be a unique identifier, such that, for every object in the domain, if an advocate knows anything at all about that object, it knows its unique identifier and knows the object by that unique identifier.
+How are the values of `:subject`, `:object` and so on to be passed? If we pass
+rich knowledge structures around, then we lose the insight that different
+advocates may know different things about given objects. Thus, while internally
+within each advocate's knowledge base objects may be stored with rich data, when
+they're passed around in propositions they should be minimised - that is to say,
+the value should just be a unique identifier, such that, for every object in the
+domain, if an advocate knows anything at all about that object, it knows its
+unique identifier and knows the object by that unique identifier.
-Thus the unique identifier has something of the nature of a 'true name', in the magical sense. A given true name, a given unique identifier, refers to precisely one thing in the world, and provided that two advocates both know the same true name, they can debats propositions which refer to the object with that true name.
+Thus the unique identifier has something of the nature of a 'true name', in the
+magical sense. A given true name, a given unique identifier, refers to precisely
+one thing in the world, and provided that two advocates both know the same true
+name, they can debats propositions which refer to the object with that true name.
-Generally, a true name shall be a Clojure keyword. That keyword, passed to any advocate in the game, shall identify either `nil` (the advocate knows nothing of the object), or a map representing everything the advocate knows about the object, and within that map, the value of the key `:id` shall be that true name.
+Generally, a true name shall be a Clojure keyword. That keyword, passed to any
+advocate in the game, shall identify either `nil` (the advocate knows nothing
+of the object), or a map representing everything the advocate knows about the
+object, and within that map, the value of the key `:id` shall be that true name.
-But in saying 'the advocate knows', actually, the advocate knows nothing. The advocate has access to a knowledge base, and it is in the knowledge base that the knowledge is stored. It may be an individual knowledge base, in which case we can implement that idea that different advocates may have the different knowledge about the same object, or it may be a shared consensual knowledge base.
+But in saying 'the advocate knows', actually, the advocate knows nothing. The
+advocate has access to a knowledge base, and it is in the knowledge base that
+the knowledge is stored. It may be an individual knowledge base, in which case
+we can implement that idea that different advocates may have the different
+knowledge about the same object, or it may be a shared consensual knowledge
+base.
-A proposition is represented as a map. So to minimise a proposition, for every value in that map, if the value is itself a map it shall be replaced by the value of the key `:id` in that map.
+A proposition is represented as a map. So to minimise a proposition, for every
+value in that map, if the value is itself a map it shall be replaced by the
+value of the key `:id` in that map.
-This means that every implementation of the `wildwood.knowledge-accessor/Accessor` protocol must transduce whatever token its backing store uses as the primary key for an object to `:id` when it performs a `fetch` operation.
+This means that every implementation of the `wildwood.knowledge-accessor/Accessor`
+protocol must transduce whatever token its backing store uses as the primary key
+for an object to `:id` when it performs a `fetch` operation.
 ## Thoughts on the shape of a knowledge base
-The object of building Bialowieza as a library is that we should not constrain how applications which use the library store their knowledge. Rather, knowledge accessors must transduce between the representation used by the particular storage implementation and that defined in `wildwood.schema`. However, what we've described above suggests that a hierarchical database would be a very natural fit for knowlege base data - more natural, in this case, than a relational database.
+The object of building Bialowieza as a library is that we should not constrain
+how applications which use the library store their knowledge. Rather, knowledge
+accessors must transduce between the representation used by the particular
+storage implementation and that defined in `wildwood.schema`. However, what
+we've described above suggests that a hierarchical database would be a very
+natural fit for knowlege base data - more natural, in this case, than a
+relational database.
 ## Prejudice, and defaults
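
The 'true name' behaviour described in the hunk above can be sketched as follows. This is only an illustration under stated assumptions: the two knowledge bases, their contents and the function `what-is-known` are hypothetical and are not part of wildwood. It shows a keyword identifying either `nil` or a map whose `:id` is that same keyword, and two advocates holding different knowledge about the same object.

```clojure
;; Hypothetical per-advocate knowledge bases, keyed by true name.
;; Names and contents are invented for illustration only.
(def brutus-kb
  {:caesar {:id :caesar :name "Gaius Julius Caesar" :ambition :excessive}})

(def antony-kb
  {:caesar {:id :caesar :name "Gaius Julius Caesar" :ambition :moderate}
   :brutus {:id :brutus :name "Marcus Junius Brutus"}})

(defn what-is-known
  "Everything the knowledge base `kb` holds about the object whose true
  name is `id`, or nil if it holds nothing about that object."
  [kb id]
  (get kb id))

;; Both advocates know the true name :caesar, so they can debate
;; propositions about Caesar even though their knowledge differs.
(def brutus-view (what-is-known brutus-kb :caesar))
(def antony-view (what-is-known antony-kb :caesar))
```

A shared consensual knowledge base would simply be the case where both advocates are handed the same map.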

File diff suppressed because one or more lines are too long

View file

@@ -5,7 +5,9 @@
             :url "https://www.eclipse.org/legal/epl-2.0/"}
   :dependencies [[org.clojure/clojure "1.8.0"]
                  [org.clojure/math.numeric-tower "0.0.4"]
-                 [com.taoensso/timbre "4.10.0"]]
+                 [com.taoensso/timbre "4.10.0"]
+                 [com.novemberain/monger "3.1.0"]
+                 [prismatic/schema "1.1.12"]]
   :codox {:metadata {:doc "**TODO**: write docs"
                      :doc/format :markdown}
           :output-path "docs/codox"

44
src/wildwood/mongo_ka.clj Normal file
View file

@@ -0,0 +1,44 @@
+(ns wildwood.mongo-ka
+  "A knowledge accessor fetching from and storing to Mongo DB.
+  Hierarchical databases seem a very natural fit for how we're storing
+  knowledge. Mongo DB seems a particularly natural fit since its
+  internal representation is JSON, which can be transformed to EDN
+  extremely naturally."
+  (:require [monger.core :as mg]
+            [monger.collection :as mc]
+            [wildwood.knowledge-accessor :refer [Accessor]])
+  (:import [com.mongodb MongoOptions ServerAddress]
+           [com.mongodb DB WriteConcern]
+           [org.bson.types ObjectId]))
+;; MongoDB data items are identified by ObjectId objects. In the retrieved
+;; record from MongoDB, key value is the value of a keyword `:_id` I don't
+;; think there's any *in principle* reason why we should not use these objects
+;; as key values - they're presumably designed to be globally unique.
+;;
+;; In which case, on the way down we have to set `:_id` to the value of `:id`
+;; and vice versa on the way back up.
+(defrecord MongoKA
+  ;; It's not clear to me whether we need to pass both the connection and the
+  ;; database in - it's possible that the connected database handle is
+  ;; sufficient. The value of `:collection` is the name of the collection
+  ;; within the database to which this accessor writes.
+  [connection db ^String collection]
+  Accessor
+  (fetch
+    [_ id]
+    (let [oid (cond
+                (instance? ObjectId id) id
+                (string? id) (ObjectId. id)
+                (keyword? id) (ObjectId. (name id)))
+          record (mc/find-by-id db collection oid)]
+      (when record
+        (assoc
+          (dissoc record :_id)
+          :id id))))
+  (store [_ id proposition]
+    ;; don't really know how to do this and am too tired just now.
+    ))
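
The comment in this file about setting `:_id` from `:id` on the way down, and the reverse on the way back up, can be sketched as two pure functions. This is a minimal sketch, not the commit's implementation: `to-store-doc`, `from-store-doc` and the `coerce-id` parameter are hypothetical names, and no database is touched. With MongoDB, `coerce-id` would have to produce whatever the collection uses as `_id`; note that an `ObjectId` can only be constructed from a 24-character hexadecimal string, so a true name such as `:brutus` would need some other mapping.

```clojure
(defn to-store-doc
  "On the way down: replace the true name under `:id` with the store's
  primary key under `:_id`. `coerce-id` is a hypothetical function which
  turns a true name into whatever key type the backing store expects."
  [proposition coerce-id]
  (-> proposition
      (assoc :_id (coerce-id (:id proposition)))
      (dissoc :id)))

(defn from-store-doc
  "On the way back up: drop the store's `:_id` and restore the true name
  `id` under `:id`. Returns nil if nothing was found."
  [record id]
  (when record
    (-> record
        (dissoc :_id)
        (assoc :id id))))

;; Usage, with `name` standing in for a real key coercion:
(def stored (to-store-doc {:id :brutus :name "Marcus Junius Brutus"} name))
(def fetched (from-store-doc stored :brutus))
```

A `store` method could then, under these assumptions, pass the result of `to-store-doc` to one of monger's write functions. Similarly, `from-store-doc` expects a Clojure map, and as far as I know `monger.collection/find-map-by-id` is the variant that returns one; both points should be checked against the monger documentation.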

View file

@@ -29,6 +29,11 @@
     :authority ;; id of agent from whom, or rule from which, I know this.
     })
+(def preserved-keys
+  "Keys whose values should not be minimised during proposition minimisation"
+  ;; TODO: actually, this may end up being just :data
+  (set (cons :data argument-keys)))
 (defn proposition?
   "True if `o` qualifies as a proposition. A proposition is probably a map
   with some privileged keys, and may look something like a minimised
@@ -92,6 +97,8 @@
     (number? (:confidence o))
     (<= -1 (:confidence o) 1)))
+(set (cons :data argument-keys))
 (defn minimise
   "Expecting that `o` is a (potentially rich) proposition, return a map identical
   to `o` save that for each value `v` of key `k` in `o`, if `v` is a map and `k`
@@ -110,7 +117,7 @@
       {k
        (let [v (k o)]
          (if
-           (and (not (argument-keys k)) (map? v))
+           (and (not (preserved-keys k)) (map? v))
            (:id v)
            v))})
     (keys o)))
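
The effect of switching the test from `argument-keys` to `preserved-keys` can be illustrated with a self-contained sketch. The `argument-keys` set below is a hypothetical stand-in, since the real definition in `wildwood.schema` is only partly visible in this diff, and `minimise` here is a simplified re-statement written with `reduce-kv`, not the function from the commit.

```clojure
;; Hypothetical stand-in for the set defined in wildwood.schema.
(def argument-keys #{:truth :confidence :authority})

;; As in the diff: :data joins the keys whose values are left alone.
(def preserved-keys (set (cons :data argument-keys)))

(defn minimise
  "Simplified sketch: replace every map value with its `:id`, except
  under keys listed in `preserved-keys`."
  [o]
  (reduce-kv
    (fn [m k v]
      (assoc m k (if (and (not (preserved-keys k)) (map? v))
                   (:id v)
                   v)))
    {}
    o))

;; A rich proposition whose :data holds an argument structure.
(def rich
  {:verb    :kill
   :subject {:id :brutus :name "Marcus Junius Brutus"}
   :object  {:id :caesar :name "Gaius Julius Caesar"}
   :data    {:verb :stab :subject :brutus :object :caesar}})

(def minimal (minimise rich))
```

Without `:data` among the preserved keys, the argument structure, which is itself a map and here has no `:id`, would be replaced by `nil`.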