elboob

A site search engine for Cryogen with search on the client side

Design intention

This project is intended to be in two parts:

The compiler

A Clojure function which scans a list of directories of Markdown files and produces a map which keys each lexical token occurring in any of those files (excluding Markdown formatting, common words, punctuation, etc.) to a map which keys the relative path of each file in which the token occurs to the number of times the token occurs in that file.

Thus, supposing we had a single file, with the path content/md/posts/aquarius.md, with the content:

The Age of Aquarius

This is the dawning of the Age of Aquarius.

then the output should be as follows ("the", "of", "this" and "is" being common words, and so excluded):

{"age" {"content/md/posts/aquarius.md" 2}
 "aquarius" {"content/md/posts/aquarius.md" 2}
 "dawning" {"content/md/posts/aquarius.md" 1}}
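A minimal sketch of the compiler core, which reproduces the example above. It assumes a small, hypothetical stop-word list, and treats any run of characters that are neither letters nor digits as a separator; this discards punctuation and simple Markdown markup such as `#` and `*`, though not, for instance, the words inside link URLs. The names `read-markdown-files`, `tokenise` and `compile-index` are illustrative only, not the project's API.

```clojure
(ns elboob.sketch.compiler
  (:require [clojure.java.io :as io]
            [clojure.string :as s]))

;; Hypothetical stop-word list: just enough to reproduce the example above.
;; A real list would be much longer.
(def stop-words
  #{"a" "an" "and" "in" "is" "it" "of" "on" "the" "this" "to"})

(defn read-markdown-files
  "Return a map of path -> content for every `.md` file found (recursively)
  under each of `dirs`. Paths are relative when `dirs` are relative."
  [dirs]
  (into {}
        (for [dir dirs
              f (file-seq (io/file dir))
              :when (and (.isFile f) (s/ends-with? (.getName f) ".md"))]
          [(.getPath f) (slurp f)])))

(defn tokenise
  "Lower-case `text`, split it on runs of characters which are neither
  letters nor digits (discarding punctuation and markup such as `#` and
  `*`), then drop empty strings and stop-words."
  [text]
  (->> (s/split (s/lower-case text) #"[^\p{L}\p{N}]+")
       (remove s/blank?)
       (remove stop-words)))

(defn compile-index
  "Given a map of path -> content, return a map of
  token -> {path frequency-of-token-in-that-file}."
  [path->text]
  (reduce-kv
   (fn [index path text]
     ;; Count each token in this file, then file the counts under the token.
     (reduce-kv (fn [idx token n] (assoc-in idx [token path] n))
                index
                (frequencies (tokenise text))))
   {}
   path->text))
```

Typical use would be `(compile-index (read-markdown-files ["content/md"]))`.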

This map is then stored in a file, elboob.edn, in the root directory of the Cryogen public output. Whether the source path (e.g. content/md/posts/) should be converted to the target path (e.g. /blog/posts-output/) at compile time or at search time is something I'll decide later.
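A minimal sketch of the compile-time option, assuming the conversion is a plain prefix swap between the two example paths above. File extensions are deliberately left untouched, since how public file names relate to the Markdown ones is not specified here. `source->target`, `retarget-index`, `write-index!` and `read-index` are hypothetical names. Writing with `pr-str` and reading back with `clojure.edn/read-string` round-trips a map of strings and integers.

```clojure
(ns elboob.sketch.output
  (:require [clojure.edn :as edn]
            [clojure.string :as s]))

(defn source->target
  "Hypothetical path conversion: swap the directory prefix `from` for `to`.
  Paths which do not start with `from` are returned unchanged."
  [from to path]
  (if (s/starts-with? path from)
    (str to (subs path (count from)))
    path))

(defn retarget-index
  "Apply `source->target` to every file path in a compiled index."
  [from to index]
  (into {}
        (for [[token path->freq] index]
          [token (into {}
                       (for [[path n] path->freq]
                         [(source->target from to path) n]))])))

(defn write-index!
  "Write `index` as EDN to `file`, e.g. the public output's elboob.edn."
  [index file]
  (spit file (pr-str index)))

(defn read-index
  "Read an index previously written by `write-index!`."
  [file]
  (edn/read-string (slurp file)))
```

Doing the conversion at search time would instead mean calling the same kind of function in the searcher, on each path it displays.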

The searcher

The searcher is a little ClojureScript function which, given a sequence of search terms, will read the elboob.edn file and produce a web page listing the files which contain one or more of those search terms, ordered by the product of the numbers of occurrences of each term in the file.
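The ranking rule can be sketched as a pure function over the already-parsed index, written in plain Clojure so that it could equally live in a `.cljc` file shared with ClojureScript. One assumption needs flagging: a literal product over all search terms would score zero for any file lacking one of them, so this sketch multiplies only the frequencies of the terms that do occur in each file. Fetching elboob.edn and rendering the page are left out, and `score-files` is a hypothetical name.

```clojure
(ns elboob.sketch.search
  (:require [clojure.string :as s]))

(defn score-files
  "Given a compiled index (token -> {path frequency}) and a sequence of
  search terms, return [path score] pairs for every file containing at
  least one term, highest score first (ties broken by path). The score is
  the product of the frequencies of those terms which occur in the file;
  terms absent from a file are skipped rather than contributing zero."
  [index terms]
  (let [hits (->> terms
                  (map s/lower-case)
                  distinct          ; a repeated term should not be squared
                  (keep index))]    ; each hit is a {path frequency} map
    (->> (apply merge-with * hits)  ; multiply frequencies per path
         (sort-by (juxt (comp - val) key))
         (map (juxt key val)))))
```

In the browser, the text of elboob.edn could be parsed with `cljs.reader/read-string` before being passed to a function like this. Search terms are only lower-cased here, so they would need the same normalisation as the compiler applies to tokens.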

Implementation

Implementation has not started yet.

License

Copyright © 2025 Simon Brooke. Licensed under the GNU General Public License, version 2.0 or (at your option) any later version.