Minor improvements to indexing.

Simon Brooke 2025-10-31 18:46:08 +00:00
parent 5e33f2c815
commit 21b6bfd67e
4 changed files with 42 additions and 23 deletions

@ -1,6 +1,12 @@
# elboob
A site search engine for Cryogen with search on the client side
A site search engine for [Cryogen](http://cryogenweb.org/) with search on the client side
## Justification
Left, of course.
More seriously, `elboob` is as near as I can get to an inversion of Google.
## Design intention
@ -32,7 +38,12 @@ Then the output should be
## Implementation
Has not started yet.
Is at an early stage. I have a working indexer, which conforms to the specification given above. There are problems with it:
1. It contains many, many repetitions of long file path names, which results in a large data size (although it makes it efficient to search);
2. It doesn't contain human readable metadata about the files, which, given this is Cryogen and the files have metadata headers, it easily could.
I could assign a gensym to each file path name, store that gensym in the main index, and add a separate dictionary map entry to the index which translates those gensyms into the full file paths. That would substantially reduce the file size without greatly increasing the cost of search.
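
A minimal sketch of that idea follows, in Python purely for illustration; it is not the indexer's actual code. It assumes the index maps each token to a map of file path to occurrence count, which may not match the specification exactly. The names `compact_index` and `lookup`, and the `files` and `tokens` keys, are hypothetical.

```python
from itertools import count


def compact_index(index):
    """Replace repeated file paths with short generated keys.

    `index` is assumed to look like {token: {file_path: count}}.
    Returns {"files": {key: file_path}, "tokens": {token: {key: count}}}.
    """
    keys = {}            # file path -> short generated key
    counter = count()    # source of unique suffixes, in the spirit of gensym
    tokens = {}
    for token, hits in index.items():
        tokens[token] = {}
        for path, n in hits.items():
            if path not in keys:
                keys[path] = f"f{next(counter)}"
            tokens[token][keys[path]] = n
    # Invert the key table so a search can translate keys back to paths.
    files = {key: path for path, key in keys.items()}
    return {"files": files, "tokens": tokens}


def lookup(compact, token):
    """Return {file_path: count} for a token, resolving keys to paths."""
    files = compact["files"]
    hits = compact["tokens"].get(token, {})
    return {files[key]: n for key, n in hits.items()}
```

Each long path is then stored once, in the `files` table, and a search pays only one extra map lookup per hit to recover it.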
## License