Minor improvements to indexing.
parent 5e33f2c815
commit 21b6bfd67e

4 changed files with 42 additions and 23 deletions

README.md: 15 lines changed
@@ -1,6 +1,12 @@
 # elboob

-A site search engine for Cryogen with search on the client side
+A site search engine for [Cryogen](http://cryogenweb.org/) with search on the client side

+## Justification
+
+Left, of course.
+
+More seriously, `elboob` is as near as I can get to an inversion of Google.
+
 ## Design intention

@@ -32,7 +38,12 @@ Then the output should be

 ## Implementation

-Has not started yet.
+Is at an early stage. I have a working indexer, which conforms to the specification given above. There are problems with it:
+
+1. It contains many repetitions of long file path names, which results in a large data size (although it makes it efficient to search);
+2. It doesn't contain human-readable metadata about the files, which, given this is Cryogen and the files have metadata headers, it easily could.
+
+I could assign a gensym to each file path name, store that gensym in the main index, and add a separate dictionary map entry to the index which translates those gensyms into the full file paths. That would substantially reduce the file size without greatly increasing the cost of search.

 ## License

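The gensym idea from the Implementation section can be sketched roughly as follows. This is only an illustration in Python, not elboob's code: the index layout (token mapped to `{file_path: occurrence_count}`), the function names `compact_index` and `lookup`, and the `f0`, `f1`, ... key scheme are all assumptions made for the example.

```python
from itertools import count


def compact_index(index):
    """Replace each file path in `index` with a short generated key.

    `index` is assumed to map token -> {file_path: occurrence_count};
    this layout is hypothetical, not elboob's actual format.
    Returns (compact, paths): the rewritten index and a key -> path map.
    """
    keys = {}          # file path -> short key, assigned on first sight
    counter = count()  # source of unique suffixes, like a gensym counter
    compact = {}
    for token, postings in index.items():
        compact[token] = {}
        for path, n in postings.items():
            if path not in keys:
                keys[path] = f"f{next(counter)}"
            compact[token][keys[path]] = n
    # Invert the mapping so a search can translate keys back to paths.
    paths = {key: path for path, key in keys.items()}
    return compact, paths


def lookup(compact, paths, token):
    """Resolve a token's postings back to full file paths."""
    return {paths[key]: n for key, n in compact.get(token, {}).items()}


# Example with made-up paths: each long path is stored once in `paths`.
index = {
    "search": {"content/md/posts/2019-01-01-first.md": 3,
               "content/md/pages/about.md": 1},
    "cryogen": {"content/md/pages/about.md": 2},
}
compact, paths = compact_index(index)
print(lookup(compact, paths, "cryogen"))
```

Each long path then appears once in the key-to-path map instead of once per token, and a search pays only one extra dictionary lookup per hit to recover the full path.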