Added first sketch of ignorable words
parent e5875a2a19
commit f2fc1acc80
3 changed files with 108 additions and 1 deletion

resources/ignorable-words.en.edn (Normal file, 105 lines added)
@@ -0,0 +1,105 @@
;; list of English language words that should not be indexed.
;; taken from the first hundred words in [Peter Norvig's analysis of the
;; frequency of English words](https://norvig.com/ngrams/count_1w.txt);
;; I've then commented out from the list those words which, although
;; common, I think it may be reasonable for people to search for.
["the"
"of"
"and"
"to"
"a"
"in"
"for"
"is"
"on"
"that"
"by"
"this"
"with"
"i"
"you"
"it"
"not"
"or"
"be"
"are"
"from"
"at"
"as"
"your"
"all"
"have"
"new"
"more"
"an"
"was"
"we"
"will"
"home"
"can"
"us"
"about"
"if"
"page"
"my"
"has"
"search"
"free"
"but"
"our"
"one"
"other"
"do"
"no"
;; "information"
"time"
"they"
"site"
"he"
"up"
"may"
"what"
"which"
"their"
"news"
"out"
"use"
"any"
"there"
"see"
"only"
"so"
"his"
"when"
;; "contact"
"here"
;; "business"
"who"
"web"
"also"
"now"
;; "help"
"get"
"pm"
"view"
;; "online"
"c"
"e"
"first"
"am"
"been"
"would"
"how"
"were"
"me"
"s"
;; "services"
"some"
"these"
"click"
"its"
"like"
;; "service"
"x"
"than"
"find"]
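
The commit only adds the word list, so how it is consumed is not shown. As a minimal sketch, assuming the file is on the classpath (for example under a conventional `resources/` directory) and that the indexer tokenises on non-alphanumeric characters, the list could be read into a set and used to drop ignorable tokens. The namespace and the function names `parse-ignorable-words`, `load-ignorable-words` and `indexable-tokens` are hypothetical and not taken from the repository.

```clojure
(ns example.ignorable-words
  "Hypothetical sketch: names in this namespace are illustrative only."
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]
            [clojure.string :as str]))

(defn parse-ignorable-words
  "Parses an EDN string holding a vector of words and returns a set.
  The EDN reader skips `;` comments, so commented-out words such as
  ;; \"help\" are simply absent from the result."
  [edn-text]
  (set (edn/read-string edn-text)))

(defn load-ignorable-words
  "Loads the word list from a classpath resource.
  Assumes the resource exists; io/resource returns nil otherwise
  and slurp will then throw."
  [resource-name]
  (parse-ignorable-words (slurp (io/resource resource-name))))

(defn indexable-tokens
  "Lower-cases text, splits it on runs of non-alphanumeric characters,
  and removes blank tokens and any token found in the ignorable set."
  [ignorable text]
  (->> (str/split (str/lower-case text) #"[^a-z0-9]+")
       (remove str/blank?)
       (remove ignorable)))
```

Under these assumptions, a call such as `(indexable-tokens (load-ignorable-words "ignorable-words.en.edn") some-text)` would yield only the tokens worth indexing. Holding the words in a set makes each membership check cheap, and because the commented-out entries never reach the set, words like "help" and "business" stay searchable.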