indexTHIS Web Site Indexer


The Indexer is a Java program that you run from the command line of your computer. It grabs the titles, keywords, descriptions, and text from your HTML files, and creates several files for you:
  1. search.html
  2. searchit.html
  3. db.js
  4. db.txt
  5. netscape.js
search.html is a file for Microsoft Internet Explorer. Because of limitations in earlier versions of IE, visitors to your web site who are using IE can search only for keywords, not file text. This search engine also does not return results ranked by relevancy.

searchit.html is a file for Netscape Navigator (a browser that still holds a disproportionately high percentage of the browser population, and my personal favorite). It allows visitors to your web site who are using Netscape Navigator to search both keywords and file text, and it returns results ranked by relevancy. Pretty cool, eh? It searches the arrays built in netscape.js.

db.js is a file that is used only by the Java applet. It contains a lookup table of the links, titles, and descriptions of the HTML files in your web site.

db.txt is another file that is used only by the Java applet. It contains the actual text of your HTML files, limited to the words whose number of occurrences in a file (their score) reaches the minimum score you enter when you run the indexer. Let's take an example. Say you have a file in which the word 'user' occurs 5 times and the word 'fun' occurs 3 times. Now you start the indexing process by typing:
java indexTHIS 4 -s
where 4 is the minimum number of times a word must occur in a file to be included in the db.txt and netscape.js files. Because you specified 4 as the threshold, the word 'user' will be included in the files: it occurred 5 times (and thus has a score of 5). The word 'fun', however, will not be included: it occurred only 3 times (and thus has a score of 3).
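
The threshold rule in this example can be sketched in a few lines of Java. This is only an illustration, not the indexer's actual source: the class name MinScoreSketch, the methods countWords and keepAtOrAbove, and the way words are split are all assumptions made for the sketch. It also assumes that a word occurring exactly the minimum number of times is kept.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the minimum-score rule; not taken from indexTHIS itself.
public class MinScoreSketch {

    // Count how often each word occurs. The count is the word's "score".
    // Assumption: words are runs of letters and digits, compared case-insensitively.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> scores = new TreeMap<>();
        for (String word : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                scores.merge(word, 1, Integer::sum);
            }
        }
        return scores;
    }

    // Keep only the words whose score is at least minScore.
    // Assumption: the threshold is inclusive.
    static Map<String, Integer> keepAtOrAbove(Map<String, Integer> scores, int minScore) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : scores.entrySet()) {
            if (entry.getValue() >= minScore) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Default to the threshold used in the example above.
        int minScore = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        String text = "user user user user user fun fun fun";
        // With a threshold of 4 this prints {user=5}; 'fun' (score 3) is dropped.
        System.out.println(keepAtOrAbove(countWords(text), minScore));
    }
}
```

Running the sketch with a threshold of 3 instead of 4 would keep both words, which is a quick way to see how the minimum score changes what ends up in the word files.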

netscape.js is a file used by the JavaScript search engine, and contains all of the same words and scores that db.txt has. Everything I said above about db.txt applies to this one too, except that netscape.js is for the JavaScript search engine rather than the Java applet.


