The origins of searchTHIS
You may be wondering where this thing came from. Well, I'll tell ya. searchTHIS was spawned in the mind of a brilliant genius, whose
intelligence exceeds the comprehension of most human beings. Then I killed him and stole his cool idea. I used to have a web
site that I wanted to make searchable so that people could easily find things. I looked in the normal places:
developer.com, books, various other script archives and web sites. And guess what?
I found jack. It was like looking for a pearl in a mountain of weasel pellets. So I decided to write my own. The main problem was
that I could not do any server-side programming. That made the task real hard because I was limited in what I could do. I decided to try
out Java. I had always wanted to learn it, and this finally gave me the excuse that I was looking for. I started by developing a tool
(now known as indexTHIS) that I could use to index my web site and create a static list of URLs, descriptions, titles, and keywords.
After I had that working, I tailored JavaScript around the index.
I liked using JavaScript the best because it was easier to manipulate. I could place the form anywhere I wanted to, and still retain the look
and feel of a normal web page. I decided to add a Java Applet, because writing Java is fun, and I was bored. Finally I had a product that I could use,
and other people could use as well.

Next came the first months of searchTHIS, not a particularly happy time. There were a lot of things going on in the first version of searchTHIS (originally called Cleary's site indexer and search engine...yick!), and unfortunately I didn't test it sufficiently. It turns out that the original users of my product were actually guinea pigs. They did well in reporting problems and concerns, so I thank them. It took a while, but I finally got many of the bugs removed, added a few more features, and decided it was time for a new name. So I named it searchTHIS! Cool name, I know. I was sick and tired of things like <Quest> and <Agent> and stuff like that. I thought my product needed a little edge. I think it works; you could let me know if it doesn't.

After several more weeks of problems, I decided to cut back on the troublesome features and focus on the JavaScript portion. I think that this vastly improved my product, so much so that I created this cooool web site. But the new version had some problems. I added functionality to the JavaScript to force it to return results based on relevancy. Unfortunately, I made a serious programming error: I became an array abuser. Arrays hog resources, and with all of the arrays I was creating in indexTHIS, I suffered a momentary lapse of reason. I forgot that this was all client-side. The inexcusable neglect and mistreatment of arrays crippled the client, making the JavaScript useless. Nonetheless, I fixed that too, and now it is better than ever.

I have one more obstacle to overcome: the indexing time. The only real downside is that indexing can take a long time, depending on file size, number of files, and your computer speed. For me, it takes an average of 15 seconds to index a good-size file. Right now the algorithm is the problem, and I am working to tweak that. Hopefully in the next week or so I will have one fast indexer.
What is THIS!?
searchTHIS is a tool that webmasters can use to create an index of their web files and make their site searchable. It comes with two distinct parts:
searchit.html is a file for Netscape Navigator (the most popular browser, one that still holds a disproportionately high percentage of the browser population, and my personal favorite). It allows visitors to your web site who are using Netscape Navigator to search for both keywords and file text, and returns results based on their relevancy. Pretty cool, eh? It searches the arrays built in netscape.js.

db.js is a file that is used only by the Java Applet. It contains a lookup table of all of the links, titles, and descriptions of the HTML files in your web site.

db.txt is another file that is used only by the Java Applet. It contains the actual text in your HTML files, specifically the words that occur in a file at least as many times as the minimum score you enter when you run the indexer. Let's take an example. Say you have a file in which the word 'user' occurs 5 times and the word 'fun' occurs 3 times. Now you start the indexing process, entering 4 as the minimum score, that is, the minimum number of times a word must occur in the file for it to be included in the db.txt and netscape.js files. Because you specified 4 as the threshold, the word 'user' will be included in the files, because it occurred 5 times (and thus had a score of 5). The word 'fun', however, will not be included, because it only occurred 3 times (and thus had a score of 3).

netscape.js is a file used by the JavaScript search engine, and contains all of the same words and scores that db.txt has. Everything I said above about db.txt applies to this one too, except that netscape.js is for the JavaScript search engine rather than the Java Applet.
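
To make the 'user' and 'fun' example concrete, here is a minimal JavaScript sketch of counting words, applying a minimum score, and ranking pages by score. It is not the actual indexTHIS or searchit.html code. The function names (countWords, applyThreshold, search) and the page objects are made up for illustration, and it assumes the threshold is inclusive, so a word that occurs exactly as many times as the minimum score would be kept.

```javascript
// Hypothetical sketch: none of these names come from searchTHIS itself.

// Count how many times each word occurs in a piece of text (case-insensitive).
function countWords(text) {
  const counts = new Map();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Keep only the words whose score (occurrence count) reaches minScore.
// Assumption: the threshold is inclusive (score >= minScore).
function applyThreshold(counts, minScore) {
  const kept = new Map();
  for (const [word, score] of counts) {
    if (score >= minScore) {
      kept.set(word, score);
    }
  }
  return kept;
}

// Rank pages for a single search word, highest score first.
// `pages` is an array of { url, scores } objects, where scores is a Map
// of word -> score. This layout is an assumption, not the netscape.js format.
function search(pages, word) {
  const key = word.toLowerCase();
  return pages
    .filter((page) => page.scores.has(key))
    .sort((a, b) => b.scores.get(key) - a.scores.get(key))
    .map((page) => ({ url: page.url, score: page.scores.get(key) }));
}

// The example from the text: 'user' occurs 5 times, 'fun' occurs 3 times,
// and the minimum score is 4.
const text = "user ".repeat(5) + "fun ".repeat(3);
const index = applyThreshold(countWords(text), 4);
console.log(index.has("user")); // true
console.log(index.has("fun")); // false
```

Using a Map instead of a plain object avoids surprises with words such as "constructor" that collide with built-in property names.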