Help For Java Indexer V1.5

Welcome to ExNet's Java-based real-time index-search tool, designed for instant searches of a site's web pages.

This tool does not require a CGI server and uses only flat files. That makes it ideal if you are an end user behind a firewall, or if you run a Web site in space you buy from a service provider that does not provide CGI services.

How To Use the Search Tool

When you first load a page containing the search applet, the applet will attempt to load the index associated with that applet (note that each applet can load a different index). The browser will also have to load the classes for the applet. Depending on the speed of your network connection and the size of the index, this will take anything from a few seconds to a couple of minutes.

In most browsers, if you revisit the page with the applet on it without quitting the browser in the meantime, you will not need to reload the index.

Type some words or a phrase that you want to look for into the ``Search for'' box (which is yellow where the browser permits). Either hit the RETURN or ENTER key to start a search immediately, or wait a little while and a search will start automatically. The tool will find the documents it thinks best match those search words, and will list them in the ``Results'' window, best match first.

Double-click on the document you want to look at. The tool scores documents first by the number of search words found in each one, and then by how rare those words are, i.e. how good they should be at picking out the documents you want to see. Thus the word ``the'' generally adds little to a document's score if you include it in your search words, because it is very common in typical English-text documents.
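The two-stage scoring described above can be sketched roughly as follows. This is an illustration only, written in present-day Java rather than the JDK 1.0 dialect the applet targets, and it is not the tool's actual code. The class name RankingSketch, the in-memory index map, and the rarity weight log(N / documents containing the word) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: rank by number of search words matched, then by their rarity. */
class RankingSketch {

    /** One scored document. */
    static final class Hit {
        final String doc;
        final int matched;    // how many distinct search words the document contains
        final double rarity;  // summed rarity weight of those words (assumed formula)

        Hit(String doc, int matched, double rarity) {
            this.doc = doc;
            this.matched = matched;
            this.rarity = rarity;
        }
    }

    /**
     * @param index      hypothetical inverted index: word -> documents containing it
     * @param totalDocs  number of documents the index was built from
     * @param queryWords search words, already lower-cased
     */
    static List<Hit> rank(Map<String, Set<String>> index, int totalDocs, List<String> queryWords) {
        Map<String, Integer> matched = new HashMap<>();
        Map<String, Double> rarity = new HashMap<>();
        for (String word : new LinkedHashSet<>(queryWords)) {  // ignore duplicate search words
            Set<String> docs = index.get(word);
            if (docs == null || docs.isEmpty()) {
                continue;  // word not in the index
            }
            // Assumed weight: a word found in every document contributes 0.
            double weight = Math.log((double) totalDocs / docs.size());
            for (String doc : docs) {
                matched.merge(doc, 1, Integer::sum);
                rarity.merge(doc, weight, Double::sum);
            }
        }
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : matched.entrySet()) {
            hits.add(new Hit(e.getKey(), e.getValue(), rarity.get(e.getKey())));
        }
        // Best match first: more words matched, then rarer words, then name for stability.
        hits.sort((a, b) -> {
            if (a.matched != b.matched) {
                return Integer.compare(b.matched, a.matched);
            }
            int byRarity = Double.compare(b.rarity, a.rarity);
            return byRarity != 0 ? byRarity : a.doc.compareTo(b.doc);
        });
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("the", new HashSet<>(Arrays.asList("w", "x", "z")));
        index.put("common", new HashSet<>(Arrays.asList("w", "x", "z")));
        index.put("rare", new HashSet<>(Arrays.asList("y")));
        for (Hit h : rank(index, 4, Arrays.asList("the", "common", "rare"))) {
            System.out.println(h.doc + " matched=" + h.matched + " rarity=" + h.rarity);
        }
    }
}
```

In the small index built in main, documents matching two common words are listed ahead of the document matching one rare word, because the count of matched words is compared before rarity.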

Good Documents and Bad Documents

Documents that the tool thinks are very likely to be good on the basis of the words they contain will be marked with three stars (``***'') at the start of the entry, down to one star for entries that are marginal, and just a question mark (``?'') for documents that don't match enough of your search words. This means that sometimes documents at the top of the list (those that match the most words) will not have the most stars, because the words they match may not be rare enough to be good at selecting between documents.
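To see how a document can top the list yet carry few stars, here is a minimal sketch of one way such markers could be assigned. The cut-off values, the ``at least half the search words'' rule, and the name StarSketch are invented for illustration; the thresholds the tool really uses are not documented here.

```java
/** Illustrative only: all thresholds below are hypothetical, not the tool's own. */
class StarSketch {

    /**
     * @param matched     number of search words found in the document
     * @param queryWords  number of distinct search words typed
     * @param rarityScore combined rarity of the matched words (higher = rarer)
     */
    static String marker(int matched, int queryWords, double rarityScore) {
        // Assumed rule: a document matching fewer than half the words gets "?".
        if (matched * 2 < queryWords) {
            return "?";
        }
        // Assumed cut-offs: stars depend on rarity, not on the match count.
        if (rarityScore >= 3.0) {
            return "***";
        }
        if (rarityScore >= 1.5) {
            return "**";
        }
        return "*";
    }

    public static void main(String[] args) {
        // Matches every word, but only common ones: listed first, one star.
        System.out.println(marker(3, 3, 0.5));
        // Matches fewer words, but rare ones: listed lower, three stars.
        System.out.println(marker(2, 3, 4.0));
    }
}
```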

How Many Documents Are In The Index?

You can see the number of different documents or document sections that the index was generated from, and the number of different words in the index, in the bottom line of the search applet once the index has loaded. You can also see a status indicator that shows when the tool is busy, and where possible how close it is to finishing whatever it is doing. You will usually see this indicator in use when the index is being loaded or a search is being done. This tool is multi-threaded and so may be doing several things at once for you!

Auto Cue

If you leave the ``Search for'' box blank for a while you will find that the applet is usually set up to cue you with some interesting searches you might do, or brief instructions. If this ``auto cue'' system is being used, text will appear in the ``Search for'' box as if typed in there, and the system will search for the words there as normal. You can disable this by leaving some text in the search box, or by starting to edit any of the search text that the auto cue system generated.

Which Words and How Many?

As you are typing in your search words you will see text appearing in the ``Doc counts'' box. (You may have to scroll this box to see all the text in it if your search has many words.) The first part of this box shows the two closest words in the index on either side of the last search word you have typed in. If your search word is not in the index it will appear between these two, surrounded by question marks (``?''). This will help you choose which words to look for. After that, you will see the word ``FOUND:'' followed by a list of words. As for the words in the first part of the box, each is followed by a colon (``:'') and a number, which is the number of documents the word appeared in. You will notice that the words are all converted to lower-case, and that duplicates are removed; words that aren't in the index are not shown at all. Very long words or strings of digits are broken up into shorter pieces. All this helps the indexing mechanism help you find the words that you are looking for.
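The behaviour of the ``Doc counts'' box can be approximated with a sorted map from each indexed word to its document count. The sketch below is a guess at the logic, not the applet's source: the class DocCountsSketch, the exact spacing of the output, and the choice to show a known word with its own count between its neighbours are assumptions. The breaking-up of very long words is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/** Illustrative only: builds text resembling the "Doc counts" box. */
class DocCountsSketch {

    /** Lower-cases the query and splits it into words (duplicates kept). */
    static List<String> tokens(String query) {
        List<String> out = new ArrayList<>();
        for (String w : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /**
     * @param index hypothetical sorted index: word -> number of documents containing it
     */
    static String docCounts(TreeMap<String, Integer> index, String query) {
        List<String> typed = tokens(query);
        if (typed.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();

        // Part 1: the last word typed, between its nearest neighbours in the index.
        String last = typed.get(typed.size() - 1);
        String before = index.lowerKey(last);   // closest indexed word sorting before it
        String after = index.higherKey(last);   // closest indexed word sorting after it
        if (before != null) {
            sb.append(before).append(':').append(index.get(before)).append(' ');
        }
        if (index.containsKey(last)) {
            sb.append(last).append(':').append(index.get(last));
        } else {
            sb.append('?').append(last).append('?');  // not in the index
        }
        if (after != null) {
            sb.append(' ').append(after).append(':').append(index.get(after));
        }

        // Part 2: each distinct search word that is in the index, with its count.
        sb.append(" FOUND:");
        for (String w : new LinkedHashSet<>(typed)) {
            Integer count = index.get(w);
            if (count != null) {
                sb.append(' ').append(w).append(':').append(count);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("apple", 3);
        index.put("apply", 7);
        index.put("java", 12);
        index.put("search", 5);
        System.out.println(docCounts(index, "Java SEARCH java applet"));
    }
}
```

A sorted structure is used here because finding the nearest words on either side of a word that may not itself be in the index is a direct lookup in a sorted map, whereas a hash table would need a full scan.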

Jumping To Your Chosen Document

If you want to look at one of the documents listed in the ``Results'' window, double-click on its line. If the browser is able to, it will show the selected document in a new browser window; on some older browsers the new document will be displayed in the window the applet was in.

Tuning Your Search

If you use the search tool a lot you may wish to tune its behaviour. Press the ``Control Panel'' button to get to the control panel, and the ``Search'' button to get back to the normal search interface.

Technical Details and Miscellany

Changes Since Previous Versions

Changes Since V1.4

The main changes from V1.4 to V1.5 are:

Client Requirements and Environment

This tool should work with Netscape 2.01 or later, or Internet Explorer 3.0 or later, or any other Java interpreter that can run the output of Sun's JDK 1.0.1 or 1.0.2 javac compiler. Not all parts of it will be functional in all viewers or with all interpreters.

Credits and Thanks

Many of the techniques used in this index are to be found in the excellent book ``Managing Gigabytes,'' Witten, IH, Moffat, A, Bell, TC, Van Nostrand Reinhold 1994, ISBN 0-442-01863-0. Most of the classes have been written by Damon Hart-Davis; some have been written by Caroline Skene.

Use of This Code by Site Maintainer and End User

Note that the Java classes and code used to build these indices are available from ExNet; please mail us for the price. The viewer (.class) code is free for non-commercial use, providing you tell us who you are and what you are using it for. De-compilation is not permitted. All other rights are reserved.

This software is supplied as-is. If you are an end user of the free parts of this software, we provide you with no warranties of any kind. Our liability to a Web-site provider is limited to at most the price paid to us for this product.

Web site provider: you may modify this document for display on your site, providing the alterations are reasonable and providing you note the source of the original document and provide a link back to ExNet where possible.


ExNet's home page.
Sales queries to info@exnet.com, technical queries to sysadmin@exnet.com.
All code and documentation copyright DHD/EL 1995--1997.