Actually, this is the way the search engine we developed at CDDC works, so I find it interesting to see another one just like it.

The method is simple:

- take any text
- strip the HTML
- put the text into a row of table(1) with an index number (TID)
- parse the text into words
- put each of those words into a new row of a second table(2), along with the TID

You can build an index from table(2) easily. Hint: you select the unique words using SQL (SELECT DISTINCT), and you look words up using SQL LIKE.

Then you parse the text into HTML, with each word linked to its entry in the index, and insert that into a column in table(1). Then you can display the text as a hyperlinked index.

Our (the Center for Digital Discourse and Culture, myself and my assistants) innovations on this design:

1. We added to table 2 the ability to attach definitions to individual word entries, so that if you click a word to find where it is indexed, you can put in a definition if none is present or read the one that is. The definitions are all indexed as well.

2. We added a table 3 made up of 2 to 4 word phrases, which speeds up searching and makes it much easier to find proper names. I am currently developing code that will clean table 3 based upon what people tend to search on, i.e. table 4.

3. We save all search strings and the first 3 answers in table 4.

4. We began stripping common words (the, is, are, an, so, that, etc.) in the base parser for table 2. This speeds up parsing and indexing immensely.

The basic code for this will be released in a few weeks on SourceForge; search for cddc. We're releasing the complete initial codebase.

And I guess I should probably generate some sort of paper out of this :)

On Thursday, February 28, 2002, at 04:26 AM, Zunt@aol.com wrote:
I've not run across this method before, and thought folks on this list might enjoy puzzling over it.
http://www.ugcs.caltech.edu/~harel/lyrics.html
The website contains a collection of texts (popular song lyrics). Having made a selection from the contents, you can click on various linked words within the text (not all possible words are linked). That action triggers (a) enumeration of the texts in the library that contain the target word, and (b) a hyperlinked index connecting you back to those available texts. Each instance of target word use appears in the index list.
It looks to me like quite a bit of HTML page generation is done automatically via scripting on the server side.
Cheers,
Bob Briggs Westport, MA
_______________________________________________ Air-l mailing list Air-l@aoir.org http://www.aoir.org/mailman/listinfo/air-l
jeremy hunsinger jhuns@vt.edu on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy www.dromocracy.com
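
P.S. For anyone who wants to try the basic method described at the top of this message, here is a minimal sketch in Python with SQLite. It is not the CDDC code, only an illustration under some assumptions: the table names (texts, words), the column names, the "/index?word=" link target, and the short stop-word list are all made up for the example, and the HTML stripping is a crude regular expression rather than a real parser. Definitions, the phrase table, and the search log are left out.

```python
import re
import sqlite3
from html import escape, unescape

# Hypothetical stop-word list; the message names only these few examples.
STOP_WORDS = {"the", "is", "are", "an", "so", "that"}
WORD = r"[A-Za-z0-9']+"

def strip_html(raw):
    """Crudely drop tags and decode entities (a real HTML parser would be safer)."""
    return unescape(re.sub(r"<[^>]+>", " ", raw)).strip()

def tokenize(text):
    """Split text into lowercase words, skipping the stop words."""
    return [w for w in re.findall(WORD, text.lower()) if w not in STOP_WORDS]

def linkify(body):
    """Wrap each indexed word in a link to its index entry (URL scheme is assumed)."""
    out = []
    # re.split with a capturing group alternates: non-word, word, non-word, ...
    for i, piece in enumerate(re.split("(" + WORD + ")", body)):
        if i % 2 == 1 and piece.lower() not in STOP_WORDS:
            out.append('<a href="/index?word=%s">%s</a>' % (piece.lower(), piece))
        else:
            out.append(escape(piece))
    return "".join(out)

def build_index(conn, documents):
    """Fill table(1) 'texts' and table(2) 'words' from a list of HTML strings."""
    conn.execute("CREATE TABLE texts (tid INTEGER PRIMARY KEY, body TEXT, linked TEXT)")
    conn.execute("CREATE TABLE words (word TEXT, tid INTEGER)")
    for raw in documents:
        body = strip_html(raw)
        cur = conn.execute("INSERT INTO texts (body, linked) VALUES (?, ?)",
                           (body, linkify(body)))
        conn.executemany("INSERT INTO words (word, tid) VALUES (?, ?)",
                         [(w, cur.lastrowid) for w in tokenize(body)])
    conn.commit()

def vocabulary(conn):
    """The index itself: every distinct word, in sorted order."""
    rows = conn.execute("SELECT DISTINCT word FROM words ORDER BY word")
    return [r[0] for r in rows]

def lookup(conn, pattern):
    """TIDs of the texts containing a word that matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT DISTINCT tid FROM words WHERE word LIKE ? ORDER BY tid", (pattern,))
    return [r[0] for r in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, ["<p>The cat sat</p>", "<b>A cat is here</b>"])
    print(vocabulary(conn))
    print(lookup(conn, "cat"))
```

Storing one row per word occurrence keeps the parser trivial and leaves the de-duplication to SELECT DISTINCT at query time, which is how I read the hint above. A larger collection would probably want an index on words(word).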