Hello all,

All of this raises another issue: how do you sample the web? What is the universe, and what technique would you use to derive a generalizable sample?

At one point I thought that if you randomly generated a string of, for example, four letters such as "SFRW", entered it into Google, and then took the Nth entry in the results list, you would get a random sample of web content. It turns out that SFRW turns up things like:

SEACOAST FEDERATION OF REPUBLICAN WOMEN
Skype Feature Request Workflow
Santa Fe River Watershed
etc.

Thinking about it, random letters would return mostly acronyms for groups and organizations, and would thus miss other kinds of content (porn pages, for example).

Another technique would be to use a random word generator (open the dictionary to a random page, point at a word, and then search for it). This would yield a lot of interesting words, but the results would depend on the language of the dictionary.

Thus, neither of these approaches would produce a generalizable sample. Does anybody have any other approaches?

Rich Ling