Hello all,

All of this brings up another issue: how do you sample the web? What is the universe, and what technique would you use to derive a generalizable sample?

At one point I thought that if you randomly generated a set of, for example, four letters such as "SFRW", put them into Google, and then took the Nth entry in the list, you would get a random sample of web content. It turns out that SFRW turns up things like:

  SEACOAST FEDERATION OF REPUBLICAN WOMEN
  Skype Feature Request Workflow
  Santa Fe River Watershed
  etc.

In thinking about it, the use of random letters would result in a lot of acronyms for groups (and thus miss, for example, porn pages).

Another technique would be to use a random word generator (open the dictionary to a random page, point at a word, and then use it). Here you would get a lot of interesting words, but the search would depend on the language of the dictionary.

Thus, neither of these approaches would work. Does anybody have any other approaches?

Rich Ling
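
For anyone who wants to try the random-letter idea and see the acronym bias for themselves, here is a minimal sketch of the procedure described above. It only draws the random query strings and the rank N to take from each result list. The function names (random_query, draw_sample_plan, pick_nth) are made up for illustration, and the search step is deliberately left out: the results list is assumed to come from whatever search interface you have access to.

```python
import random
import string


def random_query(length=4, rng=random):
    """Return a random string of uppercase letters, e.g. 'SFRW'."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def draw_sample_plan(n_queries, max_rank=10, seed=None):
    """Return (query, rank) pairs: a random letter string and which hit to take.

    max_rank is an assumed cutoff for how deep into the result list N may go.
    """
    rng = random.Random(seed)
    return [(random_query(4, rng), rng.randint(1, max_rank))
            for _ in range(n_queries)]


def pick_nth(results, rank):
    """Return the rank-th result (1-based), or None if the list is too short."""
    return results[rank - 1] if len(results) >= rank else None


if __name__ == "__main__":
    for query, rank in draw_sample_plan(5, seed=42):
        print(query, rank)
```

Fixing the seed makes the plan reproducible, which helps if several people want to compare what the same queries return. It does nothing about the bias itself: the draw is random over letter strings, not over web pages.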