Elizabeth,

"Van-Couvering,EJ (pgr)" <E.J.Van-Couvering@lse.ac.uk> writes:
While it is true that Google doesn't edit contents, from my research, I think it is safe to say there is a lot underneath this "efficient indexing" of websites. Each search engine provider (of which there are 3-4 major ones - Google, yahoo, Microsoft, and AskJeeves) is striving not only to produce the most efficient index of results but the most "relevant" index of results. This is a tricky issue - is a neo-Nazi site the most relevant hit for the search "Jew"?
First off, the discussion started with "folks realize that using the "number of hits returned on google" is a hilarious bad way to prove a point -- right?" I took issue with that statement because, in many circumstances, I know of no better way to test everyday language use. I was thus defending the *number* of hits rather than their *rankings* as a good indicator (there are exceptions, obviously: pornographic language will be overrepresented). But I would also vouch, albeit with more caution, for the ranking: if the ranking did not suit most internet users, Google would not be such a success. The path dependency argument goes some way towards explaining that success, but only *some* way: I don't remember which "search engine" I used first, but I do remember switching from Webcrawler to AltaVista some time in March 1997 and from AltaVista to Google in late 1999, even though I was quite content with Webcrawler and AltaVista at the time. Thus, Google appears to churn out the most relevant hits for most internet users, who, of course, are not a random sample of the global population.

Now, let's have a look at the query "Jew," which because of its ambiguity seems pretty nonsensical to me as a search, other than for research purposes, i.e. investigating what is associated with the word "Jew." My guess is that you will find many anti-Semitic sites, for four reasons (you'd wish these were neo-Nazi sites, but anti-Semitism is not at all confined to the Nazi scene): 1) Anti-Semitism is a large and growing global phenomenon. 2) Anti-Semitic statements are overrepresented on the web. 3) Infamous works such as "The Eternal Jew" and "The International Jew" contain the word "Jew" in their titles. Part of the reason for this choice of wording is: 4) Anti-Semites have a propensity to use the word "Jew" in the singular, as for them Jewishness is an indelible personal attribute that defines the character of a person.
In contrast, other people will more often speak of "Jewish" or, maybe sometimes, "Jews". "I'm Jewish" is certainly the preferred wording over "I'm a Jew." And, sure enough, for UK-IP queries Google churns out a revolting anti-Semitic site as first hit, the Wikipedia entry as second hit, an academic site as third hit, a conscious effort to knock the anti-Semitic site off the top spot in fourth place, and Henry Ford's tract in fifth. (I hate this national(ist) bias "feature" of Google, which tries to guess from which country and in which language I would like my search results.) All these seem very relevant hits to me, if your question is "what is associated with the word 'Jew'." Interestingly enough, Google felt compelled to explain their search results along pretty much the same lines as I just did: http://www.google.com/explanation.html
Or, more commonly, when someone searches for "apple" do they mean the fruit or the computer?
The homonym problem cannot be solved through search engine diversification.
It also means, as any search engine optimiser will tell you, that there is a lot of very active blacklisting of sites which are perceived to be fraudulent. Therefore I think that a concentrated search market is likely to be a bad thing: we may want choice in what kind of results we think are relevant.
Well, most users of Google have decided that they would rather not wade through zillions of bait sites from the porn industry. Why should machine-generated webpages get the same relevance score as human-generated sites? I am not a believer in market solutions for everything, but in this case the market solution seems to yield the best results.
Certainly those who live outside the major advertising markets are finding that their versions of the internet are not particularly well-searched, as commerce drives the indexing efforts of all the major engines.
Fair enough, but a proliferation of the market would likely aggravate rather than alleviate that problem, as it would mean a diversification (read: compartmentalization) of the market.

Thomas (not affiliated with Google in any way)

--
thomas koenig, ph.d.
department of social sciences, loughborough university, u.k.
http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html