Re: [Air-l] counting google hits
Thomas,

Of course all samples have biases, but isn't it incumbent upon us to better understand them -- particularly in the case of Google, precisely because it is fast becoming "all that we have", that is, a default choice for information retrieval, not only for everyday users, but also for students and Internet researchers?

Greg Elmer

----- Original Message -----
From: Thomas Koenig <T.Koenig@lboro.ac.uk>
Date: Wednesday, March 2, 2005 8:54 pm
Subject: Re: [Air-l] counting google hits
Quoting elijah wright <elw@stderr.org>:
[What's wrong with using Google stats?]
because people assume that all texts that are available are represented, which according to the google people they are *not*.
Fair enough, but what is your alternative corpus? Most traditional corpora have a bias away from everyday language towards journalistic and/or literary writings. Sometimes these biases may not matter, at other times they might even be desirable, but at times google is the better choice, even if imperfect.
in other words, the sample that you are pulling numbers from is neither complete nor perfect - so your results won't be either.
Who gets unbiased random samples? No-one, not even NORC, who are pretty good at it. Does that invalidate *all* statistical results? Of course not. Don't get me wrong, I am all for careful random sampling, but if I cannot get it, I might, under some circumstances, resort to biased samples, rather than to not get any sample at all.
do you understand what google does well enough (details of the algorithm, et cetera) to know what the weaknesses are? oh, you say they haven't published enough information for you to know? that's what i thought. :|
I do not know how google indexes (I have a faint idea, though), but for many practical purposes it simply does not matter, as long as I do not suspect a bias in the exclusion of websites that is *systematically related* to the topic I am researching.
Would I rather have a random sample of all human-generated websites, preferably with the vital stats of their authors attached? You bet. I just won't get it. So I am taking the next best thing, aka Google.
I am afraid, this is how your argumentation sounds to me. Why should it be wrong to use the number of google hits under all circumstances?
i think your tone is pretty crass.
Funny, that's what I thought of yours, that's why I chose to use *your* words. You probably know that it's sometimes difficult to discern the tone when you have no cues other than some ASCII strings.
If I want to show that Canada is better known than Vanuatu (http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu), why would the comparison of google hits be inadmissible? (There are a number of reasons why the "Vanuatu" hits are inflated, but that is of no concern here).
popularity of a term is one of the few instances in which comparative occurrence vis a vis the google corpus *might* be useful. it would depend on your question, and whether the data available from the particular google server you're connected to is appropriate to answering it.
Of course, it always depends on what you want to do, but that's a far cry from your wholesale rejection of using Google hits for any kind of research:
"folks realize that using the "number of hits returned on google" is a hilarious bad way to prove a point -- right?"
Thomas
-- thomas koenig, ph.d. department of social sciences, loughborough university, u.k. http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
Greg Elmer, PhD Bell Globemedia Research Chair Rogers Communications Centre/School of Radio-TV Arts Ryerson University 350 Victoria Street, Toronto, Ontario Canada M5B 2K3 416-979-5282 _______________________________________________ Co-Editor, Space and Culture: An International Journal of Social Spaces http://www.carleton.ca/space/
Innovative use of google hits:

Someone I know uses google when checking spelling and foreign language phrasing -- if a spelling turns up 150 google hits one way, and 350,000 the other way, he figures the second spelling was correct. Same when he's not sure exactly how a phrase goes in another language. More hits = probably grammatical.

Nancy

-- Nancy Baym http://www.ku.edu/home/nbaym Communication Studies, University of Kansas Bailey Hall, 1440 Jayhawk Blvd., Room 102, Lawrence, KS 66045-7574, USA Association of Internet Researchers: http://aoir.org
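The hit-count comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name likelier_spelling, the min_ratio threshold of 10 and the example word pair are hypothetical and do not come from the thread; only the two counts (150 and 350,000) are taken from the message. The counts are entered by hand, since no search API is assumed here.

```python
from typing import Mapping, Optional


def likelier_spelling(hits: Mapping[str, int], min_ratio: float = 10.0) -> Optional[str]:
    """Return the variant with clearly more hits, or None if the gap is too small.

    `hits` maps each candidate spelling to a hit count the user looked up.
    `min_ratio` is an arbitrary, hypothetical threshold: the top variant must
    have at least this many times the hits of the runner-up to be trusted.
    """
    if len(hits) < 2:
        raise ValueError("need at least two variants to compare")

    # Rank the variants from most to fewest hits.
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    (best, best_hits), (_, second_hits) = ranked[0], ranked[1]

    if best_hits == 0:
        return None  # no evidence for any variant
    if second_hits == 0 or best_hits / second_hits >= min_ratio:
        return best
    return None  # counts are too close to call


# Counts from the message above; the word pair itself is a made-up example.
print(likelier_spelling({"accomodate": 150, "accommodate": 350_000}))
```

A large gap says only that one form is far more common on indexed pages, which is the "probably" in "probably grammatical"; when the counts are close the sketch declines to choose.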
I think pretty much everyone does that. You can also use the syntax define:word in the google web entry and it will give you a series of definitions. Get Google Hacks 2nd Edition (O'Reilly); it has an enormous amount of information on Google and the various ways to access/modify it.

- Alan

On Thu, 3 Mar 2005, Nancy Baym wrote:
Innovative use of google hits:
Some one I know uses google when checking spelling and foreign language phrasing -- if a spelling turns up 150 google hits one way, and 350,000 the other way, he figures the second spelling was correct. Same when he's not sure exactly how a phrase goes in another language. More hits = probably grammatical.
Nancy
nettext http://biblioteknett.no/alias/HJEMMESIDE/bjornmag/nettext/ http://www.asondheim.org/ WVU 2004 projects: http://www.as.wvu.edu/clcold/sondheim/ http://www.as.wvu.edu:8000/clc/Members/sondheim Trace projects http://trace.ntu.ac.uk/writers/sondheim/index.htm
I've been reading about google hits and checking spelling and students using google for research (which I find somewhat troublesome), and I'd like to recommend EPIC to you. Check it out here: http://www.robinsloan.com/epic/

EPIC is essentially an 8-minute movie (it takes a long time to load on dial-up) about the future of our information media. For example, it projects the merging of google and amazon and the consequences this will have for "personalization" of information. If you are concerned at all about critical thinking and evaluation of online information sources, I recommend this highly. Make sure you watch the whole thing and then share it with your more advanced students.

Ulla

----------------------------------------------------
Ulla Bunz
Assistant Professor
Department of Communication
Rutgers University
4 Huntington Street
New Brunswick, NJ 08901
Email: bunz@scils.rutgers.edu
----------------------------------------------------
At 15:11 04/03/2005, Ulla Bunz wrote:
I've been reading about google hits and checking spelling and students using google for research (which I find somewhat troublesome), and I'd like to recommend EPIC to you. Check it out here: http://www.robinsloan.com/epic/
It's always difficult to predict the future, but this movie contains too many "good ol' times" biases in my view. I like the NYT, but undoubtedly Google (even though it's a quasi-monopoly) is a more "democratic" news source than any newspaper, let alone any broadcaster. Also, the NYT currently sells about 1.1 million copies, while The Sun sells 3.4 million in a country a quarter the size of the US. You figure out whether The Sun is a desirable news source that supports democratic citizenship. Need I spell out another three-letter word: FOX?

The film celebrates "journalistic professionalism", although many scholars (Tuchman, Gans, Gitlin immediately spring to mind) have shown that these professional norms systematically exclude certain news sources, issues and events. Why would a search engine, which indexes a much wider variety of sources, be more rather than less biased?

The film also contains a number of factual errors, such as:

"Google's algorithm is based on amazon's"
The principle of the evaluation of links certainly precedes amazon, and Google's principal sorting mechanism is very different from amazon's (in 1998), which then did have recommendations, but did not order search results through link evaluation and clustering techniques.

"Google News is edited entirely by computers"
That's a semantic trick. Google News is based on newswires and other news providers, which do have human editors.

"All the news on EPIC are sensationalist"
Current print/broadcast media also prefer event-centred coverage over issue-oriented coverage (see, e.g., Iyengar's "Is Anyone Responsible?"). I cannot see why "issue-oriented reporting" should be the domain of journalists only. Au contraire, there are many citizens with particular policy interests who offer much more thematic news than any news wire does. Unlike the press, these citizens are not so much driven by market forces, which lead journalists in the traditional mass media to "episodic" reporting.
A few words on the media market concentration the movie loathes so much. In general, of course, market concentration is a bad thing for product variety (read: range of opinions, in communication terms). But Google does not edit the contents (save for a few webpages Google itself produces); it just indexes them quite efficiently. It thus provides the infrastructure to efficiently access all sorts of news sites. I know of no government or open-source project that does it in a more "balanced" way.

Just my 2c.

Thomas

thomas koenig, ph.d. department of social sciences, loughborough university http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
participants (5)
- Alan Sondheim
- Greg Elmer
- Nancy Baym
- Thomas Koenig
- Ulla Bunz