Q: From a librarian: "Dear Barry: I do not know much about this debate, but I have a question for you: Please tell me how you find the hits in Google. What is the procedure? I.e., if I would like to know how many times the word Caribbean appears in Google, how do I proceed? Thanks for your help! Nelly"

A:
1. Go to www.google.com
2. Type in Caribbean
3. Look at the light blue web bar on top of the first list of hits. It will show you the approximate number of hits. Example: "Results 1 - 10 of about 38,100,000 for caribbean"

Barry
_____________________________________________________________________
Barry Wellman Professor of Sociology NetLab Director
wellman at chass.utoronto.ca http://www.chass.utoronto.ca/~wellman
Centre for Urban & Community Studies University of Toronto
455 Spadina Avenue Toronto Canada M5S 2G8 fax:+1-416-978-7162
To network is to live; to live is to network
() ASCII ribbon campaign -- don't use HTML email
/\
_____________________________________________________________________
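If the count has to be recorded for many search terms, the results line can be parsed instead of read by eye. The following is only a minimal Python sketch, assuming the line looks exactly like the example above ("... of about 38,100,000 for ..."); the helper name `parse_hit_count` is made up for illustration and is not part of any Google tool.

```python
import re


def parse_hit_count(results_line: str) -> int:
    """Pull the approximate hit count out of a Google results line.

    Assumes the wording "of about <number with commas>", as in the
    example quoted in the message above.
    """
    match = re.search(r"of about ([\d,]+)", results_line)
    if match is None:
        raise ValueError(f"no hit count found in: {results_line!r}")
    # Strip the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


example = 'Results 1 - 10 of about 38,100,000 for caribbean'
print(parse_hit_count(example))  # prints 38100000
```

The number is still only Google's own estimate; parsing it does not make it any more exact.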
1. Go to www.google.com 2. Type in Caribbean 3. Look at the light blue web bar on top of the first list of hits. It will show you the approximate number of hits. Example: "Results 1 - 10 of about 38,100,000 for caribbean"
folks realize that using the "number of hits returned on google" is a hilariously bad way to prove a point -- right? this is like reading a student paper that says: "Merriam Webster's dictionary says that X is defined as Y. Therefore, Z.", accompanied by no further argumentation. Possibly true, but pretty hole-y logic. --elijah
Quoting elijah wright <elw@stderr.org>:
1. Go to www.google.com 2. Type in Caribbean 3. Look at the light blue web bar on top of the first list of hits. It will show you the approximate number of hits. Example: "Results 1 - 10 of about 38,100,000 for caribbean"
folks realize that using the "number of hits returned on google" is a hilariously bad way to prove a point -- right?
Wrong. What's wrong with using the vast internet resources as a quasi-corpus for natural languages (if you avoid certain pitfalls, which I alluded to in my last message)? Corpora such as WordNet (http://wordnet.princeton.edu/) or Wortschatz (http://wortschatz.uni-leipzig.de/) are also far from being perfect (aka totally unbiased).
this is like reading a student paper that says: "Merriam Webster's dictionary says that X is defined as Y. Therefore, Z.", accompanied by no further argumentation. Possibly true, but pretty hole-y logic.
I am afraid this is how your argumentation sounds to me. Why should it be wrong to use the number of google hits under all circumstances? If I want to show that Canada is better known than Vanuatu (http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu), why would the comparison of google hits be inadmissible? (There are a number of reasons why the "Vanuatu" hits are inflated, but that is of no concern here.) Thomas -- thomas koenig, ph.d. department of social sciences, loughborough university, u.k. http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
folks realize that using the "number of hits returned on google" is a hilariously bad way to prove a point -- right?
Wrong. What's wrong with using the vast internet resources as a quasi-corpus for natural languages (if you avoid certain pitfalls, which I alluded to in my last message)?
because people assume that all texts that are available are represented, which according to the google people they are *not*. in other words, the sample that you are pulling numbers from is neither complete nor perfect - so your results won't be either. do you understand what google does well enough (details of the algorithm, et cetera) to know what the weaknesses are? oh, you say they haven't published enough information for you to know? that's what i thought. :|
I am afraid, this is how your argumentation sounds to me. Why should it be wrong to use the number of google hits under all circumstances?
i think your tone is pretty crass.
If I want to show that Canada is better known than Vanuatu (http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu), why would the comparison of google hits be inadmissible? (There are a number of reasons why the "Vanuatu" hits are inflated, but that is of no concern here.)
popularity of a term is one of the few instances in which comparative occurrence vis a vis the google corpus *might* be useful. it would depend on your question, and whether the data available from the particular google server you're connected to is appropriate to answering it. --elijah
Quoting elijah wright <elw@stderr.org>: [What's wrong with using Google stats?]
because people assume that all texts that are available are represented, which according to the google people they are *not*.
Fair enough, but what is your alternative corpus? Most traditional corpora have a bias away from everyday language towards journalistic and/or literary writings. Sometimes these biases may not matter, at other times they might even be desirable, but at times google is the better choice, even if imperfect.
in other words, the sample that you are pulling numbers from is neither complete nor perfect - so your results won't be either.
Who gets unbiased random samples? No-one, not even NORC, who are pretty good at it. Does that invalidate *all* statistical results? Of course not. Don't get me wrong, I am all for careful random sampling, but if I cannot get it, I might, under some circumstances, resort to biased samples, rather than to not get any sample at all.
do you understand what google does well enough (details of the algorithm, et cetera) to know what the weaknesses are? oh, you say they haven't published enough information for you to know? that's what i thought. :|
I do not know how google indexes (I have a faint idea, though), but for many practical purposes it simply does not matter, as long as I do not suspect a bias in the exclusion of websites that is *systematically related* to the topic I am researching. Would I rather have a random sample of all human-generated websites, preferably with the vital stats of their authors attached? You bet. I just won't get it. So I am taking the next best thing, aka Google.
I am afraid, this is how your argumentation sounds to me. Why should it be wrong to use the number of google hits under all circumstances?
i think your tone is pretty crass.
Funny, that's what I thought of yours; that's why I chose to use *your* words. You probably know that it's sometimes difficult to discern the tone when you have no cues other than some ASCII strings.
If I want to show that Canada is better known than Vanuatu
(http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu),
why would the comparison of google hits be inadmissible? (There are a number of reasons why the "Vanuatu" hits are inflated, but that is of no concern here.)
popularity of a term is one of the few instances in which comparative occurrence vis a vis the google corpus *might* be useful. it would depend on your question, and whether the data available from the particular google server you're connected to is appropriate to answering it.
Of course, it always depends on what you want to do, but that's a far cry from your wholesale rejection of using Google hits for any kind of research: "folks realize that using the "number of hits returned on google" is a hilariously bad way to prove a point -- right?" Thomas -- thomas koenig, ph.d. department of social sciences, loughborough university, u.k. http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
Quoting Barry Wellman <wellman@chass.utoronto.ca>:
Q: From a librarian: "Dear Barry: I do not know much about this debate, but I have a question for you: Please tell me how do you find in google the hits? what is the procedure? i.e. if I would like to know how many time the word Caribbean appears in Google, how will I proceed? Thanks for your help! Nelly"
A: 1. Go to www.google.com 2. Type in Caribbean 3. Look at the light blue web bar on top of the first list of hits. It will show you the approximate number of hits. Example: "Results 1 - 10 of about 38,100,000 for caribbean"
I distinctly remember an article reporting that the google hits estimate is extremely unreliable for larger numbers (>10K or so) of hits; alas, I cannot find the text. I would be grateful if anyone could point me to the reference. Also, if you use google as a corpus substitute, beware that there are many non-human-generated webpages, which can seriously skew your results. See: http://itre.cis.upenn.edu/~myl/languagelog/archives/000194.html Thomas -- thomas koenig, ph.d. department of social sciences, loughborough university, u.k. http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
A lot of technical translators use Google in all kinds of interesting ways - they've become quite sophisticated at it over the years, with all sorts of tricks. Any corpus is going to be biased. For example, if you're a medical translator, you'll have a shelf full (or various shelves full) of bilingual and monolingual medical dictionaries that will not include a huge number of words used in the current medical research literature, owing to the pace at which research moves the subject on (and also owing to the fact that dictionaries don't tend to focus on the frontiers of a subject). A five-year-old medical dictionary, however good, is next to useless if you're translating research papers for current publication in places like the Lancet, the BMJ, JAMA etc. It's actually far better sometimes to use Google and other search engines to see who is writing what where, and whether it's only French websites - for example - or Romance-language sites, using a particular expression (in other words, a highly unreliable expression), or whether it's also on the main English-speaking research websites (and particularly in journal article titles). If you're translating peer reviews or peer-reviewed papers, it's often the only way to track down certain novel terms. I think technical translators would be among the first to vote for open publishing of academic journal articles, which would aid enormously in getting rid of some of the biases they face with current online information. Louise Ferguson
perhaps to sort of add to the google hits discussion, I bring you googlefight, and I've prearranged a little contest between internet research and internet studies
http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=internet+studies
and one between internet research and sociology: http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=sociology
and one between internet research and communication research: http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=communication+research
we beat them all hands down.... granted of course.... mostly you can orchestrate just about any result....
jeremy hunsinger jhuns@vt.edu www.cddc.vt.edu jeremy.tmttlt.com www.tmttlt.com
() ascii ribbon campaign - against html mail /\ - against microsoft attachments
At 18:46 04/03/2005, you wrote:
perhaps to sort of add to the google hits discussion, I bring you googlefight, and I've prearranged a little contest between internet research and internet studies
...
and one between internet research and sociology: http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=sociology
The proper query would of course have been http://www.googlefight.com/index.php?lang=en_GB&word1=%22internet+research%2... or, http://tinyurl.com/4mufc respectively, which sociology wins by a factor of 10. Still an underestimation, because "internet research" might refer to either research about the Internet or research conducted via the internet, but getting there.
we beat them all hands down.... granted of course.... mostly you can orchestrate just about any result....
No, you cannot. You have virtually no influence on the results of googlefight, which, of course, doesn't mean that you cannot fool it into results that *appear* nonsensical. Thomas -- thomas koenig, ph.d. department of social sciences, loughborough university http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
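The difference between the two queries discussed above (loose words versus a quoted phrase) comes down to whether the term is wrapped in double quotes before it is URL-encoded. A small Python sketch of that step, assuming the `lang`, `word1` and `word2` parameters seen in the googlefight links in this thread; the function `fight_url` is a made-up helper, and whether the site still accepts such URLs is not something this sketch can vouch for.

```python
from urllib.parse import urlencode

# Base address as it appears in the links posted to the list.
BASE = "http://www.googlefight.com/index.php"


def fight_url(term1: str, term2: str, phrase: bool = False, lang: str = "en_GB") -> str:
    """Build a googlefight-style URL for two search terms.

    With phrase=True, multi-word terms are wrapped in double quotes so
    that they are searched as an exact phrase.
    """
    if phrase:
        term1, term2 = (f'"{t}"' if " " in t else t for t in (term1, term2))
    # urlencode turns spaces into "+" and double quotes into "%22".
    query = urlencode({"lang": lang, "word1": term1, "word2": term2})
    return f"{BASE}?{query}"


print(fight_url("internet research", "sociology"))
print(fight_url("internet research", "sociology", phrase=True))
```

The first call reproduces the unquoted link from the original post; the second yields `word1=%22internet+research%22`, i.e. the phrase query.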
The proper query would of course have been
http://www.googlefight.com/index.php?lang=en_GB&word1=%22internet+research%22&word2=sociology
or,
no, that is what you think would be a proper query, but the query is not the one that I wanted to have. there is no 'propriety' to either. it is a humorous toy.
respectively, which sociology wins by a factor of 10. Still an underestimation, because "internet research" might refer to either research about the Internet or research conducted via the internet, but getting there.
we beat them all hands down.... granted of course.... mostly you can orchestrate just about any result....
No, you cannot. You have virtually no influence on the results of googlefight, which, of course, doesn't mean that you cannot fool it into results that *appear* nonsensical.
either way is nonsensical. the point was to have fun, not to make an argument as to which is larger or more important or to reveal disciplinary biases, though.... several seem to have come out after the fact :) jeremy hunsinger jhuns@vt.edu www.cddc.vt.edu jeremy.tmttlt.com www.tmttlt.com () ascii ribbon campaign - against html mail /\ - against microsoft attachments
Interestingly, "this thread" and "this discussion" both beat out "taking a nap", "what's on tv", and even "television". There goes my chance for a humorous response. :( -eg
But finally, I fought the law and I won! :-) Ellis Godard wrote:
Interestingly, "this thread" and "this discussion" both beat out "taking a nap", "what's on tv", and even "television". There goes my chance for a humorous response. :(
-eg
_______________________________________________ The Air-l-aoir.org@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://aoir.org/airjoin.html
-- -- Laura J. Little, Ed.D. Instructional Technologist/Director, Title III Grant Marietta College Marietta, OH littlea@marietta.edu (740)376-4815 http://www.marietta.edu/~littlea/ "Education is not the filling of a pail, but the lighting of a fire." William Butler Yeats
Actually, you do have influence on google fight results. I noticed, for example, that Burger King wins by a huge margin compared to McDonalds, except, they spelled it MacDonalds, and so unless you know, you'll be getting the wrong impression. Same with "alfa" versus beta (instead of "alpha"). If someone just reads the results on the radio, or someone isn't that great at spelling Greek letters, they won't notice. The quotation marks mentioned by Lauren are just another example of how you can influence exactly what google searches for. Ulla
Ulla Bunz <bunz@scils.rutgers.edu> writes:
Actually, you do have influence on google fight results.
If so, please influence its results in a way that Kerry beats Bush.
I noticed, for example, that Burger King wins by a huge margin compared to McDonalds, except, they spelled it MacDonalds, and so unless you know, you'll be getting the wrong impression.
[...] You can influence googlefight's *queries*, no doubt, but not its results, as you yourself say:
you can influence exactly what google searches for.
--------------------------------------^^^^^^^^^^^^
Just like the results SPSS delivers depend, of course, on the data file, which you can manipulate as you like. Does that make SPSS an unreliable tool? Thomas -- thomas koenig, ph.d. department of social sciences, loughborough university, u.k. http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
You can influence googlefight's *queries*, no doubt, but not its results, as you yourself say:
you can influence exactly what google searches for.
--------------------------------------^^^^^^^^^^^^
Just like the results SPSS delivers depend, of course, on the data file, which you can manipulate as you like. Does that make SPSS an unreliable tool?
uhm... may i nonchalantly point out here the vast difference between changing the content of the data file and changing the content of the query run on the data file? not to mention the difference between a closed data file (obtained - ideally - by methodologically sound sampling), which cannot be actually manipulated and remain valid (except, of course, in how you arrange the variables, which data you include or exclude... which is a whole other level of argument) and an "open data file", (i.e., the field itself). heh - kinda neat to consider the internet under that title: an open data file. of course, in the case of googlefight, the data file is one step further away from us, mediated as it is with a predetermined (and single) form of query (number of links) on the content of the data file. on the other hand, we *do* have access to the data file. though, not directly through the googlefight, but rather indirectly by our influence on the internet content. but to be significant, it has to be a form of collective influence (i.e., cultural, national, etc.) thus - as with every statistical analysis - the form of the query should be carefully matched to the research question as well as to the form of the data and of the file. in this case, the data sample is actually the population/field, and google's particular form of the 'data file' (or the determination of which variables are accessible for measurement and how) is closed and not within our reach, and also determines the possible "statistic analyses" - which are very primitive (though certainly always statistically significant). an interesting situation, to be approached and used with caution. Heidi Dawn haLevi MA Research Psych. Bar-Ilan University, Israel .................................... Information and Experience Design ................................... New Media Education
Heidi,
Heidi haLevi <heidi@processing.co.il> writes:
Just like the results SPSS delivers depend, of course, on the data file, which you can manipulate as you like. Does that make SPSS an unreliable tool?
uhm... may i nonchalantly point out here the vast difference between changing the content of the data file and changing the content of the query run on the data file?
That's what I meant; I should have been clearer about it: You can have 8 potential indicators for the class location of a person, and depending on the statistical method you are using, and the selection and/or combination of the indicators, your results will vary. If you choose bad indicators (such as "sex (m/f)" for class location in SPSS or "MacDonalds" for McDonald's popularity in googlefight), your results might become misleading.
heh - kinda neat to consider the internet under that title: an open data file.
Err. How else would you conceptualize it?
thus - as with every statistical analysis - the form of the query should be carefully matched to the research question as well as to the form of the data and of the file. in this case, the data sample is actually the population/field, and google's particular form of the 'data file' (or the determination of which variables are accessible for measurement and how) is closed and not within our reach, and also determines the possible "statistic analyses" - which are very primitive (though certainly always statistically significant).
an interesting situation, to be approached and used with caution.
I hate to write that: I agree. Well, except, why should all statistical analyses on the Google "corpus" always be primitive? BTW: Besides google fight there is also googleshare (http://www.stevenberlinjohnson.com/movabletype/archives/000011.html): Unfortunately, there only seems to be a Flash implementation for Internet Explorer right now: http://www.rundogrun.com/Samples/MindShare.HTML Thomas -- thomas koenig, ph.d. department of social sciences, loughborough university, u.k. http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
Thomas Koenig wrote:
Actually, you do have influence on google fight results.
If so, please influence its results in a way that Kerry beats Bush.
Your wish is my command: http://www.googlefight.com/index.php?lang=en_GB&word1=John+Kerry&word2=%22Ge... -- Laura J. Little, Ed.D. Instructional Technologist/Director, Title III Grant Marietta College Marietta, OH littlea@marietta.edu (740)376-4815 http://www.marietta.edu/~littlea/ "Education is not the filling of a pail, but the lighting of a fire." William Butler Yeats
Hey now, it seems you didn't utilize the phrase operator "__ __" in those Googlefights, thereby biasing the results toward anything w/ "internet" in the search terms...Let's be fair: qua phrase, "internet research" loses miserably (sorry!) to sociology, psychology, "computer science", and linguistics...also "internet research" loses by a hair to "communication studies." Just sayin'... LS On Fri, 4 Mar 2005 13:46:53 -0500 jeremy hunsinger <jhuns@vt.edu> wrote:
perhaps to sort of add to the google hits discussion, I bring you googlefight, and I've prearranged a little contest between internet research and internet studies
http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=internet+studies
and one between internet research and sociology: http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=sociology
and one between internet research and communication research: http://www.googlefight.com/index.php?lang=en_GB&word1=internet+research&word2=communication+research
we beat them all hands down.... granted of course.... mostly you can orchestrate just about any result....
jeremy hunsinger jhuns@vt.edu www.cddc.vt.edu jeremy.tmttlt.com www.tmttlt.com
() ascii ribbon campaign - against html mail /\ - against microsoft attachments
---- Lauren Squires Linguistics Program University of Virginia *** http://polyglotconspiracy.net
it doesn't need quotes unless you want to delimit internet research to just that specific phrasing :) as I note at the end, you can contrive any result you wish, if you know what you are doing. On Mar 4, 2005, at 3:34 PM, Lauren Squires wrote:
Hey now, it seems you didn't utilize the phrase operator "__ __" in those Googlefights, thereby biasing the results toward anything w/ "internet" in the search terms...Let's be fair: qua phrase, "internet research" loses miserably (sorry!) to sociology, psychology, "computer science", and linguistics...also "internet research" loses by a hair to "communication studies." Just sayin'...
LS
Jeremy Hunsinger Center for Digital Discourse and Culture () ascii ribbon campaign - against html mail /\ - against microsoft attachments
participants (10)
- Barry Wellman
- Dr. Laura J. Little
- elijah wright
- Ellis Godard
- Heidi haLevi
- jeremy hunsinger
- Lauren Squires
- Louise Ferguson
- Thomas Koenig
- Ulla Bunz